Transcripts
1. SQL | Course Introduction: Hi and welcome to this
very unique SQL course. I am Baraa Salkini, an IT
solution architect with over a decade of
experience in IT projects. I will put everything
that I know about SQL into a 4-hour tutorial. In this course, you will
learn everything that you need about one of the
most in-demand skills, SQL, from basic
to advanced topics. So by the end of the course, you will be able to write
SQL queries very easily. We will work with one of the most popular versions of SQL, MySQL, but the syntax and the skills that you're going
to learn from this course can be used in any other database or
application that uses SQL. I designed this course to
take you from zero to hero. So if you are a beginner, don't worry about it. I'm going to explain everything from scratch, step by step. So now you might ask me what makes this course very special
compared to the other courses. In this course,
you will not only learn how to write SQL queries, but you will also learn the
SQL concepts behind them, and especially how SQL processes the queries
behind the scenes. This can help
you to understand why we write SQL queries the way we do, and it's going to make you more creative with your
query statements. In this course, you will have
tons of best practices, tips, and tricks that I
collected over the last years. We will have
many SQL tasks, and we are going to solve
them together step by step. And I will be providing you
with a lot of free materials. All the content of
this course is also available on my website
datawithbaraa.com. You can use it later
as a reference. I will also provide you
with an SQL cheat sheet where you can
find all the tasks and the SQL syntax, so you don't have to
memorize all of it. I've also prepared
a database for this course, which we are going to use in all our tasks and
examples during the tutorials, as well as all the SQL presentations and concepts made in this course. So now let's jump
in and get started.
2. SQL | Course Curriculum Overview: All right everyone, so now
I would like to show you the roadmap of the entire
SQL course for beginners. The SQL course is divided
into nine chapters. First, we're going to start
with the basics, where you can learn the basic
concepts of SQL, like the concept of
databases, the SQL table concepts, the basic SQL commands, and the main elements
of SQL statements. In the next chapter, we're going to start preparing your environment so you
can practice with me. I will walk you
through the steps of downloading and
installing MySQL. Then we will take
a quick tour of the interface, and at the end we're going to install the
database of our course. And then finally, you
will begin to use SQL syntax to query
the database and the tables that you
just created in the previous section using
SELECT statements. After that, you will learn
how to filter your data using the WHERE clause and
learn some SQL operators. In the next chapter we're
going to step up the level: we are going to
learn how to combine our SQL tables using
joins and UNION. After that, we're going to learn many important SQL
functions, like aggregations and
string functions. Then in the next
chapter we're going to raise the level
again by learning advanced topics in SQL like
GROUP BY, HAVING, and subqueries. Then we're going to learn how
to modify the data inside our tables using INSERT,
UPDATE, and DELETE. And in the last chapter
of this course, we will learn how to define our data using SQL commands like CREATE, ALTER, and DROP. So those are all the
topics that we're going to cover in this course. Alright everyone, with this I can say: let's jump in and
start our SQL course. We're going to start with the first chapter. Here we're going to talk about
the SQL basics and concepts, and we're going to start now
with an introduction to SQL.
3. SQL | Introduction: Alright, so we will start
with the SQL basics, the terms that you'll be
hearing during the tutorials. For example, what is data? Data are facts or
statistics that are stored somewhere or
moving around the network. Generally, they are like
raw materials. For example, if you order something online, a lot of data will be generated: the customer ID, the order number, the order date, the shipping date, and so on. Another term that we
have is information. The data that we have, we can process
it, structure it, or translate it into a new
form called information, which has more
logical meaning and which we can use
in analysis. For example, if we aggregate the orders
by order date over the years, we can see how the company
is growing over the years. That means we converted the raw data into
meaningful information. Alright, so what is a
database, or DB for short? By definition, a
database is a collection of structured and
related data that are stored and organized in a way that makes the data easy
to access and manage. In short, it is one
way to store your data. You will deal with databases
every day and everywhere: for example, if you order
something online, or even if you store your photos
in your smartphone gallery. This gallery is a database. There are many
different kinds of databases. The most famous one, and the
one that we're going to learn, is the relational SQL database. Another one is the NoSQL database. We also have distributed databases, cloud databases, data
warehouses, and so on. So now I'm going
to explain SQL and NoSQL databases, because they are the
most famous ones. SQL, or relational, databases
store the data
inside tables. Tables are like containers
with a fixed structure, and usually they are related to each other using relationships. That's why we have the name
relational databases. So if your data are very structured and easy
to understand, it would be good to use SQL databases to
store your data. On the other hand, we
have NoSQL databases, or "not only SQL" databases. Here you have different
options for how you are going
to store your data. For example, you have the
key-value method, where you
define keys and the values inside them. You have the graph store. You have the column store, which is great for big data; some tools like Tableau,
for data visualization, use this method
to store the data because it gives great
performance for analysis. And you also
have the document store. So if you are in projects
where the requirements are changing a lot, or the data
are hard to understand
and don't have a clear
structure, it would be good to use a NoSQL database
to store your data with one of those methods. But in many companies, a lot of projects store their data
inside SQL databases, because they are easy to
understand and very widely used. In our tutorials we will
be focusing on this type of database, SQL
relational databases. Now in order to manage
all those databases, we use software called a database management
system, or DBMS. It is like an application
with an interface where you can log in and start doing
something inside your database. You can do things like creating new tables,
changing your data, querying your data, and so on. Currently we have almost 380 different DBMSs. In the Stack Overflow survey for this year (I'm going to leave the
link in the description), you can see a
ranking of the top and most used databases
among developers. You can see here that MySQL
is number one, then PostgreSQL, and so on. We have another
ranking website, called DB-Engines Ranking. If we go there, we find
a ranked list of the most used or most popular
DBMSs in the world. They use
different criteria to calculate that, but you can see here that MySQL is in the top three of the list. In our tutorials, we will be learning and using MySQL,
which is one of the most famous and commonly used
databases these days. Now finally, what is SQL? It stands for Structured
Query Language. By definition, SQL is the query language that we
use to retrieve, manage, manipulate,
and store data in databases. In short, SQL is the
language that you need to master in order
to talk to databases. Now on the Internet, there is a never-ending battle
over how to pronounce it. Some developers call it "sequel", and others,
like me, say "S-Q-L". It really depends on
the country that you come from or the project
that you are working on. In my project
everyone says "S-Q-L". So it's really up to you which one
you're going to use. Alright, you might ask me now: Baraa, how does SQL work? Let's check this. On the right side we have our relational database, where you store your data
inside tables. And here we have our DBMS
managing our database. The first thing that you're
going to do is log in to the DBMS in order to
interact with it. Or, if you are building
an application, you need to connect
it to the DBMS. After that, you start
writing some SQL statements, some instructions, and then
hit the Execute button. After that, the DBMS
will start processing the statement, do some magic to it, and
send it to the database. Once the database gets such a
query, it starts performing some operations, like searching for the data that you asked for. Once it's ready, the
database answers the DBMS with the result
that you wanted. Alright guys, so that was a quick introduction to SQL. Next, we're going to
start talking about why SQL is important and why
you should learn it.
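
The flow described in this section (connect, send an SQL statement, get a result back) and the idea of turning raw data into information can be sketched in a few lines. This is only a sketch under assumptions: it uses Python's built-in sqlite3 module as a stand-in for the MySQL server used in the course, and the orders table, its columns, and its rows are made up for illustration.

```python
import sqlite3

# Assumption: SQLite (bundled with Python) stands in for the MySQL DBMS of the course.
# Step 1: connect to the database (with MySQL you would log in to the DBMS instead).
conn = sqlite3.connect(":memory:")

# Hypothetical raw data: one row per order.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 1, "2021-03-15"),
        (2, 2, "2022-01-10"),
        (3, 1, "2022-07-04"),
        (4, 3, "2023-02-20"),
        (5, 2, "2023-05-11"),
        (6, 4, "2023-11-30"),
    ],
)

# Step 2: send an SQL statement.
# Step 3: the database searches for the data and returns the result.
# Counting the raw orders per year turns data into information.
orders_per_year = conn.execute(
    """
    SELECT substr(order_date, 1, 4) AS order_year, COUNT(*) AS order_count
    FROM orders
    GROUP BY order_year
    ORDER BY order_year
    """
).fetchall()

for order_year, order_count in orders_per_year:
    print(order_year, order_count)

conn.close()
```

With these made-up rows, the order count grows from year to year, which is the kind of statement about the company that the raw rows alone do not make obvious.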
4. SQL | Why Learn SQL?: Now I just want
to quickly motivate why you should still learn SQL. Here are some facts. SQL is 47 years old; that is 14 years older than
me. You can do the math. So SQL is the granddaddy
of the programming world. There are over 700
computer languages that you could learn. You might also hear
about the NoSQL movement, where everybody says that NoSQL is going to kill
SQL databases. So you might ask now: why do we still use SQL? Why should I learn
SQL? Why didn't SQL die like many
other languages did, like BASIC or Pascal? Well, the quick answer
is that SQL still works. It does the job, and you
cannot ask for more than that. Here are four reasons why
you should still learn SQL. Reason number one: SQL is one of the most used technologies in
the entire tech industry. Let's check the Stack Overflow survey for this year. I will leave the link
in the description. In this chart, we can see
the most used technologies, and you can see here that SQL is ranked as the fourth most commonly used technology
among all developers. That means SQL is still in trend. Reason number two: SQL
is in high demand. Most companies,
in all industries,
use some kind of SQL
database to store their data. That means they are always
going to need someone with SQL skills to create, manage, analyze, and
understand their data. So now let's do a quick
check on a job platform like Indeed. Search for
the keyword SQL, click Find Jobs, and let's see the results. You can see here that over 170,000 jobs are looking for an SQL developer or someone
with SQL skills. That means SQL skills
are really in high demand, and that's because data analysis is becoming a very important
part of many jobs. The third reason is that SQL
is almost everywhere. If you are in projects where you are working with data, for example data mining, data engineering, data science, or
data visualization, you will end up using
a lot of big data tools and programming languages, and most of them
tend to offer you places to write some kind of
SQL statement. For example, if you are using Tableau,
a very famous data
visualization tool, there are places where
you need to write an
SQL statement in
order to prepare the data. Or if you are in projects
where you are doing
data streaming
using Kafka, for example, you will find a
lot of functions or modules where you have to
write some SQL statements. They do that to make things
easier. That means that, with time, you will see
that in almost every tool you can use SQL statements
and SQL skills. So now for the last reason: unlike other languages,
SQL is simple and easy. It is easy to learn, easy to write, and easy to read, because the SQL
syntax is based on very common, easy
English words, for example select, from, create, table,
where, and so on. And SQL manages to hide all the complicated processes from you. That's why a lot of people tend to learn SQL:
because it's really easy. Alright, so now let's sum up. SQL has the best combination: it is very high in
demand and it is also easy to learn, which makes learning
SQL always a smart move and one of the most impactful career improvements any IT developer can unlock. Alright, so those were my top reasons why
you should learn SQL. Next, we're going to talk
about the database concepts.
5. SQL | The Database Concepts: Alright, so now
let's understand how SQL databases are organized. It's very important to
understand that, because once you start writing SQL
statements or SQL queries, you need to
understand the terms that are commonly
used in databases, how to browse your database, and how to find your data. If you learn that at the start, it's going to make the
learning process of writing SQL statements
much faster. Okay, so now, just to make
it easier to understand, think of the following analogy. A database is like
your city library. We have in Stuttgart
a very beautiful library. It's really amazing. I spent a lot of time there. I just like it. So yeah, a database
is like a library. Libraries are
usually divided into categories like science fiction, romance, history,
sport, and so on. Categories
help you to quickly find the materials that
you are searching for. So categories
group similar books
under the same category. We have the same concept in databases, and we call them schemas or schemata; pick
the one that you like. And of course in libraries
we also have books. We have something similar in databases, and we call them tables; they contain
the actual data. So as you saw in the example, databases are
organized in a hierarchy. Let's see how MySQL organizes the data, because not all databases follow the same concepts
for how to organize the data. So let's start with MySQL. We have the database server. It's like a machine containing
software and hardware in order to run our
DBMSs and databases. Usually a database
server is a high-end computer with
a lot of CPUs and RAM. But in our tutorials, we will install a
database server on our local computer or laptop, and we call it a local server. Inside the server, you can create multiple databases. In MySQL, databases and
schemas are synonyms. A schema, by definition, is a logical container that holds similar tables. With that you get
a lot of benefits. For example, imagine you have a big database with
a lot of tables. Grouping similar
tables under schemas
is going to make it easier for you to manage the users, for example, or to manage the tables, and it reduces complexity. Also, if you have two tables with the same name, you can store them
in different schemas. So it's a really nice way to organize the database. Inside the schema, we then have
different tables. Tables are the most
important objects in the whole database, because they are the place where you
store your data. Without tables, we
have no database. And inside the
tables you will have at least one column
or several columns. I will go into detail explaining tables in the next step. Okay, so now I just
wanted to show you quickly how other databases, like Microsoft SQL Server
or PostgreSQL, organize the
data compared to MySQL. As you can see here, the key difference is that they separate databases from schemas. A database here is the main container, a discrete unit on its own, where you have logs, jobs, schemas, and data, and
you can do backups. A schema here is like a folder inside
the database: a logical layer
containing different tables. In my opinion, MySQL is a little bit misleading
or confusing for developers. For example, if you go and
create a schema, the MySQL DBMS will
create a database. I found that a little bit confusing at the start. Alright, so that was it
about the database concepts. Next we're going to
start talking about the SQL table concepts.
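
The point that schemas let two tables share the same name can be tried out in a small sketch. Assumptions are stated loudly here: Python's built-in sqlite3 module stands in for MySQL, SQLite has no CREATE SCHEMA statement, so the sketch attaches extra in-memory databases that then behave like named schemas, and the schema names sales and support and the contacts table are hypothetical. In MySQL itself you would create the schema with CREATE SCHEMA (a synonym for CREATE DATABASE).

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. Each attached database acts like a
# named schema, which is close to MySQL, where schema and database are synonyms.
server = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
server.execute("ATTACH DATABASE ':memory:' AS sales")    # hypothetical schema name
server.execute("ATTACH DATABASE ':memory:' AS support")  # hypothetical schema name

# Two tables with the same name can coexist because they live in different schemas.
server.execute("CREATE TABLE sales.contacts (contact_id INTEGER PRIMARY KEY, name TEXT)")
server.execute("CREATE TABLE support.contacts (contact_id INTEGER PRIMARY KEY, name TEXT)")
server.execute("INSERT INTO sales.contacts VALUES (1, 'Maria')")
server.execute("INSERT INTO support.contacts VALUES (1, 'John')")

# Qualify the table with its schema name to say which one you mean.
sales_names = server.execute("SELECT name FROM sales.contacts").fetchall()
support_names = server.execute("SELECT name FROM support.contacts").fetchall()

# List the schemas this connection knows about (the list includes main, sales, support).
schema_names = [row[1] for row in server.execute("PRAGMA database_list")]
print(schema_names, sales_names, support_names)
```

The hierarchy is the same one the section describes: a server (here, the connection) contains schemas, and each schema contains tables.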
6. SQL | Table Concepts: Alright, so now let's talk
about SQL tables, because they are really important in databases, and
understanding them is going to help you
write better SQL statements. The problem is that we have around 380 different databases, and they use different terms
in their documentation. Another aspect is that we use different terms in
different areas of work. For example, if you are a database developer, you will use terms like tables, columns, and rows. But if you are
at university, you will hear about
relations and tuples. And as a data
modeler, you will see entities and attributes. That's why I would
like to give you a short overview of those
terms to make it simpler. Alright, so now we have here a very simple example
of an SQL table. In our tutorial database, we have one table
called customers. This table contains all the
data about our customers. Other names that we
have for tables are object, entity, and relation. Okay, next we have columns. Columns are the vertical
groups of cells that describe one
type of information. In our example, we
have four columns: customer ID, first name,
last name, and country. Each column has
two pieces of information: the column name, for example here we have the first name, and the values inside it, like Maria, John, and so on. Alright, so next we have rows. Rows are the horizontal groups of cells that describe one individual item, and the values in a row are related to
each other. For example, here
customer ID 2 belongs to John, and John lives in the US. In this table we
have five rows. Other names for rows
are records and tuples. Now, at the intersection
of a column and a row is a
piece of data that we call a cell. Other names are
data item and
column value. It is
one single value, for example the number four, or Germany, or George, and so on. The last component we
have is the primary key. The primary key is a column
or set of columns that uniquely identifies
each row in the table, and it can be used as a
link to other tables. In our example, the customer ID
is our primary key. You can see it has a unique
value for each customer. Another name for it
is key field. Alright, so those were
the concepts and the main components
of SQL tables. Next we're going
to start talking about the different types
of SQL commands.
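
The table components above (columns, rows, cells, primary key) can be made concrete with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names follow the transcript, but their exact spelling, the data types, the last names, and most row values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL

# Four columns, with customer_id as the primary key.
conn.execute(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- primary key: unique value per row
        first_name  TEXT,
        last_name   TEXT,
        country     TEXT
    )
    """
)

# Five rows (records). Last names and some countries are hypothetical values.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Maria", "Example", "Germany"),
        (2, "John", "Sample", "USA"),
        (3, "George", "Placeholder", "UK"),
        (4, "Martin", "Demo", "Germany"),
        (5, "Peter", "Test", "USA"),
    ],
)

cursor = conn.execute("SELECT * FROM customers")
column_names = [column[0] for column in cursor.description]  # the columns
rows = cursor.fetchall()                                      # the rows

# A cell is the single value where one row and one column meet.
cell = conn.execute(
    "SELECT country FROM customers WHERE customer_id = 2"
).fetchone()[0]

# The primary key must stay unique, so a second row with customer_id 2 is rejected.
try:
    conn.execute("INSERT INTO customers VALUES (2, 'Anna', 'Duplicate', 'France')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(column_names, len(rows), cell, duplicate_rejected)
```

The rejected insert is what "uniquely identifies each row" means in practice: the database itself refuses a second row with the same key value.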
7. SQL | Main SQL Commands: Alright, so now let's
talk about SQL commands. In SQL we have around 12 main commands and
900 different keywords. Of course, I will not be
explaining all of them. Instead,
in our tutorials, I will be focusing on the most used SQL
commands and statements, the ones I have used in my projects over the last ten years. To
make our life easier, SQL commands are divided into different groups depending
on their purpose. Alright, let's start with the first group: Data
Definition Language, DDL. As the name suggests,
here you will find all the commands that allow
you to define your database, like creating tables, dropping
columns, and changing tables: anything that changes the structure of your database. In this group, you
find commands like CREATE,
which helps you to create
anything new in the database, like a new table, a new view, stored
procedures, and so on. We also have
the DROP command, which allows you to delete an
object from your database. And the last one is ALTER. It helps you to edit the
structure of your database, like altering a table to change a column or
to add a new column. Okay, so now to
the second group: Data Query Language, DQL. It contains only one
command, and that's enough. It's called the SELECT command. SELECT helps you to retrieve your data
from your database. SELECT is the most
important command that we have in SQL, and the one that you need to master in order to
be good at SQL. In my tutorials, I will be explaining everything
about SQL SELECT statements, because if
you start working with SQL, you will end up writing
tons of SELECT statements. Don't worry about it. Alright, so let's go
now to the next group:
Data Manipulation
Language, DML. DML contains all the SQL
commands that you can use to manipulate the data
inside your database. We have INSERT
to insert new
data into your tables,
DELETE to delete
data from your tables, and UPDATE to update the content of existing
data inside tables. So as you see, it
is really easy. The name tells everything. Alright, so now we have two
groups of commands that are really more for SQL
database administrators. The next one is Data
Control Language, DCL. DCL contains the SQL
commands that allow you to give a specific
user access to your database,
to tables,
schemas, and so on. Here we have two
commands: you can use GRANT to give someone access to
objects in your database, and REVOKE to remove such access
from a specific user. Okay, so now to the last
group that we have:
Transaction
Control Language, TCL. In TCL, you will find the SQL commands that
help you to manage database
transactions in order to maintain
the integrity of your data. Here we have commands
like COMMIT, to save the changes
in your database, and ROLLBACK, to restore
the database to the last commit or to
the last savepoint if you have some errors. With SAVEPOINT, you can define savepoints
in a transaction, which you can use
later to roll back. Alright, so now
about those names: DDL, DQL, DCL, TCL, and so on. You don't have to memorize them. Maybe the only important one is DDL, which you will sometimes
hear in projects. If someone says, "I will be creating
some DDL scripts", that means he or she is going to create
SQL statements that change the structure
of the database, like creating a new table
or dropping something. Alright, so in our
SQL tutorials, we will be focusing on the first three groups
of SQL commands. We will start with
the most famous one, the SQL SELECT statement. After that,
we're going to deal with the DDL scripts. And finally, I'm
going to explain INSERT, DELETE, and UPDATE. Alright, so those were the
main types of SQL commands. Next you will learn the basic
elements of SQL statements.
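
The command groups above can be seen side by side in one short sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the products table and its values are hypothetical. DCL (GRANT and REVOKE) is left out because SQLite has no user accounts, so those commands cannot be shown with this stand-in.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. isolation_level=None means the
# transaction commands (BEGIN, COMMIT, ROLLBACK) are issued by us explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: define and change the structure of the database.
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE products ADD COLUMN price REAL")

# DML: change the data inside the table.
conn.execute("INSERT INTO products VALUES (1, 'Keyboard', 30.0)")
conn.execute("INSERT INTO products VALUES (2, 'Mouse', 15.0)")
conn.execute("UPDATE products SET price = 25.0 WHERE product_id = 1")
conn.execute("DELETE FROM products WHERE product_id = 2")

# DQL: retrieve the data.
after_dml = conn.execute("SELECT product_id, name, price FROM products").fetchall()

# TCL: group changes into a transaction and undo only part of it.
conn.execute("BEGIN")
conn.execute("INSERT INTO products VALUES (3, 'Monitor', 120.0)")
conn.execute("SAVEPOINT before_mistake")
conn.execute("DELETE FROM products")                  # a mistake: removes every row
conn.execute("ROLLBACK TO SAVEPOINT before_mistake")  # undo only the delete
conn.execute("COMMIT")                                # save the insert

after_tcl = conn.execute(
    "SELECT product_id FROM products ORDER BY product_id"
).fetchall()

# DDL again: remove the object from the database.
conn.execute("DROP TABLE products")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(after_dml, after_tcl, remaining_tables)
```

Note the difference between DELETE and DROP in the sketch: DELETE removes rows but keeps the table, while DROP removes the table itself.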
8. SQL | The Elements of SQL Statements: Alright, so now let's start with the basics. I want you to
understand, at the start, the basic elements inside
each SQL statement. We have over here a very
simple SELECT statement. Don't worry about the content; I will be explaining that later. The whole text that is going to be
sent to the database
is called an SQL statement,
or sometimes a query if it is a
SELECT statement. It doesn't matter whether
you are retrieving data from the database, creating a new
table, or updating content;
we always
call it an SQL statement. Okay, so now let's talk about the components inside
our SQL statement. Let's start with the first
line over here, the green one. We call it an SQL comment. In an SQL comment, you can write anything
the SQL command. You could write anything
you want and once you hit Execute or the
whole SQL statements, the database just
going to ignore it. That means nothing
going to happen. There is some benefits
of SQL commands. We could use it to
describe our code. So later going to be
easier to read it. And because the
database going to ignore it and nothing's
going to happen, reuse it to deactivate
part of our code, e.g. if I don't want to use
such a filter over here, I could make it as a comment and the database
will not execute it. Okay, so now SQL statements are usually divided into
different parts. We call them clauses. Each part is responsible
for a specific action. In our example over here, we have three clauses, the SELECT, FROM, and WHERE clauses, and each of them has
its own unique function. For example, in SELECT you
list the names of the columns that
you want, in FROM you name the tables, and in WHERE you
define the filters. So as you can see, SQL is really nicely
split by function,
which makes it really
easy to read and easy to write, and makes the whole SQL
language a very easy one. Okay, so next, as you
might have already noticed, we have those blue words. We call them keywords. In our example, we have four
keywords: SELECT, FROM, WHERE, and AND. Those keywords are
predefined and reserved in SQL. That means you cannot use them as a table name
or column name without quoting them. In MySQL, we have
over 900 keywords. We will not go
through all of them; I'm just going to focus in the tutorials on the
most used keywords. In the link in the description, you will see a list of all keywords that
we have in MySQL. Okay, so now let's
take the next element:
identifiers. An identifier is
any name that you give to an object in
your database, for example a table name or column name; even the database name itself is an identifier. In our example here we
have four column names,
first name, last name,
country, and score. And we also have a
table name, customers. All of those
are identifiers. Alright, so now to the
last element that we have:
operators. In SQL, there are many
different operators. They have different
shapes and forms. For example, they can be simple symbols, like what we have
here, equals and less than, or they can be keywords; for example, AND is also an operator. So as I said, in SQL there are different kinds of
operators. There are arithmetic
operators, like plus and minus. There are comparison
operators, as in our example, equals and less than, and so on. Alright, so those were the basic elements
inside SQL statements. To sum up: over here, we have the whole text, which we call an SQL statement. The green parts
are comments. In SQL, we have different
clauses, the different parts. The blue words
are the keywords. We have the names that we give
to objects in the database, which we call identifiers. And finally we have
operators in our statements. Alright everyone, so with that, we have finished the first
chapter of the SQL course. We now have a lot
of knowledge about the SQL basics and concepts. In the next chapter, we
will start preparing your environment so we
can start practicing SQL. And we will start by downloading
and installing MySQL.
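
The elements from this chapter (comment, clauses, keywords, identifiers, operators) can all be pointed out in one statement. The statement below is a reconstruction, not the exact one shown on screen in the video: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, the column spellings first_name, last_name, country, and score are assumed from the spoken names, and the rows, last names, and filter values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL

# Hypothetical data for the customers table.
conn.execute(
    "CREATE TABLE customers (first_name TEXT, last_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        ("Maria", "Example", "Germany", 350),
        ("John", "Sample", "USA", 900),
        ("George", "Placeholder", "Germany", 750),
    ],
)

# The whole text below is one SQL statement (a query, because it is a SELECT).
# Keywords: SELECT, FROM, WHERE, AND.
# Identifiers: first_name, last_name, country, score, customers.
# Operators: = and < (comparison), AND (logical).
statement = """
-- Comment: customers from Germany with a low score (ignored by the database)
SELECT first_name, last_name, country, score   -- SELECT clause: the columns
FROM customers                                 -- FROM clause: the table
WHERE country = 'Germany'                      -- WHERE clause: the filters
  AND score < 500
  -- AND last_name = 'Placeholder'             -- a filter deactivated as a comment
"""

result = conn.execute(statement).fetchall()
print(result)
```

Removing the two dashes in front of the last filter would activate it again, which is the "deactivate part of our code" use of comments described above.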
9. SQL | Download & Install MySQL: Now if you don't
already have MySQL installed, you can follow along. I'm going to show
you, step by step, how to download and install MySQL on Windows. This is important so that you can practice and run the
tutorials on your computer. Let's start by
downloading MySQL. Okay, let's go to our browser. We will go to the official
MySQL website, mysql.com. There you will find Downloads. Click on that, then scroll down until you find MySQL
Community Downloads. Click on it. You'll have a bunch
of installers. The one that we need is MySQL Installer for
Windows. Let's go there. Here you have two options, a smaller one and a bigger one. The small one downloads packages
as you install MySQL, or you can download the
whole package at the start. I recommend that you
go with the bigger one, so we have everything
downloaded at the start. Click on Download. This page asks you to log in or
create a new account. That's not necessary
for the tutorial, so you can skip it. So I'm going to go with "No thanks,
just start my download". That's going to now start
downloading the installer. But because I have already done that, I don't want to waste
time now, so I'm going to go to Downloads and
start
the installation. Okay, let's start
the installer now. I'm going to click
on it and press Yes. And now we are at the first
step of the installation. Before we proceed, I should
tell you there will be lots of steps, 30 I think. We're just going to
press Next, Finish, Yes, and so on. We will not change a
lot of the configuration. Maybe we're going to set a
password, but that's it. So it's really easy. Let's start with the first step. It's going to ask for the setup type, for example developer, server, or
client. We will stay with the
developer default, so click Next. After that, it's going
to check the paths. We're going to stay
with the defaults. Next. Yes, I'm sure. Here it's going to
check the requirements. It will do a lot of steps like this, checking the requirements. We stay with the
defaults. Yes. And now it's going
to show you all the packages that are going to be installed. We will
not change anything;
let everything
be installed. So now I'm going to click
Execute, and it's going to start installing all of
those components, as you can see, one by one. Alright, so now we have all
the products installed. We will click on Next. Then we have some
product configuration. Just click Next. And now you can see
the networking settings. The most important thing here is to know the
port number
for our local database, but we will not change anything. We're going to
leave it like this. Then click Next. We're going to stay with
the recommended settings for authentication. Click Next. And now we finally have to
set up the password for our root user, which we can also call the admin
user for the database. This password is very important to memorize or write
down somewhere. So now I'm going to
give our admin user a password. Then Next. We will stay with
the recommended settings and not change anything. And we can now click Execute
to apply our configuration. Okay, after all
configuration steps are completed, we can click on Finish. After that, there will be
more configuration. Next. Don't change anything;
we're going to stay with
those settings and
click on Finish. After that, some more
configuration, then Finish. Okay, now we're going to test our connection to
the database server. You see here the
username is root, and we're going to type
in the password that we gave previously for
the admin user. So I'm going to enter the
password here and click Check. If you get "Connection succeeded" like here,
that means we are successfully connected to our MySQL database
and everything is fine. So let's click Next, apply the
configuration, and Execute. Everything is green, so click Finish. We have more configuration. Guess what: Next. Alright, installation completed. So let's now click
Finish one more time. After the installation
is completed, it's going to start
MySQL Workbench for you, as well as MySQL
Shell. Let's check here. We don't need this
one, so you can close it. We will stay with
MySQL Workbench. This is exactly what we
need for the tutorials. You can see over here "Local instance MySQL80". This is your local
database on your machine. So we're going to log in and see whether
everything is fine. You see here the
admin user root, and we type the password we
gave in the installation. This is mine. Click OK. And now I'm inside my database. If you are at exactly this step, that means you have downloaded, installed, and logged in to
your database successfully. So congrats. Alright, so with that,
we have downloaded and installed MySQL
successfully on our system. Next, I'm going to take you on a very quick tour of
the interface of MySQL.
10. SQL | Tour in the Interface of MySQL Workbench: I would like to give you
now a really quick overview of the interface of
MySQL Workbench, because I remember that
when I first started using such database
applications, it was a little bit confusing and overwhelming to have
all those panels, options, and toolbars. But actually it was
not that hard. I'm not going to
explain every single detail;
instead, I will give you a general overview
of the interface. If you need more
details about the tool, visit the MySQL manual. I will leave the link
in the description. So now let's start explaining the main sections
of MySQL Workbench. Alright, let's start on
the left side. We have here a very important
section called the Navigator. In the Navigator you can see two tabs, Schemas
and Administration. By default, you will
be landed in the schema. So you can see in the schema, it allows you to navigate or browse through your
database objects. E.g. I. Can see here, I have three
databases as default. We got it from the installation. So if I want to see inside
this database called word, I'm going to double-click on it and I'm going to see the tables, views, stored procedures
and functions. So I can router furthermore, I want to see what is
inside the tables. We will see that we
have three tables, city, country, and
country language. So I can start, okay, I have three tables
in the database. Let's see now which columns
contain those tables. I can click on the
city and expand. And I will see, okay, I have the following columns, ID, name, and so on. So with the schema navigator, you can navigate through your database to understand
the contents of it. Let's go now to the second
tab administrations. Here you will find
a lot of info, a lot of tools to manage
your SQL Server, e.g. you can check the server status, double-click on it, you'll
see the right side here. Several status is running or you can manage
the connections, many users and so on. It is interesting if
you're going to be like database administrator to
understand all those stuff, we are now learning SQL
and it is different topic. Now, let's go back to the schema where we can
browse our databases. Alright, let's close
this one over here. I don't need it. Go away. Right. Next we have the toolbars. We have two toolbars. The first one is called
the main toolbar. It holds the most
frequently used functions in SQL, for example to create a new SQL statement, or to create a new
schema or database, a new table, a new view, new stored
procedures, and so on. So it gives you
quick access to create new things. That is
the main toolbar. The second toolbar is over here. It is the query toolbar. It contains all the
actions that are related to the query that you are writing
in the Query Editor. The most important
one is Execute. Once you write your
SQL statement over here, you click on Execute and it will be run on the database. You have some other
options, for example to save the SQL statement or to open one that's already
saved, and so on. Alright, next we have a
very important section. It's called the Query Editor. Here we will write our SQL statements and
queries. It is the main place
where we will work. For example, I'm going to write the
following statement: SELECT * FROM city. Don't worry about the syntax. I will be explaining
everything about the SELECT statement
in the next tutorials. So now let's hit
Run or Execute. After we run the query, you will see that we
have a new section here. It's called the Result Grid. Here you will find
the results, that is, the data that is returned
from the database after we executed the query or the SELECT statement, and the data is presented
in table form. Underneath that, you will
find another section. It's called the Output. Let me just make it
a little bit bigger. In this section you will
find a lot of information. It's like a log. You can see the
execution time, meaning how long it took the server
to execute your query. You can also see
whether it was successful or whether you have some problems in the syntax or
some errors. You can see that
over here, and you can see the error message
as well. Okay, now if you go to
the right side over here, we find another section. It's called SQL Additions. It is a tool from MySQL that gives you descriptions
of the SQL statements, their syntax, usage, and
recommendations. I usually hide it to save some space in the application
by clicking over here. It's really up to you. It's personal preference. Alright, those were
the main sections of MySQL Workbench that we really
need in the SQL tutorials. I hope it helps. Don't worry about it. You need some more time
using such applications in order to understand them and
to navigate through them, and then it will be
less overwhelming. Alright, so with that,
we have learned how to navigate through the
MySQL Workbench interface. Next, we are going to install the database for practicing.
11. SQL | Install the Course Database: Alright, so far
we have installed the MySQL application
locally on our computer. As a next step, we're
going to create a tutorial database
for this SQL series. I've prepared a special database just for practice
and tutorial purposes. In this tutorial database, we will have three
tables with a little data. All our next
tutorials will be based on this tutorial database. What we're going to do is this: I'm
going to show you some tasks, and we're going to try to
solve those tasks using SQL code on top of our
tutorial database. Next, I'm going to show
you step by step how to create our
tutorial database. Okay, so the first step is to go to
the video description. There you will find
the link to my website, and there you will find
our SQL tutorial database. It will look
something like this. This is one big piece of code
in SQL, around 53 rows. You don't have to understand all of it at the start.
After you
finish the series, you will understand what
we have done over here. You will understand
how to create a new database and tables, how to insert new
data, and so on. What we're going to do
now is just copy the script. In order to do that, you can go over here
and click Copy, or just select
everything and copy it. Once we have copied our
tutorial database script, we're going to go to our MySQL
database and run it. Alright, step number two: go back to MySQL Workbench. There we're going
to execute our code. We're going to open
a new SQL editor tab, and here we're going
to paste our code. It is around 53
rows of code. And we're going to hit Run. Once we have run it, we have to validate whether
everything went perfectly. If you check the
left side over here, you will find, okay, we have three databases. So where is the tutorial
database we just installed? In order to see it, you need
to hit Refresh. Once you hit Refresh, you will see that we now have our tutorial
database, DB SQL tutorial. In order to browse
our new database, we do the following: just double-click on it
and then go to Tables. There you will
find our three tables. We have the tables
customers, employees,
and orders. Okay, so now let's
check whether we have all the data in our
tutorial database. In order to do that, we can open a new tab. Just follow me through these
steps. I will explain all the commands
later in the tutorials. I'm just going to retrieve all the information from
each table to check whether we have all the data. So: SELECT * FROM customers. This is going to retrieve the data
from the table customers. As you can see, we have here a table called
customers with five customers. We have Maria, John,
George, Martin, and Peter. In this table we are storing the general information
about each customer, like the first name, last name,
country, and score. Okay, so now let's
check one more table. Let's check the orders. Instead of customers,
I'm going to put orders and click Execute. With that, we
see that we have a table orders that stores all the orders that
are placed by our customers. We can see over here that we have the customer ID, the order ID, the date when the order was placed,
and the quantity. If we want to see
information about the orders, we check
the table orders. If we want to
see information about the customers, we check the
table customers, and so on. So if you have done all three steps and
you have checked the data, that means you now have our tutorial database installed
on your local machine, and we can proceed
with our tutorial. Alright, so with that we
have a database with data. Before we start
writing our SQL code, we have to learn
how to style it.
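
The validation step from this chapter can also be tried without a MySQL server. What follows is only a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL. The column names and every row value are placeholders: only the table names, the five customer first names, and the count of four orders come from the narration, and the employees table is left out.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL tutorial database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER, first_name TEXT, last_name TEXT,
        country TEXT, score INTEGER
    );
    CREATE TABLE orders (
        order_id INTEGER, customer_id INTEGER,
        order_date TEXT, quantity INTEGER
    );
    -- Placeholder rows: last names, George's score, and all order
    -- values are invented for illustration.
    INSERT INTO customers VALUES
        (1, 'Maria',  'L1', 'Germany', 350),
        (2, 'John',   'L2', 'USA',     900),
        (3, 'George', 'L3', 'UK',      750),
        (4, 'Martin', 'L4', 'Germany', 500),
        (5, 'Peter',  'L5', 'USA',     NULL);
    INSERT INTO orders VALUES
        (1001, 1, '2021-01-11', 250),
        (1002, 2, '2021-04-05', 1150),
        (1003, 3, '2021-06-18', 500),
        (1004, 4, '2021-08-31', 750);
""")

# The same check as in the tutorial: retrieve everything from each table.
customer_rows = conn.execute("SELECT * FROM customers").fetchall()
order_rows = conn.execute("SELECT * FROM orders").fetchall()

print(len(customer_rows), "customers,", len(order_rows), "orders")
```

In MySQL Workbench you would run just the two SELECT statements in the Query Editor against the installed tutorial database.
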
12. SQL | Guide to SQL Coding Style: Okay, so now before we
get hands-on and you start learning how
to code in SQL, I really have to mention this. When you start learning any
new programming language, it's really not enough
to learn how to code in it. You also need to learn
many other things, for example how to solve a task with a few
lines without making things complicated, or how to write code that delivers
good performance. And finally, and
most important, how to write code
that looks good, that is easy for you and
for others to read. If you are
working on projects, you will notice
that developers always have different opinions
about how to style code. But all of them will
agree that code should be readable and
should follow some style guide. So you might ask me now, Baraa, do I really need
to style my code? Is it not enough that my
code is working correctly? Well, no, and there are two
reasons for that. If you are working
on team projects, sometimes your code will
be reviewed by others. If your code
is hard to read, you will give them a hard time reading it, and they may even end up rewriting
your code in order to read it. The other reason is that
if you find out there are some errors or
problems in your code, you will have a hard time
searching for the error to find out in which line you
have the problem. Especially if you
are a beginner in SQL or in any
programming language, at the start you will not pay attention to
style guides. You will just make
sure that you learn the code and
the statements. So my advice here: don't develop any bad
habits at the start, because later it's going to be
really hard to break them. Alright guys and girls, I want to share with you
now my three golden rules that I always follow when
I start writing SQL code. Let's check this
example over here. It's a very simple
query, a SELECT statement, where at
the start, to be honest, I had a really hard time
understanding what is going on. So let's try to make it pretty
following the three rules. Rule number one: always add new lines for keywords and
also for each column. So let's start doing that. We have here the
SELECT statement. Let's add a new
line for each column. I'm going to do that. Now each column is on its own line. FROM we already have
here on a new line, so that's okay. The same goes for JOIN. We can add a new line
for ON as well. So we are just adding new lines for each keyword, and also
here for the AND. As you can see, it already looks better. I added a new line for each
keyword and for each column. Rule number two: let's make all those keywords uppercase. So let's do that. SELECT is lowercase,
so let's make it uppercase. The same goes for FROM and JOIN. Let's make every keyword
uppercase. Why do we do that? Because it's
easier to see what is a keyword and what is other
stuff like identifiers, operators, and so on. So it's much easier to read. Rule number three
is that we're going to
add some
whitespace. Let's check that. In the WHERE clause, we can split this
condition up with whitespace. It's just easier
to read if you add whitespace. Here as well, in
the condition of the JOIN, we can add whitespace. As you can see,
we can read it better than when everything
is stuck together. And for the columns, I always add a tab in front. So now that's it. I have applied my three
rules, and you can see that it's really much easier to read. We can see here our keywords SELECT,
FROM, JOIN, WHERE, and so on. I can read
through it more easily compared to the first one. Alright, so now let's look at both scripts side by side. Can you see the differences? Which one is more readable? It's straightforward. The script with a style has a proper format that helps you and others to read it easily and also to find errors and problems
if you have any. Alright guys, so with that, we now have our MySQL
server, database, and data up and running on our PC. Everything is ready
to start practicing SQL. In the next
chapter, we will begin to use SQL syntax
to query the database and tables using the very
famous SELECT statement.
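
The three rules from this chapter can be sketched on a small example. The query below is hypothetical, since the one shown on screen is not part of the transcript; the table layout and rows are placeholders, and Python's built-in sqlite3 module stands in for MySQL. The point is only that the unstyled and the styled statement return the same result, so styling costs nothing but makes the query easier to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Placeholder tables and rows, invented for illustration.
    CREATE TABLE customers (customer_id INTEGER, first_name TEXT, country TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, quantity INTEGER);
    INSERT INTO customers VALUES (1, 'Maria', 'Germany'), (2, 'John', 'USA');
    INSERT INTO orders VALUES (1001, 1, 250), (1002, 2, 400);
""")

# Before: everything on one line, lowercase keywords, no whitespace.
unstyled = "select c.first_name,c.country,o.order_id from customers c join orders o on c.customer_id=o.customer_id where c.country='Germany' and o.quantity>100"

# After: rule 1 (new line per keyword and per column),
# rule 2 (uppercase keywords), rule 3 (whitespace and indented columns).
styled = """
SELECT
    c.first_name,
    c.country,
    o.order_id
FROM customers AS c
JOIN orders AS o
    ON c.customer_id = o.customer_id
WHERE c.country = 'Germany'
    AND o.quantity > 100
"""

unstyled_rows = conn.execute(unstyled).fetchall()
styled_rows = conn.execute(styled).fetchall()
print(unstyled_rows == styled_rows)
```
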
13. SQL | SELECT Statement: Alright, so now we're
going to start with the SELECT command. This is going to be our focus. We're going to learn
how to query our data. This is going to take almost 80 percent
of our tutorials, because SQL is all about
how to query data. Later on, we're going to talk about
data manipulation and data definition at the end. So now let's start with
the SELECT command. Alright, before we start writing our first
SELECT statement, I want to mention the following: a SELECT statement
can have a lot of clauses. This is not a bad thing. It makes SQL dynamic and easy to use. Each of those clauses has
its own definition
and its own function, which makes it
really easy to use. So we have
SELECT to
select our columns, FROM to select the tables
that we need, JOIN to
connect two tables together, WHERE to
filter our data, GROUP BY to aggregate the data, HAVING as another way
to filter our data, ORDER BY to sort our results, and LIMIT
to limit our results. Don't worry about those clauses.
I'm going to explain all
of them step by step with examples, tasks, and
everything, and in the end
you will understand all of them. One more very
important aspect to understand about SQL statements is that the order of those
clauses is very important. For example, I cannot start with FROM and then
write the SELECT. This order is very strict, and if you switch them around, you will immediately get
an error in SQL. That means: pay attention to the order
of those clauses. Don't mix them up. You need to follow
those rules in order to get your query executed
in SQL without any errors. Alright, so now the first
thing that we need to learn is how to fetch our data
from the database, how to retrieve
all those records or rows from our tables. To do so, we use the most
fundamental SQL statement. We call it the SELECT statement
or sometimes the SELECT query. Now, in order to understand all those SQL clauses like
SELECT, WHERE, JOIN, and FROM, I will be giving
you one task at a time. Then we're going
to figure out together how to solve it using our
tutorial database. In our tutorial database, we have two tables,
customers and orders. In the customers table, we have five customers, and in orders we
have four orders. Alright, so let's start
with the first task: retrieve all data and
columns from customers. That means our focus here is on the customers table, and all
data means all rows. So we need everything, all rows and all columns. So now before we start
writing our first query, we need to make sure that we are selecting the right database. When you install MySQL Workbench, you get
some default databases, and after that we installed our database for the tutorials. To make sure that we are selecting the right
one, the one that we need, either
double-click on it, or write this statement:
USE, then the database name, DB SQL tutorial. And then run it. With that, we make
sure that we are on the right database so we
don't get any errors. Alright, so now let's write
our query for the task. We need all the data
from customers. The first thing
that we specify in the SQL statement for the
query is the SELECT keyword. After that, since we
said all the columns, we're going to use star. Star means all the columns
inside this table. After that, we need to tell the database which
table we need. So since we need
the customers, we're going to select
the table customers. We're going to
say FROM customers. So now we have the
query that's going to select all columns
from the table. Here we don't have any
filters or anything. This is the
most basic form of SQL. Let's hit Run. As you can see here, now we have the results. We have all five
customers from the table customers. And don't forget, in SQL the order
is very important. It always starts with SELECT,
then comes the FROM clause. If you do it the other way round, you will get an error. So make sure that
you get the order right while you are
writing SQL statements. Let's do another task
where we say, okay, I want to see all the
data from orders. So let's do that. All data and columns, that means SELECT * FROM, and now our table is orders. So I'm going to select
the table orders here and then execute. As you can see now,
the database
retrieved the orders. And that's right,
because this is all that we have
in our database. Alright, so now you
might be saying, I'm not really interested in all the columns
of my table. I want to specify a few columns
to retrieve from the table. So let's say we have
the following task: retrieve only the first name and the country
of all customers. The difference
from the previous task is that we don't need
all the columns, we just need a few columns. So let's see how
we can solve that. I'm going to remove this
query and start with SELECT. Now I cannot use star, because I don't want to
have all the columns. We are interested
in the first name, so we write down
the first name column, then a comma. The second one is country. And now we need to tell
the database from which table, so FROM customers. And let's run it. As you can see here, now
we have only two columns, first name and country, and we don't see
the other columns like customer ID or score. With that, we selected only two columns without using star, and we
solved the task. Okay, so now, just
to understand how the database
reacts to our query, I'm going to show you
step by step what is going on in the database once
you run this statement. The database starts
from the table. We said FROM customers, so the
database is going to focus on the customers table. Then it checks which columns we need. We said first name and country. And since in our SQL statement
there are no filters, it's going to select
all the rows. So it selects
every first name from the table, and likewise every country. And that's how the database
executes our query. Alright, so with that, we have learned how to
use the SELECT statement. Next, we're going to talk
about how to retrieve unique values using
DISTINCT.
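
The two tasks from this chapter, all columns versus selected columns, can be sketched as follows. This is a hedged stand-in rather than the course setup: it assumes Python's built-in sqlite3 module instead of MySQL, the column names customer_id, first_name, last_name, country, and score are assumptions, and the last names and George's score are placeholders (the narration only says his score is above 500).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, first_name TEXT, "
    "last_name TEXT, country TEXT, score INTEGER)"
)
# Placeholder rows reconstructed from the narration.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Maria", "L1", "Germany", 350),
        (2, "John", "L2", "USA", 900),
        (3, "George", "L3", "UK", 750),
        (4, "Martin", "L4", "Germany", 500),
        (5, "Peter", "L5", "USA", None),
    ],
)

# Task 1: all rows and all columns.
all_rows = conn.execute("SELECT * FROM customers").fetchall()

# Task 2: all rows, but only the two requested columns.
cursor = conn.execute("SELECT first_name, country FROM customers")
two_column_names = [column[0] for column in cursor.description]
two_column_rows = cursor.fetchall()

print(len(all_rows), "rows with", len(all_rows[0]), "columns")
print(two_column_names, two_column_rows)
```

Writing the clauses the other way round, FROM before SELECT, is rejected as a syntax error, which matches the point about clause order made in this chapter.
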
14. SQL | DISTINCT: Alright, so the SELECT
statement by default
will not remove any duplicates
from the results. Sometimes you
might be in a situation where you have duplicates inside your tables
and you want to remove them from the results. So we are removing duplicates from the
results, not from the table. In order to remove those duplicates, we use in the
SELECT statement
a keyword called DISTINCT. In order to understand that, let's take the following task: list all countries of all
customers without duplicates. Alright, so now let's
figure out how we're going to
solve this task. As you can see, we
need the customers. That means we're going to
focus on the table customers. And we need all the countries. That means we need only
one column, called country. So let's do a basic query. We always start
with SELECT. The column that we
need is called country, so we
write down country. Then FROM, and our table
is customers. Now let's just
check whether there are any duplicates and look at the results. So, Execute. Now we can see the results: Germany, USA, UK, Germany, USA. As you can see,
there are duplicates. We have Germany twice
and likewise
we have USA twice. But the task says
without any duplicates. In order to solve that, we can type DISTINCT
directly after SELECT. So we're going to use
DISTINCT over here. This keyword always
comes right after SELECT. Just by doing that, like a magic word, it's going to remove
all the duplicates. Let's check that. So, Execute. As you can see, now the list contains
only unique entries. We have Germany only once, USA as well, and UK as well. So here we have a unique
list of all countries of all customers, and
we solved the task. Alright, so now in order
to understand DISTINCT, I'm going to show you how the database
executes our query. We said in our query that we need the data from customers, so the database is going to
focus on the table customers. We also said that we need only one
column, called country, so the database is going to
select it for the results. We said, okay, we need all the data, but DISTINCT, without
any duplicates. The database starts: okay, Germany is not in the result yet, so it puts it there. USA: we don't have it in the result,
so it puts it there. UK is the same: we don't have it in
the list, so it is added. But now it comes to
Germany again and says, okay, we have it already, so it will not include
it in the list. The same goes for USA. We have USA already here, so it will not be included
in the list. And with that, we have our unique list
of all countries. Alright, so that's all
about DISTINCT. Next, we are going
to learn how to sort our data using ORDER BY.
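
As a minimal sketch of this chapter, the snippet below runs the query with and without DISTINCT and then mimics the narrated walkthrough in plain Python. It assumes Python's built-in sqlite3 module as a stand-in for MySQL and a simplified placeholder table. The helper distinct_walkthrough is hypothetical and only illustrates the explanation; a real database engine may remove duplicates differently, for example by hashing or sorting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (first_name TEXT, country TEXT)")
# Names and countries as narrated; the two-column layout is a simplification.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("Maria", "Germany"),
        ("John", "USA"),
        ("George", "UK"),
        ("Martin", "Germany"),
        ("Peter", "USA"),
    ],
)

# Without DISTINCT: one value per row, duplicates included.
all_countries = [row[0] for row in conn.execute("SELECT country FROM customers")]

# With DISTINCT directly after SELECT: each value only once.
unique_countries = [
    row[0] for row in conn.execute("SELECT DISTINCT country FROM customers")
]


def distinct_walkthrough(values):
    """Keep a value only the first time it appears, as in the narration."""
    result = []
    for value in values:
        if value not in result:  # already in the list: skip it
            result.append(value)
    return result


walkthrough = distinct_walkthrough(["Germany", "USA", "UK", "Germany", "USA"])
print(all_countries, unique_countries, walkthrough)
```
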
15. SQL | ORDER BY: Alright, guys and girls. So now, once you start
using SELECT statements to retrieve
data from your database, the results that you
get are not sorted in any particular order. That means that the
DBMS or database sends the data back to
you in an unspecified order. If you want
to apply some rules, or you want to sort the results, you can use the
ORDER BY clause. So now, in order to
understand ORDER BY, we're going to look at
the following task: retrieve all the customers
where the results are sorted by score, and the
smallest should be first. Let's figure
out how to write the SQL statement
to solve this task. Since we need
the customers, we are focusing
on the table customers. Let's write our SELECT statement first. So SELECT, and there is no
specification about the columns, so I'm going to use
star, FROM customers. Let's run that and see. As you can see, we have
all the customers, but the result is not sorted by score. The task says sorted by
score, the smallest first,
then the higher ones. In order to do that,
we're going to use the keyword ORDER BY. So let's start a new line: ORDER BY. After that, we need
to specify the column that we're going to
use to sort our data. The task says it should
be sorted by score, so our column is
score. Now we have two options
for how we can sort our data. We have two ways,
ascending and descending. The task says it
should be sorted by score, the smallest first. That means we need
to use ascending. In SQL, we have the keyword ASC, which means ascending. So now we have the ORDER BY
clause and we should be fine. Let's run the query. Now if we check the results, you might already notice
that the result is sorted differently from the
default. That means we have a different sorting
now, by score. The first one is NULL, because NULL is considered
to be the smallest
when sorting. After that, we have 350, which is the smallest score among
all those customers. Then come the higher
and higher scores, and so on. So now we have specified our rule for how to sort our data, and we have a solution
for our task. One more thing to
notice is that in SQL, the default sorting in
ORDER BY is ascending. That means if I go
here and remove the ASC keyword and
run the query again, I will get exactly
the same results, because if we don't specify anything
after the column name, the default is going to be ascending. Okay, so now let's consider one more quick task, and
it is almost the same: retrieve all customers, and the results should
be sorted by score, but this time the
highest should be first. That means we need
to use descending order, the highest
first, then the smaller ones. We
have the same query. We don't have to
change anything else. After the column name,
if I leave it empty,
it's going to be ascending. But this time we need
descending. So we're going to use
the keyword DESC, which means descending. Let's run this query. Now let's check the result. We can already see that the list is sorted
the other way around. Now we have as the
first record the one with the highest score. John has 900, and
that is the highest. Then come the
smaller ones, and so on. So now we are
sorting the result in
descending order. Alright, so now using ORDER BY sometimes gets a little
bit more complicated, if you are using not
only one column but
several columns
to sort your results. Especially if you have a lot
of duplicates in your data,
using one column
will not help you. You're going to end up using multiple columns in the ORDER BY. In order to understand that,
we're going to take
the following task: retrieve all the customers
where the result is sorted by country in
alphabetical order, and then by score with
the highest first. Let's figure
out how to write the SQL for that step by step. I'm going to remove
everything over here and write down ORDER BY. The
first one is country. So the column we
need is country, and alphabetical order
means ascending. We could leave
it as the default or we could write
ASC, it doesn't matter. We're going to get
the same result. So now let's check
the result for that. As you can see,
the result is already sorted by country in ascending order, so that
part is fine. We have Germany
first, then UK and USA. It's already sorted, but that is not enough, because
the task says that after that you need to sort by score, the highest first. Take
as an example
these two customers,
Maria and Martin. Both of them come from Germany, but Maria comes first,
even though
she has the lower score. That means after we
sort by country, we need to sort
by score as well. In order to do that, we
put a comma here and then write down score. The option here is going to be descending, the highest first. So, DESC. That means we can use two columns in the ORDER BY
here, and for each column we can use a different direction
for how to sort it. So now let's run this. As you can see here,
it's still
sorted by country, but now Martin comes
first because he has a higher score than Maria. And this is exactly
how we sort the data
using multiple columns. One more note about ORDER BY: instead
of the column name, we can use
the position of the column. You can see over here that country has
position four. This is the first column, then the second, third,
fourth, and fifth. That means country
has position four. So instead of writing country, I could write 4. The score is the
last one, the fifth. This is an easy
way to sort the data using ORDER BY, and
if I run this query, I get exactly
the same results. But I really don't
recommend that. If you change the
structure of your data,
let's say country
moves to position two and score to
position three,
then you have to go and
edit your query. That means I need to
change those numbers
again. That's really bad, because
you might forget about it. If you write the name, it doesn't matter what changes
happen in the
schema or in the table: your query will deliver the same results. When
using the numbers, you need to adjust them. So I really don't recommend
using those numbers. It's better to write the
full name of the column. Alright, so now in order to
understand ORDER BY, I'm going to show you
step by step what the database is doing in order to
execute our statement. First, it's going
to choose the table. Our table is customers. We are using star, which means it's going to select
all the columns and put
them in the results. Since we are not
using any WHERE or filters,
it's going to
select all the rows. But it notices
that there is an ORDER BY, so it's going to sort the
results by each column listed there. The first column
is country, so it sorts
by country first. The first
customer comes here, from Germany, and Martin as well. After that comes the UK,
sorted in over here. And
after that
comes John from USA. So it sorts the results, and we have here
the country sorted. This is the first step. The next step goes to the second column in the
ORDER BY, the score. So it's going to sort
the results again. It checks
those two customers from Germany and sees, okay, Martin has the higher score, so
it switches them. Let me just do it like this. Now Martin is
first on the list. Second we have UK,
so that's okay. Then we have those two from USA. We have 900 and NULL. NULL is the smallest,
so that's okay. And this is how the database sorts
using ORDER BY. Alright, so that's
it for this chapter. We have learned how to
query our data using the SELECT statement
and how to sort the results using the
ORDER BY clause. In the next chapter, we're going to
learn how to filter our data using the WHERE clause, where we're going to learn
many important operators.
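
The ORDER BY variants from this chapter can be compared side by side in a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL; both place NULL first in ascending order, but that behavior differs in some other database systems. The column names, the last names, and George's score of 750 are placeholders; the other scores follow the narration. The helper first_names is hypothetical and only shortens the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, first_name TEXT, "
    "last_name TEXT, country TEXT, score INTEGER)"
)
# Placeholder rows reconstructed from the narration.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Maria", "L1", "Germany", 350),
        (2, "John", "L2", "USA", 900),
        (3, "George", "L3", "UK", 750),
        (4, "Martin", "L4", "Germany", 500),
        (5, "Peter", "L5", "USA", None),
    ],
)


def first_names(sql):
    """Run a query and return the first_name (second column) of each row."""
    return [row[1] for row in conn.execute(sql)]


# Smallest score first; NULL counts as the smallest value.
ascending = first_names("SELECT * FROM customers ORDER BY score ASC")
# Leaving out ASC gives the same order, because ascending is the default.
default_order = first_names("SELECT * FROM customers ORDER BY score")
# Highest score first.
descending = first_names("SELECT * FROM customers ORDER BY score DESC")
# Two sort columns, each with its own direction.
by_country_then_score = first_names(
    "SELECT * FROM customers ORDER BY country ASC, score DESC"
)
# Same sort by column position (4 = country, 5 = score); works, but is
# fragile if the column order of the table ever changes.
by_position = first_names("SELECT * FROM customers ORDER BY 4 ASC, 5 DESC")

print(ascending, descending, by_country_then_score, sep="\n")
```
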
16. SQL | WHERE: Alright guys and girls. So now we have learned
how to retrieve all our data from
the database using the very basic keywords SELECT and FROM. Next, we need
to learn how to filter our data
using the WHERE clause, because in real-world scenarios
you are not interested in
all records in the tables. Usually you will
be interested only in the rows that fulfill
a certain condition. For example, we don't need all the
customers in the results. We need only the
customers that come from a certain country or have
a specific score. In order to understand that, let's check a very simple task. The task says: list
only German customers. That means we are not
interested in all customers. We need to see in the results
only the customers that
come from Germany. Okay, so now let's
figure out how we're going to solve this task using a SQL query. In the task we are
focusing on the customers. That means we will be
querying the customers table. And since there is no
specification about the columns, we can retrieve
all the columns. Let's write the
SQL statement for that. SELECT, as usual. Then, with no specification
about the columns, we
select
everything, so we use star. FROM, and our table
is customers. Let's run this
and see. As usual, we have all the data, all the customers from Germany, from USA, UK, and so on. But the task says only
the German customers. That means we have
to filter. In order to do that, we're going to use the
WHERE clause, and we
put it immediately
after FROM. Alright, so now we
write down the keyword WHERE. After the WHERE, we need
to specify our condition. The condition should be
based on the country. That means country should
be equal to Germany. So we write down
the column name, country, then the equals operator. And here we need to
enter the value exactly as it's written
inside the database:
Germany, like this. We
write down Germany. So let's start the
execution and see the result. As you can see, we don't
have all the customers. We have only the two customers
that fulfill this condition, Maria and Martin. The other customers, like
John, George, and Peter, don't fulfill the condition, and they are excluded from the
results. So as you can see, SQL is pretty easy to write
and read. Take this one: select all columns from customers where the
customer's country is equal to Germany. It's really easy
to read, using English words in
a logical order. Okay, let's have now
another quick task. It says: select customers whose score is greater than 500.
So it's based on the same table, so we will not change a lot of stuff here. The only part that changes is the condition.
So now we're going to remove this here. Our condition here is based on the score. So we have the column score.
The operator is not equal anymore, it should be greater than. So we need another operator, and the value is 500.
So we write down here 500. Let's execute that. Now we can see the customers whose score is greater than 500.
As you can see, it's pretty easy to use the WHERE clause.
Alright, so now in order to understand the WHERE clause, I'm going to show you
step-by-step what the database is doing once we execute our query.
So the database is going to check which table, so it's going to focus on the customers.
Then it's going to check which columns we need. As we wrote down the star, that means the database is going
to select all the columns in the results. Then the database checks: okay, there is a filter,
that means not all the data should be in the results, so it's going to check it.
So now for the first record it's going to check the score over here. The score is 350, that means it is not
greater than 500. It will not include it in the result. The next one is greater than 500. That means it's going to
take it. The next customer, the same, it fulfills the condition. Oops, I need to write it down over here.
Alright, now, the fourth customer, 500: it is not greater than 500, because the condition is only greater than, not greater than or equal,
so that means it will not consider it. And the last one, it's NULL. That means it's empty. It will not fulfill
the condition. That means we have only two customers, and that's how WHERE is working
inside the database. Alright guys, so in SQL there
are many different types of operators that you could use inside the WHERE clause in order
to filter your data. In SQL, they are split into two groups. On the left side we have
the comparison operators, and on the right side we have the logical operators.
The comparison operators you could use in order to compare two values, e.g. we have equal, not equal, greater
than, less than, greater than or equal to, less than or equal to.
The logical operators you could use once you want to combine two different
conditions, and as a result you're going to get true or false.
E.g. we have the AND operator: it returns true if both of the conditions are true.
We have OR: it returns true if one of the conditions is true.
Then we have NOT, IN, BETWEEN, LIKE and so on.
So in the previous examples in the WHERE clause, I showed you two comparison operators: equal and greater than.
So next, I'm going to go through all of them in order to show you how you could use them inside the query
with some examples. So don't worry about it. Alright, so that's
it for the WHERE clause. Next we're going to talk about
the comparison operators.
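The two WHERE tasks from this lesson can be sketched as follows. This is only a sketch: the narration never spells out the column names, so `customer_id`, `first_name`, `country` and `score` are assumptions, and the sample rows are reconstructed from the walkthrough (the countries for George and Peter are guesses). If your course database already contains a `customers` table, skip the setup part.

```sql
-- Hypothetical sample data mirroring the five customers in the walkthrough.
-- Column names are assumptions; the countries of George and Peter are guesses.
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    first_name  VARCHAR(50),
    country     VARCHAR(50),
    score       INT
);

INSERT INTO customers (customer_id, first_name, country, score) VALUES
    (1, 'Maria',  'Germany', 350),
    (2, 'John',   'USA',     900),
    (3, 'George', 'UK',      750),
    (4, 'Martin', 'Germany', 500),
    (5, 'Peter',  'USA',     NULL);

-- Task 1: list only German customers.
SELECT *
FROM customers
WHERE country = 'Germany';

-- Task 2: customers whose score is greater than 500.
-- A score of exactly 500 and a NULL score do not pass this filter.
SELECT *
FROM customers
WHERE score > 500;
```

With this sample data, the first query keeps Maria and Martin, and the second keeps John and George, matching the results described in the lesson.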
17. SQL | Comparison Operators: =, >, <, >=, <=, !=: Alright, now we're
going to focus on the comparison operators and learn how to build up our
conditions inside the WHERE clause. The comparison operator, as I said, is used in order to
compare two values, and it is the most basic way to filter data using SQL.
Okay, so now in order to understand them, let's have the following tasks. First, find all customers
whose score is less than 500. So that means we're going to focus on the customers table, and there is no specification
about the columns. We're going to use SELECT star FROM customers. So now let's run this.
As you can see, we have all the customers, but we need to filter the data: score less than 500.
So we're going to use the WHERE clause. The column is score, then the less than operator. And then we're
going to type 500. So let's check the results and run it. So we have only one customer whose score is less than 500.
So now in order to understand why we had only one customer in the results, I'm going to show you
what the database has done once we executed our query. So we said SELECT star FROM customers.
The database is going to focus on the customers. We said star, that means we need all the columns
in our results. And then since we have a WHERE clause, it's going to filter the data.
So it's going to go through all the records and try to find whether each one fulfills the condition or not.
So I'm going to use the like and dislike buttons to say whether it is true or false.
So the first customer here: the score is less than 500. That means it's going to be shown in the result because it
fulfills the condition. Then we have the next one. The score is 900. It is not less than 500, so that means false.
The next one, the same: 750, it is not less than 500. The next one is interesting. It is exactly 500,
but since the condition says less than 500, it does not fulfill the condition.
Then the NULL does not fulfill the condition either, so we put false. So that's why we had only one
customer in the results. Okay, so now let's add
another task, and it says: find all customers
whose score is less than or equal to 500. So almost the same, but we have here as well the customers whose score is
equal to 500. So let's check that. We can keep the same query,
so we will not change anything over here, only the operator. So we need the less than, so it's going to stay like this,
but we need as well equal to. So there's another operator called less than or equal to, and it looks like this: <=.
So we have them both like this. And let's run it and see the result.
So as you can see, now we have the customer number four, Martin. He has score 500. And now he is
shown in the result. So we have the first one, Maria, less than 500, and
we have Martin, who has exactly 500. So this is the less than or equal to. So as you can see,
it's pretty simple. Let's go with another operator with the following task. Find all customers
whose score is higher than or equal to 500. So that means it's
almost the same, but we need to use another operator, greater than or equal to.
So it looks like this: >=. And let's check the result.
So as you can see here, now we have all those whose scores are 500 or higher. So we have John with 900.
We have George with 750, and Martin stays here because his score is equal to 500. So as you can see,
it's really easy. Alright, so now we have
one more last task. It says: find all non-German customers. So let's try to solve that.
We're going to stay with the table customers. So SELECT star FROM customers.
And we need to filter the data using not the score but the country. So we're going to type
now here country. And since it says non-German customers, that means the country should
not be equal to Germany. So the not equal operator, it looks like this. And then we need
the value Germany. So with this query you are saying: okay, give me all the customers whose country is not
equal to Germany. So let's run that. And as you can see here, we don't have a country called
Germany in the results. And you could have the same result using
the other not-equal operator as well (SQL has two spellings, != and <>). It means as well not equal.
So if I run that, we're going to get the same results. So you could use
either one of them. There is no difference between them.
Okay, so now let's see how the database solves that. We say SELECT star
FROM customers. That means the database is going to focus on the customers. Star means
all the columns as usual. We're going to put it over here. Then we have the WHERE, and it says
country not equal Germany. So the database is going to focus on this column for the condition.
So let's see the first customer: the country is equal to Germany. So that means it's false.
We will not see it in the result. The next one, the country
is not equal to Germany, so that is true. We're going to see
it in the results. The next one is the same. The country is not
equal to Germany. We will see it as well in the results.
And the fourth customer, the country is equal to Germany. So that means it is false. We will not see it
in the results. And the last one, the country is not
equal to Germany, so it is true, and we will see it in the results. So that's why we saw three
customers in the results. Alright, so now we've covered all the operators
inside the comparison group. They are pretty easy. They always
compare two values. And I would suggest that you go and play with them until you
understand how they work. But next, we're going to go and start working on the
logical operators. They are a little bit more difficult, but don't worry about it.
I'm going to explain them in detail with examples and everything. But they are very
important in SQL because you will end up using them a lot.
Alright, so that was it for the first group of operators. Next, we're going to talk about
the other group, the logical operators
AND, OR, NOT.
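As a quick recap, the comparison tasks from this lesson might be written like this. It is a sketch that assumes the course's `customers` table has columns named `score` and `country`; the narration does not confirm the exact names.

```sql
-- Assumed table and column names: customers(score, country).

-- Score strictly less than 500 (a score of exactly 500 is excluded).
SELECT * FROM customers WHERE score < 500;

-- Score less than or equal to 500 (a score of exactly 500 is included).
SELECT * FROM customers WHERE score <= 500;

-- Score greater than or equal to 500.
SELECT * FROM customers WHERE score >= 500;

-- Non-German customers; != and <> are two spellings of "not equal".
SELECT * FROM customers WHERE country != 'Germany';
SELECT * FROM customers WHERE country <> 'Germany';
```

In all of these queries a row whose `score` (or `country`) is NULL is left out, because a comparison with NULL never evaluates to true.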
18. SQL | Logical Operators: AND, OR, NOT: Alright guys, so
now we're going to talk about the second group of operators that you could use
inside the WHERE clause, and they are called the
logical operators. We will focus on those
three bad boys: AND, OR, NOT. In the previous examples, you learned how to filter your data using
only one condition. But in real-life scenarios, things get more complicated,
where you have to combine the results of two
or more conditions. And in order to do that, you could use the
operators AND, OR. Okay, so now let's start
with the first operator, the AND operator. It
says the following: it returns true only if
both conditions are true, otherwise it's going to be false. So let's say we
have condition A and condition B, and we want
to combine them using AND. So the first situation:
in condition A we have true, and in condition
B we have true. If you apply the AND,
we will get as well true because it
fulfills the requirement. So both conditions are true, we will get true.
Let's have the second scenario: condition A as well true, but in condition
B we have false. Here not both of them are true, and we will get
the result false. Now the other way around:
condition A has false and
condition B has true. Not both of them are true, that means the result's
going to be false. And the last scenario, where both of them are false: as a result, you're
going to get false. So that means the AND
operator is really strict. Both of the conditions
should be true in order to get true. Otherwise, it's going
to be always false. Okay, let's jump
to the next one. We have the OR operator. It says: it returns true if one
of the conditions is true. So that means the OR operator
is going to be happy if at least one of those conditions is
true, and it gives you true. Otherwise, it's going
to give you false. So let's take again the
same example we have here, condition A, condition B, but now we're going to
apply the OR. We have in the first scenario true
at A and true at B. It fulfills the requirement: both of them are true.
So that means with the OR we have true. The next one: we have
at A true and at B false. So now it says at least
one should be true. So that means with
the OR you're going to get as well true, because you have it
here at A, it is true. So the next scenario
is the opposite, where you have
false and true. It fulfills the requirement:
at least one of them is
true, so it gives you true. But in the last scenario
both are false. With this scenario
you will get false. So as you can see, the
OR operator is less strict than AND. And it's going to be
happy if you have somewhere a true, to give you a true, and you will
get more results. Okay, let's move to the
last one, the NOT operator. It says: it's going to reverse the result of any
Boolean operator. So that means it's always going to give you the opposite.
E.g. if you say go left,
it's going to go right. If you say go right,
it's going to go left. So here you always have
the opposite in the results. It works
with only one condition. So it's not combining
two conditions like AND and OR. So here we have the condition A. If you have here true
and you use the NOT, that means you
will get false. So it's going to
do the opposite. And the same: if you have false and you
use the NOT operator on it, you will get true. So it's always
reversing the results. If you have true, you're
going to get false. If you have false, you're
going to get true. Okay? So enough with the theory, let's have some tasks in
order to learn that in SQL. So we have the following task. Find all customers
who come from Germany and whose score
is less than 400. So we have here two conditions. Let's try to solve that.
So as usual, we're going to use SELECT. No specification
about the columns, so star. FROM, our table is customers. And now in
the WHERE clause we have two conditions. The country is Germany, so we can write country
equal to the value Germany. Now we have another condition.
It says the score should be less than 400: score, the less than operator, 400.
So now I have two conditions and I need to combine them.
In the task it says AND. That means both of the conditions
should be fulfilled. I need to write now the operator AND between
both of those conditions. So let's run this and see. With these conditions we
have only one customer that fulfills both
of the conditions. So we have Maria: she
comes from Germany and her score is less than
400. Okay, guys and girls. So now let's see
what the database does once we execute the AND
operator. We have as usual SELECT star FROM customers.
The database focuses on the customers table. Star
means we need all the columns. So we're going to see all
the columns in the results. Now the database is
going to go through each row and try to find out whether it fulfills the requirements to
put it in the results. So let's start with
the first one. The first customer, Maria, she comes from Germany. So this is the first true,
for the first condition. The second condition:
we have score 350, it's less than 400. So that means we
have another true. And since we are using AND and both of them are true, we will get the result as true.
So that means the database is going to go and put her in the results. So the next one, we have John.
The country is USA. So this is the first false over here, on
the first condition. The second condition as well: the score is higher than 400, so it's going to put it as
well false. So false and false: the AND operator is going to
put it as false. The next one, we have the
same situation as well. The country is not Germany and the score is not less than 400, so both of them are false.
The AND operator is going to put it as false. And the fourth one,
we have Martin. The country is Germany, so we have the first
true. But for the second condition, the score, sorry, is not less
than 400. So we have here false. With
the AND, it will not work. So that means it's
going to put false as a result, because not both
of them are true. And the last one: both of the
conditions are false. The country is not Germany
and we don't have a score. So that means we
have as well false. So only one customer fulfills both of the
conditions with true. And once you use AND, you
will get only one record. Okay, so now let's
jump to the next one, and we have the OR operator. The task says: find all customers
that come from Germany or whose score is less than 400. So we have almost
the same setup, but here we have the
logical operator OR. So we have the same conditions: country equals Germany,
score less than 400. But now we're going to connect
them with the OR operator. So now let's check the results. I'm going to execute that. And as you might
have already noticed, we have now two customers as
a result for this setup. So let's check what happened. So now at the start, as usual, we tell the database SELECT
star FROM customers. It is focusing on the customers, all the columns
because of the star. And now we have here
the same conditions: country equal to Germany,
score less than 400. But the only difference is
that we are using the logical operator OR, so the
results can be different. So the database is
going to go through each row and see whether it fulfills the requirements
or not. With the OR it is enough to
have only one true to get true as a result. So as you can see here,
in the first customer both of them are true. That means we will have true as a result. We will see Maria in the results.
After that, those two customers don't have any true
in any condition. That means it's going to
be false in the results. But the customer four, Martin, has one true. So that means this is enough.
We will get that as a result. So Martin is going to be in the results. The last customer, the same: he has both false.
We don't have any true. That means the OR
operator is going to put false. So that's why we got two
customers as a result. Alright, so now to the last one. We have the NOT operator and we
have the following task. Find all customers whose
score is not less than 400. So that means we have only one condition
and we have the NOT. So let's try to solve that. So here we have
only one condition. It is about the score. It didn't say anything
about the country, so I can remove this part of it. So we have score
is less than 400, but it says it should
not be less than 400. So all we need to do is just add the NOT operator.
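The NOT task described here might be written as below. This is a sketch that assumes a `score` column on the course's `customers` table; the parentheses are added only for readability.

```sql
-- Assumed table and column names: customers(score).
-- Keep only the rows where "score < 400" is false.
SELECT *
FROM customers
WHERE NOT (score < 400);
```

A row whose score is NULL is not returned: the inner comparison is unknown, and NOT applied to unknown is still unknown rather than true.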
It's very simple. So let's run this. As you can see over here,
those are all the customers that don't have
a score less than 400. Okay, so now let's see
what the database has done once we executed
the NOT operator. So as usual, we will get all the columns
because of the star. And then we have the condition
score less than 400, but with the operator
NOT. Without the NOT, we would have only one customer that fulfills
this requirement. So we have only one true, and the NOT operator is going to reverse everything. That means if you have true, it's going to show it as false.
And if you have false, it's
going to show it as true. So it's just going
to do the opposite. So here we have true, and the
result is going to be false. The next three are all false, so we will get true. But you need to be
careful with something. So here it is NULL. So the database doesn't
know whether it's less or greater or
something like that. So it will treat it as
unknown and it will not show it in the results
because it is empty, or NULL. So that's why we have in
the results those three trues. That means we will have
only three customers. Alright, so that was it for the three operators
AND, OR, NOT. And next we're going
to learn about the logical operator BETWEEN.
19. SQL | BETWEEN: Alright guys and girls. So now we're going to talk about one more logical
operator that you could use inside the WHERE clause
in order to filter your data. And that is BETWEEN. BETWEEN is a logical
operator that allows you to select only the rows that
fall within a specific range. In order to work
with BETWEEN in SQL, you need to define
two boundaries, two values that
specify the range. So here we need to
define in the BETWEEN the min value and
the max value. It could be anything, like
text, number or date. Here in SQL, any values
between those two boundaries
are going to be considered as true. And the values or the
rows that are outside those boundaries are going to
be considered as false. And one more very
important piece of information: those boundaries, the min value and the max value, are included
in the condition. I see in
projects a lot of people that forget
about it or ask again: are those boundaries included in
the condition or not? So it really confuses a lot of people. Don't forget, those values are
included in the condition. So now in order to
understand that, we're going to have a
task and we're going to try to solve it with SQL. Alright, so now we have
the following task. Find all customers whose score falls within the
range of 100 to 500. So let's try to
solve that with SQL. So as usual, SELECT star, there is no specification
about the columns. Our table is customers. Now we need to filter the data. So we're going to
use WHERE, and here the column that we need
to use is score, because it says the score should be 100 to 500. So we're going to
write down score. And now the syntax for BETWEEN: you need to write
the keyword BETWEEN. And here now we need to
specify the minimum value. So the min value, the first boundary, is 100. And then we're going
to use the operator AND, and then the max
value. And that's it. So for the BETWEEN, you need to write
down the column name, BETWEEN, min value, AND, max value. So that's it. Let's now try to execute the query
and see the results. As you can see, those two
customers have scores that are within 100 to 500. Okay, so now let's see what
the database does once we executed the query with
the BETWEEN operator. So now as usual, SELECT
star FROM customers. That means in the
results we need all the columns,
and we have WHERE. So that means the
database should filter the results, and we have
the condition 100 to 500. So let's go through
all the customers. So the first one, we
have the score 350. It is within this
range, 100 to 500. So we have the first true and we will see it
in the results. So the next one is 900. So it is outside
of the max boundary. That makes it false. The same goes for George. We have 750, it is as well
above 500, so it's outside of
the boundaries, not between those two values. We have false. And now it is interesting: we have 500.
It is not inside the range, it is exactly the boundary. And with the BETWEEN, it's going to
consider it as true. So we have it as true. And the last one, we have NULL, so it is unknown, so it will not return it here.
That's why in the results we saw two customers,
Maria and Martin, because they fall
within the range 100 to 500. And Martin is exactly
at the max boundary. That's why it is
considered as well true. Okay guys, so there
is another way to solve such tasks
without using BETWEEN. Instead of that, we can use two conditions and connect
them with the AND operator. So I'm going to show you that. SELECT star FROM
customers like usual. And now we're going to
write the WHERE conditions. First, the score should be
greater than or equal to 100. So we're going to use the operator
greater than or equal to, and 100. And then you're going to write AND and the second part,
the second boundary. The score should be smaller than
or equal to 500. So we're going to
use this operator, less than or equal
to, and 500. So with that, we redefine
the BETWEEN operator. And if I run this, I'm going to remove this
part over here and execute it, we will get exactly
the same results, because we just redefined
it in another way. Some developers like me tend not to use BETWEEN, and
instead of that we use such conditions, because
for me it's easier to read what the query is doing
instead of using BETWEEN, because I need to remember
when I use BETWEEN that, e.g., the boundaries are included. And if you forget that, you need to look it up. So it's really easier just to read exactly what
the query is doing. So I tend to avoid BETWEEN
and use the two conditions with AND. And one more
advantage of that:
you can control it better. So e.g. I could use for the boundary with the max value only
less than, without the equal. So you can define it more
flexibly than with BETWEEN. Alright, so that was
it for the operator BETWEEN. Next we're going to learn
about the IN operator.
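The range task and its rewrite with two conditions might look like this. It is a sketch that assumes a `score` column on the course's `customers` table.

```sql
-- Assumed table and column names: customers(score).

-- BETWEEN includes both boundaries: 100 and 500 themselves match.
SELECT *
FROM customers
WHERE score BETWEEN 100 AND 500;

-- The same filter written as two explicit conditions.
SELECT *
FROM customers
WHERE score >= 100
  AND score <= 500;

-- The explicit form allows excluding a boundary, e.g. the upper one.
SELECT *
FROM customers
WHERE score >= 100
  AND score < 500;
```

The first two queries return the same rows; the third one would drop a customer whose score is exactly 500.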
20. SQL | IN: Alright guys and girls. So now we're going to talk about one more logical
operator that you could use inside the WHERE clause
in order to filter your data. And that is the IN
operator. The IN operator
allows you to define a
list of values that you would like to see in the results, or to be included
in the results. So how does it work? As I said, you define
a checklist, a list of values where
you are telling SQL that only those values are
allowed in the results. So here you can define
multiple values. It's not like the BETWEEN, where you define the boundaries. Here it is a list of values. So the database
starts asking for each value: is the value
inside this list? If the answer is yes, then it's going to be true. If the answer is no, it's going to be simply false. Alright, so now as usual, in order to understand that, we're going to have one
task and try to solve it in SQL. The task says: find customers whose
customer ID is equal to one of the values 1, 2 or 5. So let's try to solve that. As usual, there is no
specification about the columns, so we're going to SELECT
star FROM customers. And now we need to
filter the data. So we're going to use
the WHERE clause, and here we start. So it says the customer ID. So that means this is the column that we're
going to use in order to filter the
data, the customer ID. And now we have a
set of values: 1, 2, 5. So in order to use that, we're going to use
the IN operator. And we start defining now
the list, a checklist. So open brackets. The first value is one, then comma two, comma
five, then close brackets. So we defined the list of values that we want
to see in the results. And with that, we're going to run the query and see
what's going to happen. As you can see, the
query has run and we have the list of customers that
exactly match our list, the customer IDs 1, 2, 5. Okay, so now let's see
what the database has done once we executed
the IN operator. So as usual, SELECT star FROM customers means I want to see all the columns in the results, and the
database selects that. And since we have a WHERE clause, it's going to start
checking the condition. The condition says the customer ID should
be in this list. So the database is going to
check each customer. So here we have customer ID one, and it is in the list. That's why we're going
to get a true over here for this condition,
and we're going to see it in the results. The next one is two. So here as well we have true for this one, and we're
going to get it in the result. The third customer has customer ID equal to three, and it
is not in the list. That's why we're going
to get false over here. The same for four: four is not in the list. It will ignore it.
And the last one, customer ID equal to five,
and it is in the list. So we will get a true for that. And this is how the database
processes our query. Alright, so you
might tell me now: wait a minute, Baraa, I just learned about
the OR operator and how I combine different
conditions using the OR. And I could solve
this task using that instead of using IN
and a checklist. So let's see how
we could do that. I agree, it's going
to work as well. So SELECT star FROM customers, WHERE customer ID equal to one. So that's the first one, then we write OR customer ID equal
to two, and go on: OR customer ID equal to five. So if I run this query, we will get exactly
the same results. So I agree on that, but as you can see here, the IN is more compact and
much easier to read: you make a list
and that's it. With the other way, you have to define
all those values with multiple conditions and
connect them with the OR. So imagine you have ten values, you will have here
ten rows of code. So I really like it
with the IN operator. It is more compact
and easier to read. Alright, so that's all
about the IN operator. Next, we're going to learn
a very important operator. It is the LIKE.
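The IN task and its OR equivalent can be sketched as follows. The column name `customer_id` is an assumption; the narration only says "customer ID".

```sql
-- Assumed table and column names: customers(customer_id).

-- IN: keep the rows whose customer_id appears in the list.
SELECT *
FROM customers
WHERE customer_id IN (1, 2, 5);

-- The same filter with OR: one condition per value.
SELECT *
FROM customers
WHERE customer_id = 1
   OR customer_id = 2
   OR customer_id = 5;
```

Both queries return the same rows; the IN form simply stays one line long however many values the list contains.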
21. SQL | LIKE: Alright guys and girls. So now we have the
final logical operator that you could use inside the WHERE clause in
order to filter your data. And that is the LIKE operator. It is a little bit more
complicated than the others. Don't worry about it. I'm going to explain that
step-by-step with examples. So once you understand it, it's going to be
easy and fun to use. So in the other examples
with the WHERE clause, we always defined
the whole value, the complete value,
in the WHERE clause. But sometimes you might be in situations where you
don't know the values. You are searching
for some values and you have a pattern
in your head, e.g. you are searching for customers whose name starts with M. So here you don't
know the whole value. You are searching for something
and you have a pattern. You could use the LIKE
operator with a pattern in order to find
those customers. Or there are a lot of values
in the database, where it's going to be
almost impossible to define all of those values
in the WHERE clause. So instead of that, you're going to define
a pattern and you tell SQL: I am searching for
something like this. So now the LIKE works like this: it returns true if the
value matches the pattern. Otherwise it's going
to return false. So that means we need to build
up a pattern in SQL. And in SQL we have two
tools in order to do that. We have the percent, where we
say it matches anything, and we have the underscore, which matches exactly
one character. So now let's have some examples in order to understand that. We have the first example: find
names that begin with M. That means you know that
the names begin with M and you don't care about
the other characters. So now we need to build
up such a pattern. We can write down the M and the percent (M%). You are
saying here to SQL: it begins with M, and the
rest doesn't matter. It could be empty. It could be one character or multiple characters, it doesn't
matter, but for you it's very important
that they start with M. Now we have another one. It says: find names
that end with n. So that means it
could start with anything. So we're going to start
with the percent, and it should end with the n (%n). Here you need to be
careful: depending on the database and its collation, the match can be case-sensitive, so there can be a difference
between small n and big N. So this pattern tells
SQL: it starts with anything, but I need it to
end with n. Now we have the
example where you say: okay, it does not have to be
the first or the last. The name should contain
somewhere the r character. So find names containing r. So you are not
defining whether it is at the start or at the end. So with that, you could
use the following pattern (%r%). It could start
with anything, then r, and end with anything. Here you don't know
where exactly it is. The names should contain
somewhere an r character. Now, in the next one you could be more specific, where
you can say: okay, find me the names
containing r, but exactly at the
third position. So it's a little bit
more complicated. And with that you're going
to use the underscore. With the underscore you say: okay, the first position
could be anything. The second position
could be anything. But the third should
be exactly the r. And afterwards it's going to be anything, like empty or more
characters and so on (__r%). So with that, you are mixing those two tools,
underscore and percent. So now we're going to
go more into detail with examples in order
to understand how it works. Okay, so now we're going to
go and deep dive into each of those examples
and explain for you what is going on in the database once you define
those patterns. So the first example, we have: find names that begin with M. Our pattern is M and percent, that means anything after that,
we don't care about that. It should start with M. And in our database we
have those five values, those five names, and
let's go one by one. So Maria: it starts with M. So that means it is
matching our pattern. So SQL is going to return
for that a true. The next one, we have John. So the J over here is not
matching our pattern. That means SQL is going to put
false on it. Then George, the same thing: it starts with G and is not matching our pattern. It should start with M
to get a true. We have false for that. Martin here starts
with M. That means it is matching our pattern and we're going to get
for that a true. And the last one, Peter: we have P and it is not matching our pattern, and
we're going to get false. So if you define this
pattern in SQL, you will get those trues and
falses from the database. Okay, so in the next
example we have: find names that end with n, small n. Our pattern is anything, the percent, and then small n. Let's go through the names. The first one, Maria, and the database is going
to check the last character. Okay? The last one
is a, not matching
our n, so it is going to reject it. You're going to get false. So we have John: John has the last character
n and it is matching
our pattern. The database is
going to put true on it. So the next one
we have George. George ends with e. It is not matching
the pattern: false. Martin, n, we have true here. So the last character
matches our pattern. And Peter, we have
the r over here. It is not matching the pattern. So if you run such a
pattern on your database, you will get only John
and Martin as a result. So let's go to the next one. The next one says: find names
and containing R and we didn't specify anything or
that somewhere should be R. So the button it says
present, are present. That means somewhere
there is an R. So with the Martin,
somewhere there is r. So here, over here we have the R and it's going
to return true. With John, there
is nowhere and are like There is no character
over here with the R. That means the database
is going to return false. George, we have
over here an hour, so it's going to return true. Martin, the same and
better as well, the same. So as you can see, if you like, start with the present and
end with the percents. The database can find somewhere your character and it's going to return it as true
as you see here, peter ends with R, Martin in the middle
somewhere there is r. So here you don't care
about the position. Where is your character? Okay, so now we come
to the final one. It says find names containing the R and
the third position. Here we are very specific. We are saying exactly the
third should be the R. So in order to do that, we will not use the
percent in our button. We're going to use
the underscore. It says the first character
could be anything. The second character could
be as well anything. But the third
character should be exactly the r. And after that, it could be anything,
it could be empty like bunch of characters. We don't care about that.
So let's go through our values and see how the database is going to react. So Maria: it starts with M, that's okay. The second character, a, is okay as well. The third should be r, and we have here a match. Afterwards, it doesn't matter. So this is matching our pattern, and Maria is going to get a true from the database.
The next one, John: the first two characters are okay, but the third one is not matching the pattern. It is the h. That's why we're going to get a false for that. The third one, George: you can see the third position is o, as well not matching our pattern.
Martin is matching, because the first character is M, which could be anything. The second one as well, a, and the third is r. So this is matching our pattern. The rest could be anything. So that's why Martin is matching exactly our pattern. The last one, Peter, doesn't match our pattern because at the third position we have the t.
With that, if you run such a pattern on your database and you are specific like that, you will get only Maria and Martin as a result. Okay, so now as a next step, we're going to learn how to
write SQL statements using the LIKE operator in order to understand the syntax, and to solve those four tasks. We're going to start with the first one: find all customers whose first name starts with M.
So as usual, we're going to SELECT star, no specification of the columns, and our table is customers. And now we have to filter the data with our patterns. So, the WHERE clause. The column that we're going to use in our pattern is the first name. Then we're going to write down the LIKE keyword. After that, we're going to specify the pattern. It starts with the single quote, then big M, percent, and then we close it with the single quote. So with that, we specified the pattern for the LIKE operator, and let's run it. As you can see in the results, we got those two customers that have a big M at the start of the first name. So this is how we do it using the LIKE operator.
So the next one says: find all customers whose first names end with a small n. We're going to have the same stuff over here, but we need to redefine the pattern. So single quote (I said "high comma", that was German), then anything, so the percent, and then small n, then close it. So let's run that. And as you can see, we got those two customers, John and Martin, because their first names end with n.
Alright, so now to the third task. It says: find customers whose first names contain somewhere an r, small r. So let's do that. We're going to have the same setup over here, but we need to change the pattern. So single quote, then percent, small r, percent, then single quote. With that, as I said, you are not specifying any position. Somewhere there should be an r. So let's run that and check our query. You can see here Maria has an r somewhere. George has an r somewhere, and so do Martin and Peter. So we got those four customers. But John, we didn't get him, because he doesn't have an r in his first name.
Okay, so now to the last one. The task says: find all customers whose first names contain the character r in the third position. So here, the same stuff over here. We need to change only the pattern. So single quote, then the first character could be anything, so underscore. Again underscore, the second character could be anything. And here we define the r. And then we say anything after that, so percent. Then single quote, and that's it. So we wrote down the pattern here after the LIKE, and let's run that. And as you can see, we got only Maria and Martin, as we discussed, who contain the r as the third character.
So with that you have those four examples with the LIKE operator. It's really fun once you start practicing with that. So try now, I would say, to make up some patterns in your head, write them down, and see how SQL is going to react to that. Only with practice are you going to get good results and really understand it.
Alright, so that's all for this chapter. We have learned how to filter our data using the WHERE clause and many important operators. In the next chapter, we're going to step up the level: we're going to learn how to combine our SQL tables using joins and union.
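The four LIKE tasks from this chapter can be sketched as a small runnable script. This is only a sketch under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the column names customer_id and first_name are guesses, since the narration only says "customer ID" and "first name". The five names are the ones from the walkthrough.

```python
import sqlite3

# In-memory SQLite stands in for the course's MySQL database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, first_name TEXT);
INSERT INTO customers VALUES
    (1, 'Maria'), (2, 'John'), (3, 'George'), (4, 'Martin'), (5, 'Peter');
""")

# One query per task; % matches any run of characters (including none),
# _ matches exactly one character.
queries = {
    "starts_with_m": "SELECT * FROM customers WHERE first_name LIKE 'M%' ORDER BY customer_id",
    "ends_with_n": "SELECT * FROM customers WHERE first_name LIKE '%n' ORDER BY customer_id",
    "contains_r": "SELECT * FROM customers WHERE first_name LIKE '%r%' ORDER BY customer_id",
    "r_in_third": "SELECT * FROM customers WHERE first_name LIKE '__r%' ORDER BY customer_id",
}

# Run each query and keep only the first names for easy reading.
results = {
    label: [first_name for (_customer_id, first_name) in conn.execute(sql)]
    for label, sql in queries.items()
}

for label, names in results.items():
    print(label, names)
```

The SQL strings are the part that carries over to MySQL; the Python around them is only there to make the example executable.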
22. SQL | JOINS Concept: Alright, guys and girls, so far we have learned how to query only one table. In all our examples, we focused on the table customers. We've done SELECT, WHERE, we filtered the data, and so on. That was only one table. In real-life scenarios, you will be working with a real database that contains a lot of different tables. And once you start writing SQL statements, you will end up querying not only one table, but maybe multiple tables, in order to get something meaningful out of the data. So that means you need to start learning how to combine different tables, how to join those tables together in one SQL statement. This is very important in order to learn SQL, because once you master this, you will be good at SQL.
Now, in our tutorial database, we will be working with two tables, the customers and the orders. In the orders, as you can see, we have which customer placed which order. So now, in order to join those two tables, you have to specify two things. First, you need to determine what the join key is. A join key is a column that exists in both of the tables, e.g. the customer ID. We can see it here in the customers and as well in the orders. So that means the customer ID is a good candidate to join those tables, and it's going to be our join key. The second thing that you need to specify is the type of the join. In SQL, we have four different types of joins. We have the inner join, left join, right join, and full join. It might look complicated at the start, but don't worry about it. I'm going to explain all of those types step-by-step with examples. I'm going to show you as well how SQL works with those types.
Alright, so now let's start with the first type of joins: we have the inner join. The inner join is the most commonly used type of join among developers. I as well tend to use a lot of inner joins in my SQL statements. So it is very widespread to use inner joins in SQL. There is a very important aspect that you need to understand once you work with SQL joins, and that is: in SQL, there is always a left table and a right table. And that really depends on how you are writing the script. We will see that in the examples. In our SQL joins, the left table is the customers and the right table is the orders. With the inner join, it doesn't matter which one is left and which is right, because once you are using inner join, only the matching rows will be presented in the results. So if you use inner join, you will exclude all those rows that are not matching, and you will see as a result only the matching rows between those two tables.
Now to the second type of joins: we have the left join. As the name says, it is a left join. That means we are depending on the left table more than on the right table. So once you specify the left join in your SQL script, you are telling the database, or SQL, that I want everything, all the rows, from the left table, and from the right table only the matching rows. So once you say, okay, left join, that means you will get all the records from the left and only the matching rows from the right side.
So let's go to the next one. We have the right join. It is exactly the opposite. So when you say here in your SQL script right join, you are depending completely on the right table. That means once you write that script, SQL will present all the records from the right table in the results, and from the left table only the matching records, only the matching rows. So it's really the exact opposite of the left join.
Then we have the last type of joins: we have the full join. Once you say in your script, I want to have a full join, that means you want everything from both of the tables. That means from the left table you're going to get all the rows, and from the right table you will get as well all the rows. So with the full join, as the name says, it is everything.
Alright, so with that we have an overview of the joins. And now, before we start talking about the first type, the inner join, we will learn quickly about the SQL aliases. It's like a hidden tutorial, not on the roadmap, but we have to learn that before we start writing SQL joins.
23. SQL | AS Statement - Aliases: Okay, so now before we start with some examples in order to understand and learn how to join tables using SQL, we have to learn a very important thing in SQL, and that is SQL aliases. You need to learn that once you start querying multiple tables in one SQL statement. Let's take this: if I want only to select the customer ID from customers, this should not be a problem. So if I execute this, I will get all the customer IDs. But once I specify multiple tables in one query, you need to tell the database which customer ID, from which table, because as you see in our example, we have the customer ID in two tables, in customers and in orders. And if you leave it like this, you will get an error where the database is going to tell you: I don't really understand which column you mean. Do you mean the column from customers or from orders? That's why we need to specify one more thing next to the column name, and that is the table name. So we write customers, dot, customer ID. And with that, you are telling the database, I want the customer ID from the customers. So if I execute this, I will get the same result. There is no problem here, but you need to specify that once you are working with multiple tables.
But the annoying thing here: if you always write the full table name over here, it's going to be really annoying to write. That's why we're going to work with aliases. So we're going to give the tables like a nickname, and in SQL we call it an alias. Okay, so now in order to do that in SQL, we're going to go just beside the table name, and we're going to write down the keyword AS, then give the alias name, or nickname. I'm going to use C instead of customers. And now the database understands, okay, in my script I am using C instead of customers, so I can go everywhere, and instead of using customers, I can say C. So if I run it, I will get exactly the same thing. There is no error. But now, as you can see, it is much easier to handle my script. I'm going to just write C dot customer ID instead of customers dot customer ID. So it's really an easier way to handle stuff, and I always tend to do that. So I really recommend using aliases in order to have small scripts.
You could do the same as well for the columns. So e.g. we have here the customer ID. I could go and rename that. And to do that, it's the same stuff. I go right beside it, I write AS. So instead of having customer ID, I'm going to write CID. So let's run this. And as you see, SQL understood that, and it is printing out the result as well with CID. It says, hey, I understand, I'm renaming this column in my result as CID. There is a very important aspect here to understand: it's going to rename that only in my script and in the results. The database will not go to the tables and rename the tables or rename the columns. There is a different query to do that. So this command, the AS, is only temporary, in my script and in the results, so nothing is changing in the data model or in the database. The table is going to stay customers, and the column is going to stay customer ID. This is only a tool to help you once you are writing SQL statements, and as well to help you rename stuff very fast in the result.
Alright, so now we have everything to start with the first type of joins, the inner join.
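The alias behaviour described in this lecture can be checked with a short script. It is a sketch under assumptions: Python's built-in sqlite3 stands in for MySQL, the column names (customer_id, first_name, order_id, quantity) are guesses, and the single order row with quantity 250 is made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the course's MySQL database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, first_name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, quantity INTEGER);
INSERT INTO customers VALUES (1, 'Maria'), (2, 'John');
INSERT INTO orders VALUES (1001, 1, 250);  -- quantity is made up
""")

# Two tables and an unqualified column: the database cannot tell
# which customer_id is meant and raises an error.
try:
    conn.execute("SELECT customer_id FROM customers, orders")
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

# Table alias (customers AS c) and column alias (customer_id AS cid).
cursor = conn.execute(
    "SELECT c.customer_id AS cid FROM customers AS c ORDER BY c.customer_id"
)
result_columns = [col[0] for col in cursor.description]  # names shown in the result
cids = [row[0] for row in cursor]

# The alias lives only in the query and its result; the table itself is unchanged.
table_columns = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]

print(ambiguous_error)
print(result_columns, cids)
print(table_columns)
```

PRAGMA table_info is SQLite-specific; in MySQL the equivalent check would be something like DESCRIBE customers.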
24. SQL | INNER JOIN: Okay, so now let's start with a task in order to understand how to write SQL statements to join two tables. We're going to start with the first task. It says: find all customer IDs, first names, order IDs, and order quantities, excluding those customers who didn't place any orders. So in this example, as you see, it is not only the customers. We need some columns from the customers table and some columns from the orders table, and we have to join them in order to do that.
Let's start doing that step-by-step using SQL. So first we're going to start with the SELECT, since the task is specifying the columns. We will not use the SELECT star. We need the customer ID, then the first name, and the order ID, and the quantity. So now we need to specify the tables. We're going to start FROM the customers. With the inner join here, it doesn't matter whether you are starting from left or from right. So I'm going to start from the customers. Now, in order to specify the second table, we're going to use the JOIN statement. So we're going to say INNER JOIN. And with that, I'm saying, okay, we're going to join now the customers with another table. So we're going to inner join the orders. With that you are connecting two tables, the customers and the orders.
As I said, you need to specify two things: the join type and the join key. We have already specified here the inner join, because we don't need those customers that didn't place any orders. So we're going to use the inner join over here. And the second thing that you need to specify here is: what is the join key? How are you going to connect those tables? You need to specify that for SQL. So we're going to go now to a new line and say ON, so we are joining on those columns. In order to specify the columns, I'm going to give now some aliases. So instead of customers, I'm going to say, okay, I'm going to call you C. And instead of orders, I'm going to call you O.
So now in order to join those tables, we need to find out what our join key is. Which column here exists in both of the tables? We can see the customer ID. We can find it in the customers and in the orders, and it is the perfect column to join those tables. So we're going to connect both of them with the ON. So I'm going to say, okay, let's take the customer ID from customers. It should be equal to the customer ID in the orders. So C dot customer ID equals O dot customer ID. With that, I specified the rule, or the key, for how the tables are going to be joined. I said the customer ID from the left table should be exactly the customer ID from the right table, from customers and orders. So with that, I specified the rule, I specified over here as well the join type, and with that, we connected two tables.
Alright, so now before I go and run this query, we still have one problem, and that is the customer ID in the SELECT. I didn't specify from which table. And if I run it like this, we will get an error. You could try it. So now we need to specify which customer ID I want. Is it from the customers or from the orders? In order to do that, we're going to use C dot, so the table name or the alias, in order to specify, okay, I want the customer ID from the customers. For the rest, you don't need to do that, because e.g. the first name is a unique column name, it exists only in the customers. But still, I really recommend it. Once you are joining some tables, it is a very nice way to document your stuff, to say, okay, the first name, it is from the customers. Because with time you could forget that, or if you don't know the data model, it will be hard to understand whether this first name is in the customers or in the orders. So it's a really nice way to document that. If you just put the table name or the alias at the start, you can see very quickly that those two columns come from the orders and those two columns come from the customers. And one more thing to make it look nicer: I'm just going to use tab.
So now we are ready, I think. Let's try to run that query. So as you can see now in the results, we got the columns from both of the tables. We have the customer ID and the first name from customers, and the order ID and the quantity from the orders. Okay, so now let's understand
what the database was doing once we executed the inner join. First, it's going to check, okay, which tables do we need? In the script we have FROM customers, so it's going to read the table customers, and then we have the JOIN with the table orders. So that means the database is going to focus on both of the tables. Then it's going to define clearly which table is left and which table is right. Since we have first the customers in the FROM, it's going to consider the customers table as the left table. And then, since we have the orders in the JOIN as the next one, it's going to consider it as the right table. This is very important for the joins, but since we are using the inner join, it doesn't really matter for us whether we use first customers or orders. The database is going to follow the script.
Okay, so now as a next step, the database is going to check which columns we need. In our SQL statement, we said we need only the customer ID and first name from customers. From orders, we need the order ID and the quantity.
Alright, so now as a next step, the database is going to check which rows should be presented in the results. And here is the most important thing: we are using now the inner join. That means the database should present only the records that are matching. In order to do the match, it needs the key column for the join. We specified that and said, okay, you need to check the customer ID between those two tables. So let's go through that. The first one, customer ID one: we have it in the customers, and as well we have it as a record in the orders. So that means there is a match between those two tables, and this customer will be presented. So here we will get the customer ID one, first name Maria, and her order was 1001, and we have this quantity. So here we have the whole record of Maria from both of the tables.
We go now to the next one. We have John. John is present as well, as customer ID two, in the table orders. So there is a match, and he will be presented as well in the results. His order is 1002, and he has this quantity. So it's going to proceed to the third customer. The third customer exists in both of the tables, in customers and in orders, and he will be listed as well in the results, with his order ID and this quantity, 500.
But now we come to the customer ID four. The customer ID four exists only in the customers, and we don't find it in the orders. That's why there is no match, and the database is going to ignore this customer and proceed. Over here it's going to check, okay, we have the customer ID five. It exists as well only in the customers and not in the orders. There is no match. We have one more thing: we have customer ID number six over here. We have it only in the orders, but we don't have it in the customers. So there is no match. With the inner join, only if the key exists in both of the tables is the row going to be presented in the result.
Alright, so that's all for the inner join. Next, we're going to talk about the left join.
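The inner join built up in this lecture can be reproduced with a small script. It is a sketch under assumptions: Python's built-in sqlite3 stands in for MySQL, and the column names are guesses. The five customers, the order IDs 1001 and 1002, the quantity 500 and the order for customer ID six come from the narration; the order IDs 1003 and 1004 and the other quantities are made up.

```python
import sqlite3

# In-memory SQLite stands in for the course's MySQL database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, first_name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, quantity INTEGER);
INSERT INTO customers VALUES
    (1, 'Maria'), (2, 'John'), (3, 'George'), (4, 'Martin'), (5, 'Peter');
-- Order IDs 1003/1004 and all quantities except 500 are made up.
INSERT INTO orders VALUES
    (1001, 1, 250), (1002, 2, 150), (1003, 3, 500), (1004, 6, 100);
""")

# Join type: INNER JOIN. Join key: customer_id, named after ON.
inner_join_sql = """
SELECT
    c.customer_id,
    c.first_name,
    o.order_id,
    o.quantity
FROM customers AS c
INNER JOIN orders AS o
    ON c.customer_id = o.customer_id
ORDER BY c.customer_id
"""
inner_rows = conn.execute(inner_join_sql).fetchall()

for row in inner_rows:
    print(row)
```

Customers four and five (no orders) and the order of customer six (no such customer) do not appear, because only rows whose key exists in both tables survive an inner join.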
25. SQL | LEFT JOIN: Okay, so now let's go to the next task, and we have the following: find all customer IDs, first names, order IDs, and quantities, but include those customers who didn't place any orders. For us, that means we need to see as a result all the customers, not only those customers that did place an order, but all the customers. In order to do that, we're going to use the left join. So we're going to have exactly the same query. Nothing has changed: the same columns, the same tables. But instead of saying INNER JOIN, we're going to work with a left join and say LEFT JOIN. That means, okay, for SQL, it is going to list all the customers. So let's see what happens if we do that. Let me make it a little bit bigger. So as you can see here, as I said, with the left join we have all the information from the customers and only the matching ones from the orders.
Alright guys, again, let's understand what the database was doing once we executed the left join. The database is going to focus on the customers and the orders. The database understands, okay, customers is the left table because it comes first, with the FROM. Orders is the right table because it comes second in the query, in the LEFT JOIN. After that, it's going to specify the columns. Again, we have the customer ID, first name, order ID, and quantity. And so now it's going to start doing the matching, and it's going to check, okay, which join type do we have? We have the left. So since we said, okay, it is a left join, the database is going to say, okay, I need everything from the left table without doing any matches. So it's going to list all the IDs and as well all the names, without checking anything. But from the right side we need only the matching records. So it's going to really check each one of them. So here, customer ID one exists in the orders, so it's going to take that order and put it in the result. Now for customer ID two, we have as well an order, and it's going to put it in the results. For customer ID three there is a match as well. But now for Martin, he doesn't have any orders. So the database is going to show nulls instead. Null means like empty, there is no value found, or it is unknown. And for Peter as well, there is no customer ID with the number five in the orders. That means there is nothing on the right side, and we will have it as well empty. So this is how it looks. Once you execute the left join, you will get everything from the left and only the matching rows from the right. If there's anything missing, it's going to put nulls.
Alright, so that's all for the left join. Next we're going to start talking about the right join. It is very similar to the left join.
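The left join from this lecture differs from the inner join query only in the join keyword. The following is a sketch under the same assumptions as before: Python's built-in sqlite3 stands in for MySQL, the column names are guesses, and the order IDs 1003/1004 and all quantities except 500 are made up. SQL nulls show up in Python as None.

```python
import sqlite3

# In-memory SQLite stands in for the course's MySQL database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, first_name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, quantity INTEGER);
INSERT INTO customers VALUES
    (1, 'Maria'), (2, 'John'), (3, 'George'), (4, 'Martin'), (5, 'Peter');
-- Order IDs 1003/1004 and all quantities except 500 are made up.
INSERT INTO orders VALUES
    (1001, 1, 250), (1002, 2, 150), (1003, 3, 500), (1004, 6, 100);
""")

# Everything from the left table (customers), only matching rows from orders.
left_join_sql = """
SELECT
    c.customer_id,
    c.first_name,
    o.order_id,
    o.quantity
FROM customers AS c
LEFT JOIN orders AS o
    ON c.customer_id = o.customer_id
ORDER BY c.customer_id
"""
left_rows = conn.execute(left_join_sql).fetchall()

for row in left_rows:
    print(row)
```

Martin and Peter are listed with None for the order columns, and the order of the unknown customer six is not listed at all.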
26. SQL | RIGHT JOIN: Okay, so now let's jump to the next one. We're going to talk about the right join. We have the following task. It's almost the same: find all customer IDs, first names, order IDs, and quantities, but this time include all orders, regardless of whether there is a matching customer. That means for us, okay, we need all the orders from the right table, from the orders. And in order to do that, we have the same setup over here, and in SQL we just need to change the type of join, so we write here RIGHT. Once you do that, you are controlling how the database is going to match and how it is going to present the results. We will have the same setup over here, we will not change anything else. And let's run this. And with that, you can see the database did list all the orders from the orders table, and from the left side only the matching customers.
Okay, so as usual, let's see what the database did once we executed the right join.
We have the same setup. Customers is the left table, orders is the right table, and we have the same columns as well: customer ID, first name, order ID, and as well the quantity. But now the difference here is that we say it is a right join. So SQL is going to present all the rows from the right table without checking whether there is a match with the left. The database is going to select everything from here, so all the orders and all the quantities, without checking anything on the left side. Now from the left side, it's going to present only what is matching. So it's going to check: okay, do we have customer ID one? Yeah, we have it, so it can present the result over here on the left side. Do we have customer two? We have it as well. Customer three? We have George over here. But now we don't have a customer number six. That means it's going to be null again, so it's going to be empty. We don't have a customer with the ID six in the customers table. With that, we presented everything, all the orders, from the right side and only the matching information from the customers.
Alright everyone, so that's all for the right join. Next we're going to start talking about the last type of joins, the full outer join.
27. SQL | FULL JOIN: Alright, let's move to the last one. We have the full join, and we have the following task: list customer ID, first name, order ID, and quantity, but this time include everything, all orders and all customers. Okay. With the full join, I have two things to say. First, the full join is only supported in some databases, like Microsoft SQL Server or Oracle. In MySQL, you cannot use the full join. But instead of that, I'm going to show you a workaround for how to do a full join with MySQL. So don't worry about it, but we need to twist some stuff in order to create the full join. If you are using Microsoft SQL Server, you can just go and say FULL JOIN. The second thing: the full join sometimes has bad performance if you have big tables. So try to avoid using the full join. In my projects, I always tend to use inner join, left join, right join. Full outer joins I really try to avoid, because the full join can have really bad performance. If you have small tables, it should not be a problem. But once the tables get big, the full join is going to be really slow, because you are saying, okay, give me everything from the left and give me everything from the right. And that sometimes has bad performance. So try to avoid that.
So now the question: how are we going to do a full join if we don't have in MySQL a FULL keyword to do that? As I said, we're going to use a workaround. So follow this: a full join is actually a combination of a left join and a right join. So what I'm going to do, I'm just going to duplicate this script. So we have the same query twice, but in one we say LEFT JOIN and in the other we say RIGHT JOIN. In the next tutorial, we're going to talk about how to combine two statements in one. In order to do that, we will use the keyword UNION. Once I put UNION, I'm just adding two statements into one. So here I'm saying, okay, give me all the results from the left and combine them with the results from the right. And if you execute it, you will get exactly the same result as the full join. With that you can see, okay, here I have all the customers, and as well I have all the orders, so we have here a full join.
Alright guys, so now let's see what the database has done once we executed the full join, or the script that I showed you, which is left, union, right. We have the same setup, customers and orders, and we have those four columns. Since it's a full join, that means all the records from the left and all the records from the right. So it's going to start from the left. We will have all the customers and all the first names. And then it's going to start matching on the right side. So Maria has this order, this quantity. Customer ID two has this order, this quantity. For customer three, we have this order ID and this quantity. But for Martin and Peter, we don't have any orders from them. So we're going to see nulls over here and over here. But there is still something missing: we don't have all the orders over here. That's why the database is going to go and present this order ID and this quantity. Then it's going to match on the left side, and it says, okay, there is no customer on the left side, and it's going to put over here some nulls. So with that, you got all the customers with all the orders that are matching for them, and the other way around. With that you have all orders and all customers using the full join.
Alright guys, so with that, we have learned all the different types of joins. Next we're going to talk about a similar concept. It is the UNION and UNION ALL.
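The "left join, union, right join" workaround from this lecture can be sketched as follows. The assumptions are the same as before: Python's built-in sqlite3 stands in for MySQL, the column names are guesses, and the order IDs 1003/1004 and all quantities except 500 are made up. As an extra caveat that is not part of the lecture, SQLite versions before 3.39.0 cannot parse RIGHT JOIN, so on those versions the script swaps in an equivalent LEFT JOIN with the tables reversed for the second half.

```python
import sqlite3

# In-memory SQLite stands in for the course's MySQL database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, first_name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, quantity INTEGER);
INSERT INTO customers VALUES
    (1, 'Maria'), (2, 'John'), (3, 'George'), (4, 'Martin'), (5, 'Peter');
-- Order IDs 1003/1004 and all quantities except 500 are made up.
INSERT INTO orders VALUES
    (1001, 1, 250), (1002, 2, 150), (1003, 3, 500), (1004, 6, 100);
""")

left_part = """
SELECT c.customer_id, c.first_name, o.order_id, o.quantity
FROM customers AS c
LEFT JOIN orders AS o
    ON c.customer_id = o.customer_id
"""

right_part = """
SELECT c.customer_id, c.first_name, o.order_id, o.quantity
FROM customers AS c
RIGHT JOIN orders AS o
    ON c.customer_id = o.customer_id
"""

if sqlite3.sqlite_version_info < (3, 39, 0):
    # Fallback for older SQLite (not from the lecture): same rows as the right join.
    right_part = """
    SELECT c.customer_id, c.first_name, o.order_id, o.quantity
    FROM orders AS o
    LEFT JOIN customers AS c
        ON c.customer_id = o.customer_id
    """

# UNION stacks both results and removes the duplicated matching rows.
full_join_sql = left_part + "\nUNION\n" + right_part
full_rows = conn.execute(full_join_sql).fetchall()

for row in full_rows:
    print(row)
```

The three matching rows come out of both halves, but UNION keeps each of them once, so the result has six rows: three matches, Martin and Peter without orders, and the order without a customer. The row order of a UNION is not guaranteed, which is why the script does not rely on it.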
28. SQL | UNION: Alright, so now we're
going to learn how to combine tables using union. Union is very important
tools and SQL in order to combine tables
and very powerful. So previously we have
learned how to combine tables using the join methods. So what we are doing
enjoins we have two tables, customers and orders, and we are joining the
columns together. So at the results, we're gonna get one big table, one table with all the columns from left and from the right. But with union, we are as well
like combining two tables. But instead of combining
the columns here, we're going to combine
the rows together. So here we're going to
get very long table, including all the rows from
the left and from the right, but having the same column. So we will not get all
columns from left and right. Instead of that, we
will get all the rows from left and all the
rows from rights. Okay, so now in order to
understand the union, we're going to have
the following example. So in our tutorial database, we have two tables. We have the table customers, and we have the table employees. So now we have the
following tasks. Make a list of all
persons from customers and from employees where
we have the FirstName, LastName, and the country. So that's means it
doesn't matter whether the person is a
customer or employee. We're going to have make
a list with everything. So in order to solve this task, so we're going to use
the union operator between two tables,
customers and employees. So if we take this closely, you will find though three informations in
both of the tables. So we have firstName
and customers. We have as well the
same in employees, LastName and customers
last name employees. And we have the country and employees and the
same ads, customers. This is very important
that we have the matching columns
from both of them. So the database, if we start the union
between both of them, the database can
select the columns only from the left table salt. We will have FirstName,
LastName and country. And we will not have here again the same columns
from the right one. It's not joined, it is a union. So the left one going to decide what are
the column names. This is very important. The database is going to
select everything from the left table and put
it in the result. Then it does the same
for the right one, the employees: it selects all the records and
puts them below. And with that, we have a full
list of all persons from customers and from
employees in one result. It is very important
that both queries in the SQL
statement have exactly the same number of columns, in
the same order. If, for the employees, we list
the last name first and then
the first name, the values in the result
will be switched as well. So be careful with the
order of the columns, and the number of columns should match between left and right. One more important thing is that there are two
types of union. Type number one
is UNION ALL, where we get the
result exactly like this. That means if there are any duplicates between
table one and table two, those duplicates are going to stay
in the result; there is no check on the uniqueness
of the result. If there is a
person on the left and the same person
on the right, nothing happens; we get the whole result. But you may wish to
remove those duplicates. If you check the
result over here, you can see John: he is a customer and
at the same time he is also an employee. This can happen. To remove
such duplicates, we can use the
other type of union, and that is just
UNION, without the ALL. I'm going to show you that once we write
the SQL statements. So this is important to understand
about the union: if you want to keep the duplicates, exactly like the data inside the tables, use UNION ALL. If you want to remove the
duplicates, then use UNION. So now let's see how we
do that in SQL. It is really
easy. All we do
is write two queries, one for customers and one for employees,
and then put UNION between them, and we
get the result. Let's build
the first one: select first name, last name, and the country
from customers. This is the first query. Let's execute it and see: okay, now I have a list
of the customers. Then we
write the same for the employees, where we also have first name,
last name, and country: select them from employees. Let's run the query and see. Now we have the list
of employees. As you can see, we now have two queries, one for customers
and one for employees. To do the union and keep all the
duplicates, we write the
keyword UNION ALL between them. Now let's run the whole thing and check. With that we got all
the first names, last names, and countries from both tables,
customers and employees. And as you can see, this list contains
duplicates because, e.g., John is in customers as
well as in employees. If we wish to remove
such duplicates between customers and
employees from the result, we just remove the
ALL from here and use only UNION. Let's run that again. Now we
get a unique list, so John
appears only once over here. That is how we
do it with union. One more thing is how
to control the column names. As you can see, the names
first name, last name, and country come from the query above. So this first query controls
the naming of our result. If you wish to have
different column names, don't change them in the second query, because
nothing will happen; the database is going to
just ignore it. So here we're going
to control the names. If I add aliases in the first query, e.g. person first name here, person last name here, and person country here, and we rerun the query, you can see we have
the new names in the result. And if you change
anything in the query below, nothing's
going to happen. Let's add an alias to first name there and run the query. You see, nothing happens. Now let's test a
few things. Let me create a problem: in the second query I
put the last name first and
then the first name, the opposite
of the first query. Let's run this. As you can see,
the database does not notice that we have a mistake, where above we have
first name, then last name, and below we have last name, then first name. The database
doesn't care about that. It only cares that both
have the same data type. Since we have varchar here and
varchar there, it can present the result.
The database doesn't care whether you are
doing it right or not; the column names don't
mean anything to it. That's why you should be careful about
the order of the columns when you are doing a
union between two tables. Now let's try another data type,
e.g. customer ID. Customer ID is an integer, and the first name over
here is varchar. If I run the query, we get an error (I think it's hidden
over here), because there is a mismatch between the data types: the
database cannot combine strings in a column and then after that
have integers. That's why the data type
is very important for SQL. Let me repair
everything and run again. Now it works because
the data types are the same. Let's try some other errors; I'm just breaking things. Above we have three columns. We have first name,
last name, and country, and below we have the same. Now suppose I have a different number of columns
between the two queries; let's say I add salary. Now we have four columns in one SQL query and in the
other we have three. If I run this query, we again get an error, because it's going
to say you have a different number
of columns between those queries and we
cannot do the union. So the data
type is very important, the number of columns is
very important, and the order of the columns
should match as well. All right everyone, so with
that we have covered the SQL joins, and now you know how to combine SQL tables together. In the next
chapter we will learn many important functions, and we will start with the
aggregation functions.
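
To recap the two types of union from this chapter, here is a minimal sketch you can run yourself. It is not the course's MySQL setup: it assumes Python's built-in sqlite3 module with an in-memory database, the column names first_name, last_name, and country are assumed spellings, and the sample rows (including the employee Mary Smith) are made up for illustration. It shows that UNION ALL keeps the duplicate John, that UNION removes it, and that the aliases in the first query decide the column names of the result.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for the course's MySQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (first_name TEXT, last_name TEXT, country TEXT);
    CREATE TABLE employees (first_name TEXT, last_name TEXT, country TEXT);
    -- Hypothetical sample rows; John Steel appears in both tables.
    INSERT INTO customers VALUES ('Maria', 'Cramer', 'Germany'), ('John', 'Steel', 'USA');
    INSERT INTO employees VALUES ('John', 'Steel', 'USA'), ('Mary', 'Smith', 'UK');
""")

# UNION ALL keeps every row from both queries, duplicates included.
union_all_rows = conn.execute("""
    SELECT first_name, last_name, country FROM customers
    UNION ALL
    SELECT first_name, last_name, country FROM employees
""").fetchall()

# UNION removes duplicate rows; the aliases in the first query name the result columns.
cursor = conn.execute("""
    SELECT first_name AS person_first_name,
           last_name  AS person_last_name,
           country    AS person_country
    FROM customers
    UNION
    SELECT first_name, last_name, country FROM employees
""")
union_columns = [col[0] for col in cursor.description]
union_rows = cursor.fetchall()

print(len(union_all_rows))  # 4 rows: John Steel is listed twice
print(len(union_rows))      # 3 rows: John Steel is listed once
print(union_columns)        # names taken from the first query
```

The same two SELECT statements with UNION ALL or UNION between them are what you would type in the MySQL editor; only the Python wrapper around them is specific to this sketch.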
29. SQL | Aggregate Functions: Alright, so far we
have learned how to retrieve our data out
of our database and tables. But in real-life scenarios, we will be doing a
lot of calculations and aggregations on top of the data in order to get
something meaningful out of it, some useful
information. In SQL projects, we tend to use a lot of aggregations in order
to understand the data, because
the data model sometimes has big tables, and
by just reading the raw data we will not get any
useful information out of it. So we have to do some
aggregations on top of it in order to
understand the data. That means understanding the SQL aggregate functions is
essential in learning SQL and in getting
information out of the data. In SQL, we have the following
aggregate functions. They are really easy: if you just read the function name, you will understand
what SQL is going to do once you execute
the function. COUNT returns the number
of rows in a table. SUM
sums up the values. We have the average, AVG, and we have MAX and MIN to return the maximum value and
the minimum value. I will go through all of them and explain them step by step
with examples, as usual. But here it is very
important to understand how each function
deals with the nulls, those empty fields where
we don't have a value, because each function deals with
nulls differently. Alright, so now let's start with the first function, COUNT. It is also the
easiest one that we have among the
aggregate functions. In many situations, when
you are working on, let's say, a new project, you have a lot of tables. The first thing that I
tend to do is to see, okay, how many
customers do we have? How many orders? How many, let's say, employees do we
have, depending on the table. So I usually
check how many records we
have in each table. Is it a big table? Is it a small table? So we have the
following task that says: find the total number of
customers in the database. Okay, so let's solve
that using SQL. First, I want to get all the data from
the table customers; we usually do that using
select star from customers. That is easy. Now we can see, okay, we have five customers
in the table. But the task says, find the total
number of customers. That means I want
to see as a result only the number five, the total number of customers. In order to do that,
we're going to use the function COUNT. After the select, I'm going to type
the keyword COUNT, open bracket and
close bracket. Inside the COUNT
you can specify either star or the
name of a column. Let's use the
star and execute that. As you can see now, we got five as the number of rows of
customers in the table. So here we have now counted how many
customers we have. But as you can see, I don't really like the name of the column; it's just the function name. So let's rename it in the
result as total customers. Let's re-execute that. Now it looks better: the total number
of customers is five. As I said, we could use here
either star or a column name. The easiest way to do a count on a
table is using the star. If you use
a column name instead, it gets a little bit more tricky because of the nulls. Let's see what
happens if I type customer ID here
and run the query: we get the same
result, five. But if I put here not the customer ID but the score, you will see
we now have four. So here we have four scores; we don't have
five customers. What happened here? Let me explain to you
what the database is doing when you say count star
or count a column. If you say count star, you are not specifying
any column. The database is going to go
to the table and just count how many rows
we have in the table. So the database is going
to count 1, 2, 3, 4, 5. We have five rows in the
table, and in the result you get five. But if you say
count score, if you put the score
inside the COUNT, the database is going to count how many values we
have inside the score column, and it ignores the nulls. Here is the
tricky part: when the database
counts how many scores we have, it counts only four. So in order to count how many customers we have, either you
say count star, or you
count how many customer
IDs we have, and you will get the same
result: five. But if you are counting a column that
contains nulls, you will have fewer
records in the result; with the score we have
only four, with the ID we have five. Okay, so now let's
move to the next one. We have the SUM. Unlike the COUNT,
the SUM works only on columns that
contain numbers, e.g. you could do the sum
on the customer ID, because we have numbers
inside it, on the score, on the quantity,
or on the order IDs, but you cannot sum the first names or the last
names. With the COUNT, you could do that on any type of column:
you could count first names, count
countries, and so on. With the SUM, you deal
only with numbers. And one more thing: if you have nulls, the SUM effectively deals
with them as a zero. A null adds nothing to the total, just
like a zero. Let's have the following task: find the total quantity
of all orders. That means we're going to
focus on the table orders and sum up the quantities
of all orders. It's really easy; let's do that. First of all, I always like to start
with select star from orders. Let's run this. Now I have here the table
orders, and we're going to focus on the quantity, which
we have to sum up. In order to do that, we use the
keyword SUM, open bracket, type quantity,
close bracket, and run this. With that, you get the
total of the quantity; we summed up all the
rows into one cell. As usual, we have
this ugly name over here, so we're going to rename
it to sum quantity. Run it again. Now we have a better
name in the result. So the sum of the quantity
we have here is 2650. Okay, so now let's
move to the next one. We have the average. The average, AVG, is one more
aggregate function in SQL, and you can use it to find the average of one column. It is almost the same as SUM: it works with columns
that contain numbers. The average will not work if you use it on the first
name or last name, since those are characters, so
only on numbers. The only difference is how the average
deals with the nulls. E.g. over here we have
a null in the score. It will not treat it
as a zero, like the SUM, but ignore it completely,
because treating it as a zero would be a real problem for
the average function. So in the average, the nulls
are completely ignored. Let's have the following
task: find the average score
of all customers. Let's try to solve that. We will be focusing on
the table customers. As usual, I'm just going to
select everything to check the result over here. We need the column score, and we need the average
of those values. In order to do that, we write
the keyword AVG, open bracket, then the column name,
and close bracket. Let's run this. With that you get the
average score of all customers. The nulls are ignored. I like to rename it as
avg score. Run it again; it looks better. Now we have the
average score, 625. Alright, so now
we're going to move to my favorite aggregate functions: MIN and MAX. I use them a lot when
I'm doing data profiling in order to
understand my data. E.g. if I am profiling or checking the table
orders for the first time, I will be interested in what the latest order date is. In order to find that, we can use the MAX function on the order date and
we get the latest value. Or e.g. I want to check
what the highest score among the customers is, so I can go to the score
and apply the MAX function. MAX and MIN
are like the COUNT: you can use them on
any type of column, so
numbers, characters, and dates all work,
and the nulls are ignored. So if you
ask what the minimum
value of the score is, you will not get the null, you will get 350, which is Maria's. Let's have some examples
and tasks in order to understand how to
work with MIN and MAX. Alright, so we have
the following task. It says: find the highest score, the maximum score, in
our customers table. We have the same
table over here, so I'm going to remove the
average and select the data. I want to get
the highest score, so this one should be the answer. In order to do that, we
use the function MAX, open bracket, score, close
bracket, and run this. If you do that, you get 900, and that is correct. I'm just going to rename the column. Let's run that again. We have the max score as 900. Now let's find
the lowest score. The lowest score over here
should be Maria's, 350. In order to do that, we
use the function MIN on the score. We change the name
just to look better and run that again. With the min score, we get
350 and not the null. This is very important. Alright, so now let's keep
playing with the data. Let's take the orders. I'm going to get
the earliest date in the order dates
and the latest. Let's try to do that. I'm just going to remove that and select the table orders. Now we want to get the
earliest date and the latest date from
the column order date. In order to do that,
we use the function MIN, open bracket, order date, then close bracket, and rename it in the
result as min order date. Let's run this. With that, we got the minimum date
in the order date; this was the first
order date in the table. Now let's get
the latest one. In order to do that, I'm just going to
change the function to MAX and change the name of
it in the result. And see: this date is the latest date that we have for an order. Alright guys, so with that, we have learned all the
aggregate functions in SQL. They are really important for data analytics and data science. Next, we're going to cover
the string functions, where we are going
to learn how to manipulate text data.
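
To see how each aggregate function from this chapter treats the null score, here is a small sketch you can run yourself. It assumes Python's built-in sqlite3 module with an in-memory database instead of the course's MySQL database, and the five customer rows are assumed sample data chosen to match the numbers mentioned above (five customers, four scores, average 625, minimum 350, maximum 900); the first names other than Maria and John, and the column names, are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for the course's MySQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, first_name TEXT, country TEXT, score INTEGER);
    -- Assumed sample rows: five customers, one of them without a score (NULL).
    INSERT INTO customers VALUES
        (1, 'Maria',  'Germany', 350),
        (2, 'John',   'USA',     900),
        (3, 'Georg',  'UK',      750),
        (4, 'Martin', 'Germany', 500),
        (5, 'Peter',  'USA',     NULL);
""")

row = conn.execute("""
    SELECT
        COUNT(*)     AS total_customers,  -- counts rows, nulls do not matter
        COUNT(score) AS total_scores,     -- counts only the non-null scores
        SUM(score)   AS sum_score,        -- the null adds nothing to the total
        AVG(score)   AS avg_score,        -- the null is left out: 2500 / 4
        MIN(score)   AS min_score,        -- the null is ignored
        MAX(score)   AS max_score
    FROM customers
""").fetchone()

total_customers, total_scores, sum_score, avg_score, min_score, max_score = row
print(row)  # (5, 4, 2500, 625.0, 350, 900)
```

Putting COUNT(*) and COUNT(score) next to each other in one query is a quick way to spot how many nulls a column contains.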
30. SQL | String Functions: Alright, so next
we're going to learn how to clean up our data using the
SQL string functions. In many cases, if you are
working with a big database, you will have a lot
of columns that include values like
text or characters; we call them strings. And the data quality inside such columns might
sometimes be bad, so you will end up needing
some functions in order to manipulate the
structure of those values. In SQL we have the following
string functions. We have CONCAT in order to connect two strings into one value; LOWER and UPPER in
order to convert the data to lowercase or to
uppercase; TRIM, so if you have some whitespaces at the start or the
end of the value, you can remove them;
LENGTH in order to calculate the length of the
value; and then we have
SUBSTRING in order to return a part of the string. Alright, so now
we're going to have some tasks in order to understand how to work with
those string functions. The first one says:
list all customer names, where the customer
name is a combination of first name and
last name in one column. Let's try to do that. We need the list of all
customer names; we select first name and last name
from customers. If I execute this query, I get the following: we now have a list of
all customer names, but we didn't really solve the task, because the task says we want to have a
customer name with the first name and
last name in one column. As you can see here, we have them separated
in the database. In order to connect
those two strings into one, we're going to use
the function CONCAT. Let's see how
we are going to do that. We need the keyword
CONCAT, open bracket, and here we
list the first column, first name, comma, last name. I'm going to move those
in here, and let's see the result. As you can see, now we have the first
name and last name together in one column. If we want to separate
them from each other, we can use one more string: I'm going to put a
minus between them. So I'm now connecting
three strings: first name, a minus,
which comes from me, then the last name. Let's check how it
is going to look. As you can see,
Maria-Cramer. With that, we have a list of all customer names with the first name and
last name in it. I just want to rename
it to customer name. Let me make it smaller. Alright, so let's run that. As you can see, now we have
a column called customer name, and we have exactly the
information that we need. So if you want to connect two
or more strings, you can use the
function CONCAT. Another task
that might come up is: I want all the first names to be in uppercase or lowercase. Let's see how we can do that. Now we're going
to remove this and convert the first
name to uppercase. If I just query
the first name, you can see it is not uppercase; it starts with a capital M, and the rest is lowercase. In order to convert
everything to uppercase, we use
the function UPPER, open bracket, first name, close bracket, and I'm going to
rename it to upper first name. Let's run this. As you can see, all the names are now
in uppercase. You can do the
same with lowercase: I'm going to use
the function LOWER on first name, as
lower first name. Let's run this, and
as you can see, I converted the string to lowercase. One more thing to notice here: any changes that
I'm making in the query will not update the
contents of the table. That means the first name
is going to stay like before, so Maria with a capital
M and the rest lowercase. We are just
transforming the data in the result set
that I'm getting as output. Nothing is going to change in the table unless we
do some updates; we're going to learn that later. For now we're just transforming
the data for our results. Okay, so now let's talk about the trim. This
is interesting. Sometimes in the database you might find something like this: the name Maria, and before it we
have an empty space. Someone, before
entering the name Maria, entered a whitespace;
it happens. Or at the end, someone
entered a whitespace. Usually this is bad
data and we have to remove it. To work
with that in our query, we can use the function TRIM. The one on the left we call the left space; the one on the right we
call the right space. In order to remove the
left spaces from the name, we can use the
function LTRIM, which means left trim. If you execute that, this whitespace will be removed
from the result. And if you have a whitespace on
the right side, you can use another function
that is called RTRIM, which means right trim. If we execute that, it removes
any whitespace at the end of the string. If you have the situation
where you have both, either you
apply LTRIM and RTRIM, or you can use
the function TRIM. TRIM removes
both sides, the left and
the right, and you will not
have any whitespaces around the string in the result. Okay, so now let's
have some examples to learn about TRIM. If you check our
tutorial database, you might already have found out that there are some
whitespaces around. If you check the table customers,
specifically the last name, you will find some leading,
or left, whitespaces. Let's query that and check: select last name
from customers. Now if you check the result, you might find, okay, here is a left whitespace. But
I have a tip for you in order to find all those whitespaces
that are hidden. E.g.
Cramer has a whitespace as well, but you cannot see it if
you check the result. So I would say just copy the value and put
it in the editor. If I put it in the editor, you can see there is
a right whitespace. Let's check all the values. Steel is clean, so there's no whitespace around it.
Pipps has a left whitespace
and a right whitespace, so we have to repair that. Müller is safe, we don't have whitespaces
around it. Franken, I think the same; yeah, we don't have whitespaces. So let's try to repair that. We are just going to use
the function TRIM, the keyword TRIM with
brackets. As usual, I'm going to call it
clean last name. Let's run the query
and check the result. Let's check
Cramer, whether there are any whitespaces around. As you can see, it's clean. Another example
is Pipps, also clean, so we don't have any left
or right whitespaces. You can use the function
TRIM in order to remove them. Okay, so now let's move
to the next function. We have the LENGTH. If you want to calculate how many characters
we have in a string, you can use the
LENGTH function. If for some reason you want to calculate how many
characters we have in the last name, we can do it like this. I'm just going to
extend our query to calculate that. In order to do that, we use
the keyword LENGTH, and inside it we
put the last name, to calculate how many
characters we have there. I'm just going to rename
it to len last name. Let's run the query. You can see the
database calculated how many characters we
have in the last names. You might have already
noticed it is not really right, because
we have here Cramer, which is only six characters, but the database
is showing seven. That's because
we have whitespaces. So this is a really nice
way to find out whether there are
whitespaces or not. In order now to
clean that, you can combine those two
functions in one. I can put the
TRIM inside the LENGTH: first I'm cleaning
the data, and after that I calculate the length. In order to do that, I'm going to make a new column. First I
trim the last name, and after that
I apply another function, LENGTH. So I nested two functions
in one; let's call it clean
len last name. It's getting to be a long name, but anyway, let's
see the result. As you can see, now we have the clean length
of the last name. Here we have exactly 6 and 5. And as you can see here, there are two whitespaces. And those names don't have any whitespaces, because we have exactly the same
number of characters. Okay, so now let's move to the last string
function that we have. It is the fun one, SUBSTRING. Let's say we have in the
database the following name, Maria. Each character
has a position, e.g. M is one, a is two, r is three, and so on. If I want in the query
to extract from this name, and I just want
a part of it, I can use the
function SUBSTRING. SUBSTRING has
the following syntax: I need to define inside it the
column name or the string, then the start position
and the length. Let's have the
following example. Say I want to
substring Maria, starting from two, and
the length is three. We have here two pointers. The first pointer
is where to start: we start
at position two, so it counts one, two, and this is our
starting position. From this point we
count three steps, since we said three as the
length: one, two, three. With that, we have a starting point and an ending point
for the substring. If you execute
this query over here, you will get as a
result "ari". Okay, so now let's have
a live example. We can apply the same
rule to the last name. I'm going to remove
the old part over here and use the same
function, SUBSTRING. We need to define the column name,
which is the last name; the starting position is two; and the length, or how
many steps, is three. Let's call it sub last name. Let's run this
and see the result. If we check the
result now, we can see that we don't have
the whole last name, but only part of it, because we defined
the substring on it. Instead of Cramer, we have only "ram": it started at position two and cut three characters. And from Steel, we started with t and we have "tee". Alright everyone, so that's
all for this chapter. We have learned many
important functions. In the next chapter
we will raise the level again by learning
advanced topics in SQL, and we will start with
the GROUP BY clause.
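
To try the string functions from this chapter in one go, here is a minimal sketch. It assumes Python's built-in sqlite3 module with an in-memory database rather than the course's MySQL database, so two spellings differ: SQLite joins strings with the || operator where the course uses CONCAT, and the sketch calls SUBSTR where MySQL also accepts SUBSTRING. The three sample rows and the column names are assumptions made for illustration, with stray spaces added on purpose.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for the course's MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (first_name TEXT, last_name TEXT)")
# Assumed sample rows; two last names carry stray spaces on purpose.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Maria", "Cramer "), ("John", "Steel"), ("Georg", " Pipps ")],
)

rows = conn.execute("""
    SELECT
        first_name || '-' || last_name AS customer_name,        -- MySQL: CONCAT(first_name, '-', last_name)
        UPPER(first_name)              AS upper_first_name,
        LOWER(first_name)              AS lower_first_name,
        LENGTH(last_name)              AS len_last_name,        -- counts the stray spaces too
        LENGTH(TRIM(last_name))        AS clean_len_last_name,  -- trim first, then measure
        SUBSTR(last_name, 2, 3)        AS sub_last_name         -- start at position 2, take 3 characters
    FROM customers
    ORDER BY first_name
""").fetchall()

for r in rows:
    print(r)
```

Comparing len_last_name with clean_len_last_name row by row shows which names carry hidden whitespace, the same trick as in the chapter. None of these functions change the table; they only transform the values in the result.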
31. SQL | GROUP BY: Alright guys, so far
we have learned how to aggregate our data using
SQL aggregate functions. E.g. if you want to get the
total number of customers, you
use count star on the table customers and
you get five. But sometimes this is not enough. Sometimes you need
to group the rows by a column value, e.g. we don't want to get
the total number of customers of the whole table. Instead, we want to get the total number of customers
by the country values, e.g. I want to see how many
customers we have from Germany, how many customers
we have from UK, USA, and so on. So here we are grouping those customers by
the country values. In SQL, in order to do that, we use
the clause GROUP BY. Alright, so now we have
a new clause in our query. And as you know, SQL is very strict about
the order of those clauses, so we have to follow
the rules here. We cannot go and say, okay, let's start with WHERE, then SELECT, FROM. No, we have to follow the rules. We start with SELECT,
FROM, joins, WHERE, and the GROUP BY always comes
after the WHERE; we cannot place
it before the WHERE. So if you have any filter, you do the filtering on
the tables first, and then comes the GROUP BY. Also, GROUP
BY is optional; it is not a mandatory clause like SELECT and FROM. If you need GROUP BY,
you include it, but after the WHERE. This
is very important. Okay, so now in order to
understand the GROUP BY, we can take one task and try to solve it using SQL. Let's go. The task says: find the total number of
customers for each country. That means we need to group the customers by
the column country. We're going to build
this step by step. We start
with select star from customers, just to check what we have in
customers, as usual. Now we need to count how
many customers we have, and for that, as we learned, we use the
function COUNT, and we
close it like this. I'm just going to rename
it as total customers. Let's run this. Now we have the total
number of customers, five. But now we want it to be divided by country, to
group by the country. In order to do that, we use
the clause GROUP BY, the GROUP BY keywords, and after that
we name the column that we
want to group by. In our example it
is the column country. But this is not enough: we want to include it in the select statement as well. In order to do that, let me also select
the country. With that we are saying: I want to count the total
number of customers together with the country and then
group it by the country. Let's run this. As you can see now, we have not only the total
number of customers, we also have the
country, and the customers are grouped by the
values of the country. In Germany we
have two customers, in USA we also have
two customers, and in UK we have one customer. With that, we got the total number of customers
by a specific column. Alright guys, so now let's
go step by step through what the database did once we
executed the GROUP BY. First, it is going
to ask, as usual, which table do we need? We have FROM customers, so it's going to focus
on the table customers. Then it asks, okay, which columns do we need? We need the column country, and also the new column total customers. Alright, so after
that it is going to check: okay, there is GROUP BY and COUNT. With a GROUP BY,
what SQL does is go to
the column values in country and list only the unique, distinct values that it finds
inside country. It goes
one by one: okay, Germany is listed
over here, then USA, UK. But it will not list Germany again, because we have
it already in the list, and USA we also have
already in the list. Then it goes and aggregates all the rows for
the value Germany. It sees, okay,
for the value Germany, we have it twice, so it writes over here (let me just do it
like this) two. Then it goes
to the next value: okay, how many USA
customers do we have? It counts one, two, and puts two over here as well. Then for the last value
it counts how many customers we have for UK, and we have exactly one. So that's how SQL works
and why we get these results. Okay, so now we could
extend our task and say: I want the same result, but
the total number of customers should be sorted with the
lowest first, then the highest. In order to do that, we use the ORDER BY, and here it's
very important that the ORDER BY comes after
the GROUP BY. We are ordering by
the count star, so the total number
of customers. Here you can use
ASC or leave it out, because it is the default. Let's execute this. You can see the result is now sorted by the
total customers, with the lowest first
and then the highest. Okay, so now let's have
another example for the GROUP BY. The task says: find the highest score
for each country. This time we don't
need the COUNT function, we need the MAX function. As you have noticed already, with a GROUP BY we usually need
those aggregate functions, but it is not a must. Let's do that from scratch. So select star from (well, let's make
it uppercase) customers. We now want the highest score, so we use
the function MAX, open bracket, our
column is score, and we
rename it max score. This is not enough, because
if I execute this query, I get the highest
score across all countries. But this time we need to
group it by the country. In order to do that, I'm going to list the country in
the select (let's make it more
beautiful) and then use the clause GROUP BY country. With that, I'm now finding the highest score
for each country. Let's run this. With that you can see the highest score in
Germany is 500, the highest score in
USA is 900, and for UK it is 750. Okay, so let's check
what the database has done. We selected the table customers. We said we need the
column country and a new column called max_score. And in the SQL we have the
GROUP BY country. That means the database
is going to go through all those values and keep
only the unique ones: Germany, USA, and UK. Then it's going to
start finding the maximum for each of those countries. First, for Germany, we have two rows, four and one, and it's going to find the maximum
of those two values, 350 and 500. It's going to put 500 in the result
because it is the highest. Then it's going to
select, for the USA, the two records over here. So we have USA over
here and one here. The maximum of
those two values, 900 and NULL, is
going to be 900, because MAX ignores NULLs, so it's going to put
that in the result. For the UK we have
only one record, so the maximum is
going to be that same value, 750. And that's how the
database builds up this result from our query. Alright, so that's all
for the GROUP BY clause. Next we're going to
talk about a related topic: the HAVING clause.
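To make the GROUP BY walkthrough above concrete, here is a minimal sketch of the two queries. It is not the course's own material: it uses Python's built-in sqlite3 module as a stand-in for the MySQL database, and the column names and sample rows are assumptions reconstructed from the values mentioned in the walkthrough (Germany 350 and 500, USA 900 and NULL, UK 750).

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the course's MySQL
# table. Column names and rows are reconstructed from the walkthrough;
# the first names beyond Maria and John are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        country     TEXT,
        score       INTEGER
    )
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Maria", "Germany", 350),
        (2, "John", "USA", 900),
        (3, "Georg", "UK", 750),
        (4, "Martin", "Germany", 500),
        (5, "Peter", "USA", None),
    ],
)

# Count the customers per country, lowest count first.
# The second sort key only makes the order of ties predictable.
total_customers = conn.execute("""
    SELECT country, COUNT(*) AS total_customers
    FROM customers
    GROUP BY country
    ORDER BY COUNT(*) ASC, country ASC
""").fetchall()
print(total_customers)  # [('UK', 1), ('Germany', 2), ('USA', 2)]

# Highest score per country; MAX skips the NULL score in the USA group.
max_scores = conn.execute("""
    SELECT country, MAX(score) AS max_score
    FROM customers
    GROUP BY country
    ORDER BY country
""").fetchall()
print(max_scores)  # [('Germany', 500), ('UK', 750), ('USA', 900)]
```

The SQL inside the strings is the part that carries over to MySQL; the Python around it is only there so the example can be run as is.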
32. SQL | HAVING: Alright, so far we have
learned how to group our data using the
GROUP BY clause. But sometimes you
might be in a situation where you are working
with a really big table, where one column has
many different values. In our example we have
only three values, just to keep it simple, but in real-world scenarios you will have a lot
of values in one column, and you will be forced to apply
some filters to the results. So in order to filter the results that we
get from the GROUP BY, SQL has one more clause,
and it's called HAVING. Alright, since this
is a new clause, we need to understand
where we're going to place it, because as you know,
SQL is sensitive about the order
of those clauses. The HAVING clause comes directly
after the GROUP BY: once you define the
GROUP BY, after that
you define the
HAVING clause. It is optional; once you want to filter on the aggregate
functions, you can use the HAVING clause. With that we have all the clauses of the
SELECT statement, the query. It starts with SELECT, FROM,
JOINs, WHERE, GROUP BY, HAVING, and lastly we have
ORDER BY and LIMIT. Okay, so now in order to
understand HAVING,
we're going to take
one task and try to
solve it using SQL. The task says: find the total number of
customers for each country, but only include those
countries that have more than one customer. So that means we have a condition here to filter our data. Let's try to solve
that using SQL. As usual, we're going to
start by querying our data. We're going to focus on the
table customers over here. Now we need the total number of
customers by country. That means I need to do a GROUP BY and use the
aggregate function COUNT, like before. I'm going to use
COUNT(*) and rename it so it looks good in the results; we'll call
it total_customers. Since we're going to
group by country, we have to include the
country in the SELECT, and after that we just
group by that country. Let's run this. We
see in the results that
we now have all the countries with their total
number of customers. But our task is not solved yet, because we still
have a country whose total number of customers is not
greater than one. So we need to filter this data. In order to do
that with the GROUP BY, we're going to use the clause
HAVING. Think about it
as being exactly
like the WHERE clause: we're going to write
down one condition. Our condition says the total number of customers
should be greater than one, so the count should
be greater than one. So we have defined
our condition, exactly like in
the WHERE clause. Let's run this. And as you can see, we no longer have the UK
with its one customer. We have the customers
aggregated by country, and only the countries that have more than one customer
are in the results. With that, we
filtered our data and we have exactly what we want. Alright, so now you
might be wondering, and you want to ask
me: hey Baraa, why do we have such a clause
called HAVING in SQL? We could just use the WHERE clause, because there
we can filter our data. We could define exactly
the same condition and filter our data. Why does SQL have one more clause that does
exactly the same as WHERE? The answer is this. WHERE you can use only on the columns that exist
in the database, e.g. if I want to filter on the country, or the score, or the last name. Any column that I
have in the database I can filter with a WHERE. But once I want to
filter the data based on a column that doesn't exist
in the database, e.g. COUNT(*) or MAX or MIN, so any aggregate function that we are using in the query, and we want to build a filter on top
of such a function, then we cannot use
WHERE; we should use HAVING. HAVING works together with the GROUP BY: once we are
doing an aggregation, we can define a
filter on top of it. But the WHERE clause
works only on the columns that
already exist in the database. So that means if I have these
results and I want to filter the data so that I don't
see the country USA in the results, I should
use the WHERE clause. So let's do that. The WHERE comes after
the FROM: WHERE our column country is
not equal to USA. Let's run this. And with that you see
we have filtered the data; we don't have the
USA in the results. So if I want to
filter on the country, I need to use the WHERE clause. If I want to filter on the aggregate function
of the GROUP BY, I have to use HAVING. Alright guys, so with that we have covered
the HAVING clause. Next we're
going to talk about the concept of
subqueries in SQL, where we're going to
cover EXISTS and IN and learn the differences
between them.
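The difference between HAVING and WHERE described above can be sketched as follows. This is a hedged illustration, not the course's own script: Python's built-in sqlite3 module stands in for MySQL, and the table layout and rows are assumptions matching the counts mentioned in the section (Germany 2, USA 2, UK 1).

```python
import sqlite3

# Assumption: minimal stand-in for the course's customers table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        country     TEXT
    )
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "Maria", "Germany"),
        (2, "John", "USA"),
        (3, "Georg", "UK"),
        (4, "Martin", "Germany"),
        (5, "Peter", "USA"),
    ],
)

# HAVING filters the groups after aggregation: keep only the
# countries that have more than one customer.
having_rows = conn.execute("""
    SELECT country, COUNT(*) AS total_customers
    FROM customers
    GROUP BY country
    HAVING COUNT(*) > 1
    ORDER BY country
""").fetchall()
print(having_rows)  # [('Germany', 2), ('USA', 2)]

# WHERE filters the rows before they are grouped: drop the USA rows,
# then count whatever is left per country.
where_rows = conn.execute("""
    SELECT country, COUNT(*) AS total_customers
    FROM customers
    WHERE country != 'USA'
    GROUP BY country
    ORDER BY country
""").fetchall()
print(where_rows)  # [('Germany', 2), ('UK', 1)]
```

Note the clause order in both statements: WHERE sits between FROM and GROUP BY, while HAVING comes after GROUP BY and before ORDER BY.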
33. SQL | SubQuery: EXISTS vs IN: Alright, so now we're
going to learn how to do subqueries using SQL. This is extremely
powerful in SQL. Once you learn how to
do subqueries, you will be able to do a lot of complex and important
tasks using SQL. So what is a subquery? It is when you have
different queries that are nested in each other, that is, you have one query embedded
in another query. In the normal situations
in the previous tutorials, we had only one query, one statement that is querying our data,
e.g. the customers. But with subqueries
you will have different queries that are
depending on each other. E.g. we have here query
number one, which is asking for data from the
table customers and then presents its results. Then we have another
query, query number two, that depends on those results and makes, let's say, another
SELECT statement on top. With that, we're going to call query number
one a subquery. It will be the basis for
the next query that we have. And you
can really nest queries, not only two, maybe three, four, and so on. Alright, so now we're
going to learn how to do subqueries using SQL. And for that we
have two options: either we use
the operator IN or EXISTS. For now we're going to
focus on the operator IN in order to solve
the following task. The task says: find all orders that were
placed by customers with a score higher than 500,
using the customer ID. So let's try to solve that. That means we're going to
focus on both tables, orders and customers. And since the end result should present
all the orders, I'm going to start
with that query first. So we're going to say
SELECT * FROM orders. As you can see, we
now have all the orders, but the task says the result should contain only
the customers that have a score higher than 500. So that means I need to
find out which customer IDs over here have a
score higher than 500. In order to do that, we need to check another table. So SELECT * FROM customers, and now we need to add
the filter that we need: WHERE score is
higher than 500. Let's run this. You can run this separately if you highlight it
and then execute. With that, we know
that customer IDs 2 and 3 are the customers with
a score higher than 500. So I can go back to my original query and
add this filter. I'm going to say
WHERE customer_id IN (2, 3). With this
filter I'm saying, okay, those customers have a
score higher than 500. So let's run only the other
part and check the results. Now I have the orders
for those customers, and with that I have solved the
task. And now comes the but. This is really bad to do
because it has two problems. First of all, I went
to another table and found out those IDs manually. We can
do that with a small table, but imagine you have a
big table with a lot of IDs. You would need to type them
all into the next query, and with big tables that is practically impossible. The second problem is when your data is
changing, e.g. we are getting more customers, we are getting more orders. That means each time I get new
data in my tables, I have to
check the query over here and adjust it. This is not dynamic, so this is really bad. Instead of that, we're going to do a small
trick that's going to solve everything and make our life
easier: the subquery. Instead of having those static numbers in
the filter over here, I'm going to remove them, and I'm going to say this query is
going to be my subquery, and this one over here
is going to be my main query. The results that I'm getting over here, let me check that again, the results I'm
getting over here are going to
feed the other query. For that, all I really need is to have 2 and 3. I just need the customer ID, I don't need
all those columns. So instead of the star, I'm going to say customer_id. Let's run this again. As you can see, we now have 2 and 3. So it doesn't matter how many new customers
I'm going to get, I'm always going to have a complete and correct list
for the next query. So what I'm going to do is just cut it
and paste it over here, and
put it on a new line so it looks much better. With that, I have embedded
one query in the next one. So this is the subquery. It always has those opening
and closing brackets. With them I'm
indicating to SQL that here we have a subquery, and here we have the main query. So let's run this and
check the results. As you can see, I got exactly those orders from the customers whose score
is higher than 500. And now we can get new
orders and new customers, and I don't have to deal with that. My query will always
solve my problem, and I don't have to add
all those IDs in the
IN. Instead, it is
dynamic and very powerful. So this is a much
better solution than having static IDs
inside the IN statement. And if you
go further with that and do more nested queries and so on, you will be able
to solve a lot of complex and important
tasks using SQL. Alright, so now we're
going to try to solve the same task using EXISTS. EXISTS is a little bit different
from IN. With both of them
we're going to get the same
result, but with EXISTS
you may get
better performance if you have big tables. So if you have big
tables and you are suffering from poor performance with
the IN operator, you could start using EXISTS and check whether you
get better performance. So we tend to use
EXISTS more than IN if we are facing
performance problems. But EXISTS is a little bit more complicated than
IN, because there is no clear separation
between query one and query two, the
subquery and the main query. So let's see how we're
going to do that using EXISTS. I'm going to open a new tab. We will have the same setup, so SELECT * FROM orders. But now we're going to use some aliases, because it
is something like a join. So I'm going to use the letter
o as the alias for orders. And now we're going
to type the filter, WHERE, and after that we can directly type the EXISTS
statement: WHERE EXISTS. Then we have the subquery. Now we're going to write the
subquery, so we SELECT. Here we could
write anything as columns; EXISTS does not depend on the selected
columns over here. So you could write customer_id or star
or anything you want. In SQL we tend
to write just 1,
because we don't
care about that;
the column values of
the subquery result
are not important. It is like a join. So SELECT 1 FROM customers, and I will give it the alias c. Now we need to add the filter. And here it is exactly like
when we are doing joins: c.customer_id equals
o.customer_id. So as I said, it's like a join. After that, we have another filter on
the customers, which is that we need the score
to be higher than 500. With that, we have
our subquery over here. It looks a little bit
complicated compared to the IN. Here we have some
kind of inner join. I cannot run this
part of the query separately;
I would get an error,
because of that connection
between the IDs. So in order to get the result, I need to run the whole thing. So let's run this. You can see I got exactly the
same results. EXISTS and IN
will give you the same results. I
tend to use IN when it's, let's say,
small tables and so on, but once I have bad performance, I switch to EXISTS. It's up to you which
one you're going to use; both of them are doing subqueries and make
the query dynamic. Alright guys, that's
all for this chapter. We have learned some advanced
topics in SQL, and next we will start learning
how to modify our data inside our SQL tables. We will start with
the INSERT statement.
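Both subquery variants from this chapter can be sketched side by side. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for MySQL, and the orders table, its order IDs, and the customer rows are hypothetical, chosen so that customers 2 and 3 are the ones with a score above 500, as in the walkthrough.

```python
import sqlite3

# Assumption: hypothetical stand-ins for the course's customers and
# orders tables; only the columns needed for the task are modelled.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        score       INTEGER
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER
    )
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Maria", 350), (2, "John", 900), (3, "Georg", 750),
     (4, "Martin", 500), (5, "Peter", None)],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1001, 1), (1002, 2), (1003, 3), (1004, 5)],
)

# IN: the subquery produces the list of customer IDs for the main query,
# so nothing has to be typed in by hand when the data changes.
in_rows = conn.execute("""
    SELECT *
    FROM orders
    WHERE customer_id IN (
        SELECT customer_id FROM customers WHERE score > 500
    )
    ORDER BY order_id
""").fetchall()

# EXISTS: the subquery refers to the outer row through the aliases o and c,
# which is why it cannot be run on its own.
exists_rows = conn.execute("""
    SELECT *
    FROM orders o
    WHERE EXISTS (
        SELECT 1
        FROM customers c
        WHERE c.customer_id = o.customer_id
          AND c.score > 500
    )
    ORDER BY o.order_id
""").fetchall()

print(in_rows)      # [(1002, 2), (1003, 3)]
print(exists_rows)  # [(1002, 2), (1003, 3)]
```

Whether EXISTS is actually faster than IN depends on the database engine and the data, so it is worth measuring both on the real tables rather than assuming.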
34. SQL | INSERT: Alright, so far we
have learned how to query, how to retrieve our data from the database without
changing anything, without changing the content of the tables or
changing the columns. We have used
the command SELECT in order to retrieve our data, and
that command will
not change the data
inside our database. Next we're going to
learn how to manipulate the data inside our database in order
to change its contents. For that, we have a
new set of commands in a new SQL category that is called DML, Data
Manipulation Language. Inside it we have
three main commands. We have INSERT, which we use if we want to insert new data
into our tables. We have DELETE: if we have some
existing rows and we want to delete them
from the database, we use the DELETE command. And the last one is UPDATE:
if we want to change the content of
existing rows in our tables, we use
the UPDATE command. Alright, so now we're going to start with the first command, the INSERT command. We're going to learn
now how to insert new rows into our database. We're going to focus
on the table customers. As you know, in our
tutorial database we have five customers, and now we're going
to practice by adding one more customer to our database to learn how to work with the
INSERT command. Before we insert anything
new into our database, we really have to understand
the structure of the table, the structure of the columns. If we don't know the structure and the
definitions, we will get errors while we are inserting the data. Just knowing that we have five columns
in the table customers is not enough. We really need to understand the definition of the table before we start inserting any new data into
it. To do that, I usually use the
following keyword: DESCRIBE customers,
the table name. What I'm saying
to SQL is: give me the definition of the table customers
so I can have a look at what we have for each column. At first look, it might seem
a little bit complicated. Don't worry about it.
I'm going to explain all of it step by step. So we are saying: okay, database, explain or describe for me the
table customers. As you know, each table
contains multiple columns, and we can see in the results
that we have five columns here: customer ID, first name, last name,
country, and score. Those are the column names. For each column we have
properties over here that
describe the column. First we have the
data types. E.g. if you look at
our table of customers, in the customer ID we have only numbers, and
they are unique. We have 1, 2, 3, 4, 5, and
those are numbers. So the data type for the customer ID is something
like numbers, and in the database we call
them integers, or INT. In the first name we don't have numbers,
we have characters. We have Maria, John, and so on; they are text, and in
the database we call that VARCHAR. There are different types for
such characters, e.g.
CHAR and so on, but as a best practice we use VARCHAR because it optimizes the space
used in our database. We can also see here the size of the
VARCHAR. We have 50 here, which means the maximum allowed size for the
first name is 50 characters. If you have more
than 50 characters in the first name, the
database will cut it and insert only 50 characters
(or, when MySQL runs in strict SQL mode, reject the insert with an error instead).
So here we are putting
some rules on each column. The first name should
be at most 50 characters, and the same goes for the last
name and the country. If you have a really long name with more than 50 characters, it will not fit in this column. So as well as the data type,
you can apply
rules about the
size of each column. We also have the score. As you can
see, in the score
we don't have any characters,
only numbers, so it is an integer. With that you can see each column has its own data type, and
you have a better understanding of the
description of the columns. After that, there
is a field called Null, and you can see
here only NO and YES. It says whether NULLs are
allowed in each column or not. E.g. for the customer ID
we are not allowing any NULL. So if you insert a NULL there, the database will say
no, that's not allowed. In the definition
there is no NULL allowed, and the same goes for the
first name and last name. Once we insert data
into customers, we always have to have a customer
ID, a first name, and a last name. But for the score and
the country it says YES, so NULLs are allowed. E.g. as you can see, in the score
we have one NULL here. And for the country, if you don't specify anything
in the INSERT statement, there will be no problem, and the database is
going to show us a NULL. So here we can see
from the definition where we can add NULLs and
where it is not allowed. We also have over here
a key for each table. In SQL databases we
have primary keys, the keys that identify each
customer or each row. E.g. in our table
customers, we have the customer
ID as the primary key. And once we say primary key, it comes with two things. First, it is not
allowed to be NULL, and second, it must be unique. That means it is
not allowed to have two customers with the same ID. Maria and John must always have different
customer IDs. E.g.
the customer ID 1
here must not exist anywhere else;
it is unique. This is the most important
thing to understand about primary keys:
they are unique. So if I go now and insert one
more customer and say, okay, we have a new
customer called Anna, and she has the customer ID 5, then since in the
database we already have the customer ID 5, the database is going
to give you an error. So here it's very important
to understand the structure: which column over here
is our primary key? Then we have some
other information, e.g. we have here the column Extra. It says it is auto_increment. Auto-increment means that
if I add a new customer, the database is going to increment the customer ID
automatically. E.g. if I add
one more customer, I don't have to specify that the customer ID
should be number six; the database is going to
do it automatically. So here we have some extra information
that tells us this ID will be generated by the database and we don't
have to specify it. So now we have more insights
about the table customers. We know the definition of each
column, and we can start
inserting new records, new rows, into the
table customers. So I'm going to open a new tab, and we're going to
start using the INSERT. I'm going to type here the
keywords INSERT INTO. Then we have to specify
the table name where we want to insert our data, the
table customers. Then we have to specify
the values for each column: VALUES, open brackets. And now we're going
to go one by one. So the customer ID, let me check that again: the customer ID is an integer, it is the primary key
and auto-increment. That means the database is
going to generate the new ID; I don't have to do it myself. So I can just say DEFAULT. DEFAULT means the database is
going to take care of that and
insert the customer ID six. You could instead
type the number six yourself, but I really don't recommend it, because if you have
a big database and someone else is doing
inserts, or you forget what the last customer
ID in the database was, you can run into conflicts. So just make your life
easier and type DEFAULT. Now we have to
enter the first name. I'm going to use e.g.
the first name Anna. Here we have a rule
in SQL databases that you cannot just type the
first name like this. It is a string, and a string
we always have to put inside single quotes
or double quotes. So e.g. I could use double quotes in order to
deal with the string. If you don't do that,
you will get an error. I usually use single quotes
for strings, so that should be okay. The last name is
the same thing; it is a VARCHAR and we
have to put a name in quotes. I'm going to use
Nixon as the last name. So we now have the
three columns customer ID, first
name, last name. Next we have country and score. Let's check the country. It says
the country is a VARCHAR, so we would specify
a string over here, but we could also leave it empty. I don't really have to enter anything here if I don't want to. The same goes for the
score: it is an integer, but we could leave
it empty as well. What I'm going to do is
add the country. It is a VARCHAR,
so it's a string, and I need to put it
in single quotes. I'm going to use the country UK. Okay, so now to the last
column, the score. Let's check that
in the description. The score is an integer, so only
numbers are allowed inside. The score is nullable, so I could leave it empty, and it is not a primary
key and so on. That means I could
leave it as a NULL. And that makes
sense, because Anna is a new customer and she doesn't yet have any score in
our database or systems. That's why I can just
write NULL over here. Or I could put
a zero here if I want, but I will just leave it as a NULL. Let's execute the query and see whether we have
everything right. We will not get
any result set; we just get
the information that everything is green and we
have inserted the data. In order to check this
customer in our database, we're going to open a new tab, SELECT * FROM customers, and see whether Anna
is in the database. And yes, we have one
more customer called Anna Nixon from the country UK. The score is NULL because she is new, and we have the
newly generated customer ID from the database. Okay, so now let's
keep practicing and add one more customer, our customer number
seven in our database. So let's go and do that. I'm going to remove everything
and start from scratch: INSERT INTO our
table customers. And now we're going
to add the values. As usual, our first value, the customer ID, is
going to be DEFAULT. For the first name I'm
going to use Max, and for the last name I'm going to use Lang. But now the country and score I can leave empty, so I'm going to use NULL
for the country and for the score as well. As you might
have noticed already, what I've really done here is
just give a first name
and a last name. For all the others I'm using
NULLs and DEFAULT. So we could skip those and
make our life easier by just adding the
first name and last name. But if I just remove
the NULLs over here and the DEFAULT
and run the query, I will get an error, because the database does not
understand what Max is. Is Max the country? Is Max
the first name or the last name? And Lang as well, is it the last name? So we need to specify
for the database which value belongs
to which column. In order to do that, I'm going to open
new brackets after the table name and
type the column names:
first name, and as the second one
the last name. With that, we are
telling the database, okay, the first value belongs
to the column first name, and the second value belongs
to the column last name. And if I run this, we will not get an error,
because we have done the mapping, and everything
else is done automatically. That means the database
knows the customer ID is automatically
generated, so it's going to generate a new ID. And since the database didn't find any information about the country and the score, it's going to set them
by default to NULL. So let's check the result. If I run the same query again, SELECT * FROM customers, we can see
that the database has inserted our
new customer Max Lang. It understood
that the country is NULL and the score is NULL, and
it generated the ID seven. So as you can see, it's more compact and I don't have to add all those
NULLs. Imagine you have a big
table with 50 columns and you
have a lot of NULLs; the query is going to look really bad. So here I'm just
inserting what I need, and the rest the
database does for me, if it is
allowed. E.g.
if the country
were not allowed to be NULL, I would have to insert
something for the country. But since we are allowing NULLs in the
country and the score, we can just ignore them
and leave it like this. Alright, so with that, we have learned how to insert
data into our SQL tables. Next, we're going to talk
about the UPDATE statement.
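The two INSERT styles from this section, plus the primary-key error, can be sketched as follows. This is a rough stand-in rather than the course's MySQL session: it uses Python's built-in sqlite3 module, where `PRAGMA table_info` plays the role of `DESCRIBE`, and where a NULL in an `INTEGER PRIMARY KEY` column asks SQLite to generate the next ID (SQLite does not accept MySQL's `DEFAULT` keyword inside `VALUES`). Column names and the last names of the first five customers are assumptions, and SQLite does not enforce the VARCHAR(50) length.

```python
import sqlite3

# Assumption: SQLite stand-in for the course's MySQL table definition.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- generated when NULL is inserted
        first_name  VARCHAR(50) NOT NULL,
        last_name   VARCHAR(50) NOT NULL,
        country     VARCHAR(50),
        score       INTEGER
    )
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Maria", "Cramer", "Germany", 350),
        (2, "John", "Steel", "USA", 900),
        (3, "Georg", "Pipps", "UK", 750),
        (4, "Martin", "Mueller", "Germany", 500),
        (5, "Peter", "Franken", "USA", None),
    ],
)

# Rough equivalent of DESCRIBE customers: the column names in table order.
column_names = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]

# Style 1: one value for every column, in table order.
conn.execute("INSERT INTO customers VALUES (NULL, 'Anna', 'Nixon', 'UK', NULL)")

# Style 2: name only the columns you provide; the rest become NULL
# and the ID is generated.
conn.execute("INSERT INTO customers (first_name, last_name) VALUES ('Max', 'Lang')")

new_rows = conn.execute("""
    SELECT customer_id, first_name, last_name, country, score
    FROM customers
    WHERE customer_id > 5
    ORDER BY customer_id
""").fetchall()
print(new_rows)
# [(6, 'Anna', 'Nixon', 'UK', None), (7, 'Max', 'Lang', None, None)]

# Reusing an existing primary key is rejected by the database.
try:
    conn.execute("INSERT INTO customers VALUES (5, 'Anna', 'Nixon', 'UK', NULL)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```

In MySQL the first insert would be written with `DEFAULT` in place of the leading `NULL`, exactly as described in the section.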
35. SQL | UPDATE: Alright, so now
we're going to talk about one more command to manipulate our
data inside the database, and that is the UPDATE command. You can use UPDATE
in order to change the values of an already
existing row in your tables. Okay, so let's take
the following task. We just added a new customer
with the INSERT statement, and that is Max, the customer number seven. As you have noticed already, this is the only customer
for whom we don't have a country specified
in the database. The task is just to add the country Germany
to this record. So we have to
update the content of this customer by changing
the NULL to Germany. We're going to start
with the keyword UPDATE. Then we have to specify the name of the table that
should be changed, so we have the
table name customers. After that, on a new line, we have
the keyword SET. With that, we specify new values for the columns
that should be changed. We want to change the column country, and instead of the NULL
we give it the new value
Germany. Now we need to be really
careful here. If I execute this,
and don't do this,
what will happen? The database is going to go and
update the values of all customers
in the column country to the new value Germany. Because if you read this, we are telling the
database to update the table customers
and set country to Germany without
specifying any customer. That means if we run that, all the countries in the table will be
Germany, so don't do that. Our task here is only to change it for the new customer. As you can see here, our customer Max has an
empty value in the country, and we only need to change that one. In order to do that,
we're going to add a filter,
a condition for the update. And for
that we're going to use the primary key, customer ID number seven. I don't recommend using
any other column, e.g. the first name or the last name, because if you have a big table, the first name Max may be present for
other customers. Maybe you have
different customers with the same first name, and if you run the
query on the first name, all customers with
the first name Max will get the
country Germany. So to make sure we
update the right record, the right row, we're going
to use the primary key, the customer ID. So let's go back over here. We're going to
write the WHERE clause exactly like in the SELECT, and we're going to say
the customer ID equals seven. With that, we are telling
the database exactly: we have a new value
for the country, and it is only for the
customer ID number seven. So let's run this, then go over here and run the SELECT
again to check the value. Before, we had
it empty, NULL, and after the update we now have Germany
in the country. Alright, let's have another
task where we're going to manipulate and update the
content of our tables. The task says: Anna, our new
customer, was active. She bought something
on our website and she now has a score of 100. So instead of having
a score of null, because she was a new customer, we now have 100 for Anna. Not only that, we
entered by mistake the country UK instead of USA. So Anna comes from the USA and we have to update
the country as well. So let's do that using
the UPDATE command. Alright, so we're
going to check over here. So before we start updating the values
in the columns, let's go and make sure that we have the right
customer, so we are not updating a different customer or updating the whole table. So let's make sure that we are selecting everything right
in the WHERE clause. So Anna has customer ID
number six instead of seven. We're going to write
number six here. So now we are focusing
on the right row. And now the country
should be USA. So now we are giving a new value for Anna in the country field. And now we want to specify one
more column to be changed. In order to do that,
we add a comma. I like to put it on
a new line, and the score should be equal to 100. So with that, you
are specifying multiple columns in one UPDATE, and you
separate them with a comma. So if I want to change
one more column, I can do it all
in one command. I don't need a different command
for each column. I can put everything in one. Now, what we are saying is: update the table customers where the customer
ID is number six. And the country should
be equal to USA, and the score should be 100. So let's run this
and then go back to our SELECT *
FROM customers to check whether
everything is okay. So I'm going to refresh that. And you can see now we have the country USA and
the score is now 100. So it's really
easy to manipulate the data using the
UPDATE command. Alright everyone, so that's all for the UPDATE statement. And next we are going to learn the DELETE and
TRUNCATE statements.
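As a minimal sketch, the two updates from this lesson might look like the following in MySQL. The column names customer_id, country, and score are assumptions based on how the columns are described in the lesson, so adjust them to the actual tutorial database.

```sql
-- Check the target row first, so we know the WHERE condition is right.
SELECT * FROM customers WHERE customer_id = 6;

-- First task: set the country for customer 7 only.
UPDATE customers
SET country = 'Germany'
WHERE customer_id = 7;

-- Second task: change two columns in one statement, separated by a comma.
UPDATE customers
SET country = 'USA',
    score = 100
WHERE customer_id = 6;

-- Verify the result.
SELECT * FROM customers;
```

Running the SELECT with the same WHERE condition before the UPDATE is a cheap safety check: an UPDATE without a WHERE clause changes every row in the table.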
36. SQL | DELETE & TRUNCATE: Okay, so now we're
going to move to the last command that we have under the data
manipulation section, and that is the DELETE command. So in order to delete
rows from our tables, we can use DELETE, and let's have the
following task. The task says: wait a minute, all the new users since
yesterday, or since today, were wrongly inserted in our system and we have
to delete them. So we have the customer Anna
and the customer Max. They should be deleted from our database, from our tables. So doing that
is pretty simple. We're going to use
the DELETE command. Alright, so in order
to solve that, we're going to write
a very easy command, and at the same time it is
very dangerous. So we're going to
start by writing the keywords DELETE FROM, and then comes the table name. So we need to delete
from customers. As you can see, it's
only three words. It's very easy, but
if I execute this, be careful: it's
going to delete everything inside
the table customers. So I'm not specifying anything. I'm saying DELETE
FROM customers. And if I run it,
the database is going to delete all our customers
from the table. So be careful with that. Always specify exactly what
you want to delete. So with that, it's
like the UPDATE. We're going to use
the WHERE clause and use the primary
key, the customer ID. So we want to delete
the customer ID numbers, let me check again, six and seven. So in order to do that, I'm going to use the
IN operator: IN (6, 7). So any customer with an ID of
6 or 7 is going to be deleted. So this is my filter condition. And if I run this, both of the users
are going to be deleted. So let's check that. If I run this over here, you can see that both
customers are deleted. And with that, we have deleted some records from our customers. But be really careful what you are specifying
in the DELETE, so you don't delete
all your records. It might be that during the
development of your tables, you are inserting
test data and you want to
delete all of it. So if you want to make
a table empty, you can go and say DELETE
FROM table name and you're going to make the
table empty and then insert the data again. But if you are
deleting only a few records, be careful what you are
writing in the WHERE condition so you don't
lose all your data. One more thing here
to talk about, about deleting rows:
you might be in a situation where
you have a very big table. And the mission here
is to delete everything, to delete all the rows
from this big table. So if you are using the
DELETE FROM command, it might take a long time
because what SQL is doing is
going through the data piece by piece: it deletes one piece, then goes to the next one. So it's going to do it in an iterative manner and it
may take a really long time. So if you are sure and you say, okay, I want to make the table empty, I want to delete
everything from the table, I just want to have the
columns and nothing inside it, then instead of using
DELETE, it is best practice to
use another SQL command to delete the rows,
and that is the TRUNCATE keyword, and then customers. As you can see,
it's only two words to destroy everything. So it's a very short command. With TRUNCATE customers
you are telling SQL: delete everything, I don't want to see any
records inside my table. So the database is going to
do it really fast. So if I run
this query over here,
I'm just going to
remove that DELETE FROM, we are deleting everything
in the table customers. So if I do SELECT
* FROM customers, the table is going to be empty. So if you have done that and you want to have
the test data again, just go to the tutorial database script and rerun the whole script. Then you will have exactly
the same situation as before you deleted
the data from customers. Alright everyone, so that's
all for this chapter. We have learned how to modify
our data inside SQL tables. And now we're going to jump to the last chapter
where we're going to learn how to define
our data using SQL. And first, we will learn
how to create a SQL table.
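The two ways of removing rows from this lesson can be sketched like this in MySQL. Again, customer_id is an assumed column name; the IDs 6 and 7 are the ones used in the task.

```sql
-- Preview the rows that the filter matches before deleting anything.
SELECT * FROM customers WHERE customer_id IN (6, 7);

-- Delete only the two wrongly inserted customers.
DELETE FROM customers
WHERE customer_id IN (6, 7);

-- Empty the whole table but keep its columns.
-- A DELETE without WHERE would also work, but it removes the rows one by one:
-- DELETE FROM customers;
-- TRUNCATE is usually much faster on big tables:
TRUNCATE TABLE customers;
```

In MySQL the TABLE keyword after TRUNCATE is optional, so TRUNCATE customers works as well. Keep in mind that TRUNCATE cannot be rolled back in MySQL, so only use it when you are sure that all rows should go.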
37. SQL | CREATE Table: Alright guys and girls, so far we have learned how
to query our data using the SELECT command, and as well how to manipulate our data, the values inside our tables,
using INSERT, DELETE, and UPDATE. Next, we're going to focus
on a new group, that is the data
definition language, DDL. It is about how to change the
structure of our database, how to change the
tables themselves. So we have here three commands. CREATE, to create something new, like a new table or other new objects.
We have DROP, to drop a table,
to delete a table. And ALTER is to change the
structure of a table. Okay, so now we're
going to start talking about the first command. We have the CREATE command. If you want to create
something new in the database, new objects, e.g. a new table, a new view, or
stored procedures (in databases there are different types of
objects, not only tables), you can go and use
the CREATE command. In our tutorials, we will be focusing on creating
a new table. So in order to
create a new table, you have to define the structure of each
column inside it. And in order to do that, we have to specify three pieces of information
for each column. So each column
should have a name. This could be anything, depending on the requirements
that you have. So it must have a
name, and after that, it must have a data type, exactly one data type. So you cannot specify multiple
data types for a column.
Exactly one. In MySQL
there is a big list of all available
data types. I'm going to leave the link in the description so
you can check that. The most famous
ones are INT, VARCHAR, DATE, CHAR, and so on. A data type should be assigned to each
column, and as well, you can assign
the size of each column, the maximum allowed size; it's like a rule that you can apply. If you leave it empty
like this, only INT, the data type is going to get a default size from SQL. So if you define, like
in our last example, the last name as
VARCHAR(50), that means the maximum allowed size for
the last name is going to be 50. Anything that exceeds
the 50 characters is not going to fit: depending on the SQL mode, MySQL either rejects the value or cuts it down. Only 50 characters are allowed
inside the last name. So here you can specify
the data type and as well the size
of the data type. After that, you have a bunch of constraints
that you can define on your database in order
to have some data quality. E.g. you have the
constraint PRIMARY KEY. You say this column
is the primary key, and immediately it's going to be unique and not allow
any nulls inside it. And you can define for each column multiple
constraints, not only one constraint. So you could say this is a primary key and not null
and unique and so on. So you can define
multiple ones. So we have as constraints
in the database: PRIMARY KEY; NOT NULL,
so you are not allowing
null values; UNIQUE,
that means the values inside
it should not be duplicated. And then we have DEFAULT. DEFAULT means if we
are inserting any data and we didn't specify
a value for this column, the database is going to
use the default value that we have defined
for that column. So those constraints, as I said, you could use all of them if you want
for each column. So it really depends
on the requirements and on the data quality
requirements as well. The data type
should be only one, and for each column we
have only one name. Alright, so now
let's learn how to create a new table using SQL. And we have the following task: create a new table
called persons. And inside it we're going
to have four columns: ID, name, birth
date, and phone. As you know, in our
tutorial database, we have only three tables. So if you check
here on the left side, we have customers,
employees, and orders. And now we are going to add one more table called
persons. So let's do that. Alright, so now let's
start creating our table. We're going to start with
the command CREATE TABLE. And after that, we need to
specify the table name. But before that we have to enter the database name, or
in other databases it is the schema name. So as you might have already
noticed, in MySQL
we have different databases. We have our tutorial database
and some default ones. We're going to put this table in our tutorial database, and that is db_sql_tutorial.
Then a dot. And here we're going to
put the table name, and we have
persons. After that, we're going to open
a pair of parentheses, and inside them we're
going to define the column structure. Let's start with the first
column. We have the ID. This is our primary key, the most important column of
the whole table, something like the customer ID in
the table customers. So the name of it is
going to be ID. After that, I'm
going to have a space. And then we have to
define the data type. Since it's going to be a sequence
of numbers, 1, 2, 3, 4, and so on, we're going to use the
data type integer, INT. I will not define the size. I'm going to use the one that we have as a default from MySQL. So now we're going to define the constraints that we
want for this column. Here, since it is
our primary key, we're going to use the
constraint PRIMARY KEY. We don't have to
specify NOT NULL here because by default, if you are saying
this is the primary key, you get
two things. First, it's going to be
unique, and second, not null. So it is two constraints
in one, the primary key. So after that, I don't want to generate those IDs
myself manually when doing the inserts. I want the database
to take care of that. So to do that, we can define it as
AUTO_INCREMENT. So with that, if you are
using DEFAULT or you are not specifying anything in
the INSERT statement, the ID is going to be generated automatically by the database. So with that, I have
the column name, I have the data type, and I have two constraints. So now we're going to
jump to the next column. We have the person name. So I'm going to add a comma
and a new line for that. So here we're going to
have person name as the column name, then a space. After that, we need to
define the data type. So since it's going to include
characters and so on, I'm going to use VARCHAR
and define a size of 50. Anything with more than 50 characters is going to be cut down or rejected, depending on the SQL mode.
So this is my rule. As well, I want each
person to have a name. So we don't want to
have nulls. So now we can define
that constraint. So this should be
NOT NULL. That's it. I don't want to have a
unique constraint and so on. So we allow having two
persons with the same name, but they will have
different IDs. So that's enough
for this column. We're going to jump
to the next one. We're going to add the birth date. So the name of that
is going to be birth date, then a space. The data type of
that is going to be DATE. Now, I don't really
want to specify any constraints because this
column can be optional, so we will not add anything. So that should be enough. We have the column name and the data type. After that, a comma. And the last one, we're going to have the
phone as the column name. The phone can be
characters as well. So VARCHAR. And I am going to allow only 15 characters to
be inside the phone. For some data quality,
the phone
should not be null. So here I'm going to add
the constraint NOT NULL. One more thing that I can add as a constraint on this table is that each person should
have a unique phone number. We should not have two persons with the same phone number. In order to enforce such
data quality on your table, we can add the
UNIQUE constraint. And with that, we are
telling the database that in this column
we should have
only unique phones, and duplicates are not allowed. So now we have all
our four columns. We have specified
the data types and the constraints, and that's it. We can run the
query over here. So we don't have any errors. If we check
on the left side, we don't have the persons table yet. That's because we have to refresh the data over here. So click on Refresh
and you will see we have one more table
called persons. Okay, so now let's
check some stuff, e.g. if I go and say SELECT
* FROM persons, just to check the
table structure. So here I can see, okay, I have a table called persons. I have my four columns
and everything is empty. You can go as well and check with
the DESCRIBE command for
persons and run that. And you can see we
have the fields, the data types, what is nullable and what is not, the primary key, what is
unique, and the auto increment. So you can check that
everything is fine and as we wanted. Alright everyone,
so that's all about how to create a SQL table. And next we are going to talk quickly about ALTER TABLE.
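Put together, the statement built step by step in this lesson might look like the sketch below. The exact column spellings (id, person_name, birth_date, phone) are assumptions, since the lesson only names the columns out loud, and the inserted row is made-up test data.

```sql
-- Create the table inside the tutorial database (database name, dot, table name).
CREATE TABLE db_sql_tutorial.persons (
    id          INT PRIMARY KEY AUTO_INCREMENT,  -- unique, not null, generated by the database
    person_name VARCHAR(50) NOT NULL,            -- every person must have a name
    birth_date  DATE,                            -- optional, so no constraint
    phone       VARCHAR(15) NOT NULL UNIQUE      -- required, and no duplicates allowed
);

-- Check the structure: fields, data types, nullability, keys, auto increment.
DESCRIBE db_sql_tutorial.persons;

-- Hypothetical test row: id is left out, so AUTO_INCREMENT fills it in,
-- and birth_date is left out, so it stays NULL.
INSERT INTO db_sql_tutorial.persons (person_name, phone)
VALUES ('Maria', '555-0100');
```

Inserting a second row with the same phone value would be rejected by the UNIQUE constraint, while a second 'Maria' with a different phone is fine because person_name has no unique constraint.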
38. SQL | ALTER Table: Okay, so now let's move
to the next command. We have ALTER TABLE,
and you can use it in order to change the
definition of a table. So let's say, okay, we need to add one more column
to our new table persons, and that is the email. So in order to do that,
it's pretty simple. So we can remove this. We use the
keywords ALTER TABLE and the table name persons. And after that, we're going
to add the keyword ADD. Now we are adding a new column. It's like in the CREATE TABLE. So we need the column
name, and that is email. Then after that, we need
to define the data type. It's going to be
VARCHAR(15) as well, as a rule. And here as well, we can add some constraints if we
want some data quality. You say, okay, this is NOT NULL. So with that, I'm changing the already existing
table that's called persons and I'm adding
a new column. So let's run this. And let's check
our table again: refresh. Let's select from the table
persons and see the results. And as you can see, at the end we have a new column, and by default SQL is going to add the
new columns at the end. So if I check this as
well with DESCRIBE persons,
just to make sure that
everything is fine, we can see here we have
one more column that's called email, VARCHAR(15), and it is NOT NULL. Alright, so that's all
about how to alter a table. And now we are going to
learn how to drop a table. It's really easy.
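As a small sketch of the change described in this lesson, assuming the persons table from the previous lesson exists in the currently selected database:

```sql
-- Add a new column to the existing table; it is appended after the last column.
ALTER TABLE persons
ADD email VARCHAR(15) NOT NULL;

-- Confirm that the new column, its data type, and NOT NULL are in place.
DESCRIBE persons;
```

MySQL also accepts the longer form ADD COLUMN, and it has FIRST and AFTER clauses if the new column should not go at the end.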
39. SQL | DROP Table: Alright, so now let's jump
to the last command that we have in order to change the
structure of our database. And that is the DROP command. If
you want to delete a table,
so you say, okay, this
table is completely wrong, I don't want it in my database, you can go and drop the
table, and that's pretty easy. You can do it like this. So let's say we want to drop the new table that we have
that's called persons. So we use the keywords DROP TABLE and just write down here the
table name, and that's it. Once you execute that, the table persons will no longer
exist in your database. So I'm going to delete it. And as you can see
on the left side, we no longer have
a table persons. So it's really simple. Alright guys, that's all
for the last chapter. And not only that, that's all for this course.
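For reference, a minimal sketch of the drop from this last lesson, again assuming the persons table lives in the db_sql_tutorial database:

```sql
-- Remove the table definition and all of its rows.
DROP TABLE db_sql_tutorial.persons;

-- Variant that does not raise an error if the table is already gone.
DROP TABLE IF EXISTS db_sql_tutorial.persons;
```

Unlike TRUNCATE, which keeps the empty table, DROP removes the columns and the table itself, so the table has to be created again before it can be used.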
40. Tableau | Course Introduction: And welcome to this very unique
course to master Tableau. My name is Baraa Salkini
and I'm currently leading big data projects
at Mercedes-Benz,
with over a decade
of experience in big data, data visualization, and business
intelligence projects. And I'm very excited to be your instructor for this course. In this 21-hour course, I'm going to be sharing
everything that I know about one of the most in-demand skills in data science and data
visualization, Tableau,
so that by the end of
the course you're going to be able to create amazing data visualizations in Tableau like
I do in real projects. I designed this
course to take you from zero to hero. If you are a beginner,
don't worry about it. I'm going to explain
everything from scratch, step by step. That means this course
assumes that you don't have any skills in data
visualization. As well,
all the skills that you
learn in this Tableau course, like data modeling and so on, can be used in any other
tools like Power BI and Qlik. Now of course, you
might ask yourself,
what makes this Tableau
course different and unique from all other
online courses? This is the only course
that breaks down the complex concepts of
Tableau into animated visuals, because visuals
are very powerful to make complex concepts
easy to understand and to follow. In this
Tableau course, we're going to present over
250 animated sketch notes of Tableau concepts. Understanding the
concepts and how Tableau works can make you a professional and an expert in data visualization
and in Tableau. And in this course,
I'm going to provide you with tons of free materials. For example, I've prepared three different data sources
for this course that we can use in all our tasks and examples throughout
the course. As well,
I'm going to provide you with
three Tableau cheat sheets:
one cheat sheet for
all Tableau concepts,
another one for all
Tableau calculations,
and one more
cheat sheet for all the visuals to help you
choose the right charts. Having those three
cheat sheets, you don't have to memorize everything. You have a quick reference and access to the Tableau
concepts. As well,
you have access to
all Tableau files and dashboards that are created
during the course. As well,
all the sketch notes of each section are available
for you to download, so you can use them
later as a reference. Now let's have a sneak peek
at the Tableau course. We will start with the basics: what is business
intelligence, what is data visualization, what is Tableau? And then you're going to learn the Tableau product suite. And after that, we're
going to do a deep dive into different Tableau concepts like the Tableau architecture,
dimensions, measures, discrete
and continuous data. After that, we're
going to deep dive into Tableau calculations
and functions. You're going to learn more
than 60 different functions in Tableau to manipulate data. And after that, we're
going to go and cover more than 63 different
types of charts in Tableau. And then at the end,
we're going to go and implement a Tableau project, similar to the ones that I
do in real life. So now the question is,
who is this course for? If you are someone
who has never built any data visualizations using
tools like Tableau or Power BI, I will be with you
in this course at each step, starting from the fundamentals,
and we're going to end up with the
advanced topics. And this course is
as well for you if you are already a
Tableau developer. So I would suggest
that you take a look at the course curriculum and start at the level
that suits you. I have covered a lot of
advanced topics and you're going to get a lot of best
practices in this course. And this course is
suitable for you if you have experience in
other tools like Power BI, and you would like to pick
up a new skill in Tableau. So let's jump in
and get started.
41. Tableau | Course Curriculum Overview: We're going to have a quick overview of
the Tableau course. I have split this course
into 15 different sections. For example, we're going to learn what is business
intelligence? What is data visualization? What is Tableau, and the
history of Tableau? And why is Tableau a
very powerful tool for data visualization? After that, we're going
to go and deep dive into the Tableau product suite. Tableau is not
only one product. We have eight
different products. So I'm going to go and introduce
you to those products. And we're going to go
and compare them side by side for you to understand
the differences between them. And I'm going to
help you choose the right product
for your project. Moving on, we're
going to go and deep dive into the Tableau
architecture. Here we're going to
learn many different concepts, like what are live connections? What are the different types
of Tableau files? And then we're going
to deep dive into the Tableau architecture in
order for you to understand the main components of the architecture and how
Tableau works internally. After all that theory, we're going to start
preparing your environment in order for you to practice
with me in this course. So we'll go and
download and install Tableau, for free of
course, on your PC. We're going to go and create
a free Tableau Public account. We're going to
download the training datasets and we're going to publish our first
visualization. And at the end, I'm going to take you
on a tour in order to make you familiar with
the Tableau interface. And after we have prepared
your environment, we're going to start
with the first topic, how to create a data
source in Tableau. And here you can gain skills
in data modeling. So we're going to go
through the basics of data modeling and as well how
to do modeling in Tableau. And then we're going
to go and learn four different methods for
combining tables in Tableau: joins, unions, relationships,
and data blending. And of course, we're
going to go and compare them side by side
for you in order to understand the
differences between them and when to
use which method. And at the end of this
section, we're going to go and create two data sources. Moving on, we're going
to start talking about the Tableau metadata. Here you're going to learn very important concepts in Tableau: the data types,
dimensions and measures, discrete and continuous values. Once you understand
those concepts, you can understand how to create visualizations
in Tableau. After this section, we
have a small section about renaming. Here we're going to talk
about the naming conventions that each developer should know. Then we'll learn the
different techniques for renaming columns and
tables in Tableau. And at the end, we'll learn how to give aliases
to the values. Moving on to the next section, you'll learn how to organize
your data in Tableau. And here we have
different methods, like grouping the dimensions
using hierarchies and grouping the values
using groups and clusters. And then after that,
we're going to learn sets in Tableau. And at the end, we'll
learn how to create bins in Tableau in order
to create histograms. In the next section, we're
going to learn how to filter our data in Tableau. And here you'll learn
the different types and concepts of filters in Tableau, how to create them, and
how to customize them. And I'm going to give
you ten tips and tricks about filters in Tableau. And we will learn as
well in this section
how to sort our data. After that, we'll learn a very important
concept in Tableau, which is Tableau parameters. Tableau parameters are
great for adding dynamics to your
visualizations. You'll learn the concept of parameters, and then you'll learn different
use cases for them: how to make dynamic
calculations, dynamic reference lines, and filters, how to swap measures
and dimensions, and as well dynamic bins. Moving on to the next section, we're going to learn
something dynamic as well. So we're going to learn the
Tableau actions in order to make your dashboards
interactive. As usual, first you'll understand the
concepts of Tableau actions. And then we're
going to go through all Tableau action types. For example, how to go to a URL, how to go to a sheet, how to filter data
using actions, and then how to make
highlights using actions, and how to change the values
of sets and parameters. After this section,
we're going to have the Tableau calculations. This section is really huge. You're going to learn
how to transform and manipulate your data
using four different Tableau calculation types. So we have the row-level
calculations, aggregate calculations,
table calculations, and the LOD expressions. In this section, you'll learn more than 60 different
Tableau functions in order to
manipulate your data. Moving on to the next section,
we have another big one. We have the Tableau charts. Here we're going to
go and build together more than 63 different
charts in Tableau. So we will start with
the basic charts, like bar charts, and
we're going to end up building very advanced
charts in Tableau. And at the end, I'm
going to help you choose the right chart
for your requirements. Moving on to the next
one, we're going to learn the Tableau dashboards. We're going to go step
by step through how to create clean dashboards in
Tableau using containers. And now in the last section, we have a Tableau project.
In this section we're going
to go together and implement the project exactly like I do in my
real-life projects. So first we're going to
learn the different phases of a Tableau project. Then we're going to start
with the requirements. So you're going to learn how I analyze the requirements
for Tableau. And then we start with the implementation
of the project. So we're going to go and
build the data sources, the charts, and two
different dashboards. So with that, you're going
to get familiar with how to implement projects in
companies using Tableau. So once you go through
all those sections, you're going to have solid
knowledge of Tableau.
42. Tableau | Section: Tableau Basics: Tableau basics. Before you start learning how
to use any tool, it's very important
to understand the principles and the
theory behind it, which can help you in your career to be a professional developer
and as well an expert. That's why we're going to cover
the following topics now: the buzzwords of big data, what is business
intelligence, and what is data visualization and
why it's very powerful. And at the end,
we're going to talk about what Tableau is and why Tableau is a leader
in data visualization. So let's start with
the first topic. We're going to go and
learn the main buzzwords of big data.
So now let's go.
43. Tableau | Big Data Buzzwords: If you are new to
the world of data, you must have started hearing
a lot of buzzwords, from big data to IoT, data science, data engineering,
and phrases like "data is the new oil". In this tutorial,
I will be covering some important buzzwords about data and what
they really mean. Let's dive in. We are now living in
the data-driven age, and data is generated
everywhere. We people generate massive amounts of
data as we speak. With each click on the Internet, each search, each email, or even when you are
ordering something online, we generate data. We spend hours every day on social media,
liking, commenting, searching. Our smartphone is all the time uploading
data about where you are and how fast you are moving. And everything we do online is now stored and tracked as data. Not only are our smartphones
and computers connected to the
Internet and generating data, but we also have something
called the smart home. We can connect any device in
our home to the Internet. Just put the word
smart before it. We have smart mowers,
smart lighting, smart fitness, voice
devices, security systems. All those devices
can be connected to the Internet and start generating massive
amounts of data. And this is what we call
the Internet of Things, IoT. IoT is the concept of
connecting any device, anything, to the Internet in order to generate
and exchange data. Not only do we have
IoT at home, but also everywhere.
We are living through the digital transformation in industry and manufacturing. You might have heard of the
concept Industry 4.0, the fourth Industrial Revolution,
introduced in Germany. It's all about smart factories, connecting machines
and devices to the Internet in order
to exchange data. And now we can find
IoT in cities. We are trying to implement
smart cities where we're going to connect everything
in order to reduce waste, save money, and improve quality. We have IoT as well
in our cars. Our cars are loaded with
sensors and devices that are connected to exchange data for many reasons, like
driver assistance, object recognition,
and self-driving systems. The list is just so long. In 2022, we had around 14
billion physical devices, things from small household
cooking devices to sophisticated
industrial machines, that are connected
to the Internet, generating and exchanging data. The amount of data generated every day from
IoT, social media, websites, and machines is
truly mind-blowing. There are currently
over 44 zettabytes of data in the entire digital
universe, that is 44 followed by 21 zeros in bytes. That means we are no longer dealing with normal
traditional data, we are now dealing
with big data. What does big data mean? There are three indicators that help us to understand whether our data is big, and they are
defined by the three Vs. The first V is volume. Well, big data is big. With the growth of the Internet, mobile devices, social media, and IoT, the amount
of data generated from those sources has
grown dramatically. The second V is velocity. In normal data processing, we used to process slow data, or as we call it, batch data, once a day or so, and then we store
it on disk. But in the big data world, the sources are generating streams of data at
very high speed. That means we have
to process and analyze the data in
real-time fashion, and then we store it in
memory instead of on disk. And the third V is variety. In traditional systems,
most data types can be captured in rows and columns, in structured tables like
databases or Excel sheets. But in the big data world, data often comes in
semi-structured format, for example server
logs, XML, or websites. Or the data comes in
unstructured format, like videos, audio, images, and free text. In big data, we have to deal not only
with structured data, but also with semi-structured
and unstructured data. So the term big data
means how we can efficiently
store, process, and analyze our data when it
has huge volume, high speed, and different types, in order to reveal significant
value for the business. But we still have a
problem with that. All that generated
data is raw data. Raw data is just
unprocessed rows and rows of numbers that are
really hard to understand, hard to read, badly structured, and have almost no
value to the business. Almost 70% of the
world's data is unused. Raw data, if left without
processing and refining, is just worthless,
a waste of money, a waste of space, and it generates digital waste stored in very
expensive data centers. And that's why we have
the very famous phrase of the famous British
mathematician Clive Humby: data is the new oil. Well, it means that
we have to extract the raw data like we
are extracting oil. We have to refine
it, process it, transform it into something useful that has
value to the business. What this really means is that most companies
are sitting on a very big field of
new oil, raw data. And most of them
have understood that data is their most valuable asset.
They have to extract it. They have to analyze it in
order to reveal insights that can help them to make faster and
better decisions. And that's why most
companies are hiring an army of data workers. As we know, the demand
for data scientists is increasing rapidly
and the supply is low. Now what can we do
with all that chaos, all that generated
unprocessed raw data? Well, we can do the
following. First, we can design and build a
data architecture. Data architecture
is the process of creating a blueprint
of how we organize, process, and store our data in different layers
for different purposes. Architecture makes
it easier to manage, protect, and access our data. Another thing that
we can do with raw data is data engineering. Data engineering is
the very complex process of designing and building data
pipelines and data storage. In data engineering,
we usually build ETL processes to extract the raw data from
multiple sources, then transform it,
and then load it into the target storage
in order to make it highly available and usable for the data scientists or
any other end users. Another thing that we
can do is data modeling. Data modeling is the process
of connecting the dots. So what we're going to
do is we're going to put all the data into
entities and objects. Then we describe
the relationships between those entities
in order to help us and the programs understand how the data
are related to each other. Another thing that
we can do with the raw data
is data mining. Data mining is the
process of analyzing massive amounts of raw data in order to discover knowledge and business
intelligence like patterns and trends, to solve problems and
to mitigate risks. Another use of the raw data is that we can use it
in machine learning. In machine learning, we are providing the computers
with two things: first, the raw and
historical data, and second, the mathematical
models and algorithms. Once the computer has
those two things, it's going to start training
and practicing in order to perform tasks like
predictions. It's like a human: the more the machine
practices and trains, the better and more accurate
the results are going to be. Next, we can do data science. Data science is the
scientific study of data, and it combines
three major powers: the power of
programming languages, together with
mathematics and statistics, and the knowledge of a
specific domain, in order to uncover valuable knowledge and insights from our raw data. One more thing that we
can do with the raw data, and my favorite one, is data visualization. Data visualization
is the process of converting numbers
and raw data, which are normally hard to
understand and to read, into visuals and charts like
bars, pies, and plots, in order to make them easier to understand and
easier to read, which really helps in
decision making. There are many other things and processes that we can
apply to the raw data, but these are the major fields of work that we can
use in order to convert the useless
raw data into knowledge that has
significant impact and value for the business. All right guys, so that was an introduction to
big data terms. And next we will quickly learn what business
intelligence is, using a very simple example.
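To make the ETL idea from the data engineering part a bit more concrete, here is a minimal Python sketch, not a production pipeline. Everything in it is an assumption made up for illustration: the two sources (CSV text from a hypothetical shop system and a list of records from a hypothetical web store), the `sales` target table, and the in-memory SQLite database that plays the role of the target storage.

```python
import csv
import io
import sqlite3

# Hypothetical raw data from two made-up source systems.
SHOP_CSV = "order_id,city,amount\n1,Stuttgart, 19.90 \n2,Berlin,\n3,Hamburg,5.00\n"
WEB_ORDERS = [{"id": 4, "city": "berlin", "total": "12.50"}]


def extract():
    """Extract: read raw rows from both sources without changing them."""
    shop_rows = list(csv.DictReader(io.StringIO(SHOP_CSV)))
    return shop_rows, WEB_ORDERS


def transform(shop_rows, web_rows):
    """Transform: unify column names, clean values, drop unusable rows."""
    clean = []
    for row in shop_rows:
        amount = row["amount"].strip()
        if not amount:  # skip rows with a missing amount
            continue
        clean.append((int(row["order_id"]), row["city"].strip().title(), float(amount)))
    for row in web_rows:
        clean.append((int(row["id"]), row["city"].strip().title(), float(row["total"])))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the (hypothetical) target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, city TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run extract, transform, and load in order and return the target connection."""
    conn = sqlite3.connect(":memory:")
    load(transform(*extract()), conn)
    return conn


if __name__ == "__main__":
    target = run_etl()
    query = "SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"
    for city, total in target.execute(query):
        print(city, total)
```

The three functions mirror the three letters of ETL, so each step can be changed on its own, for example by swapping the in-memory database for a real target storage.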
44. Tableau | What is Business Intelligence (BI): All right, let me
tell you this story. We have shops in three
different cities in Germany: Stuttgart,
Berlin, and Hamburg. And our three shops are
generating a lot of raw data every business day on sales, inventory levels, products,
staff costs, and so on. And now we have a
group of people that are the decision makers, like managers, HR, finance. And they have many questions
and decisions to make. So they might have
questions, for example, about what happened, and other questions about what will happen. Now if the managers try to find the answers
from the raw data, they might find nothing
and no answers, because the raw data are
usually very complex and badly structured, and they are
really hard to understand. And that's why
they're going to go and hire some data analysts, for example, in order to help them find the answers
from the raw data. The data analyst is
going to go and start analyzing the raw data
by doing some magic. For example, cleaning
up the data, connecting objects together, and aggregating the data
at different levels. And at the end, the result
will be communicated as, for example, a spreadsheet
to the decision makers. On the other hand,
the managers can hire data scientists in
order to help them find answers about
what's going to happen or uncover unknown
facts and insights. The data scientist is
going to go as well and start analyzing
the raw data, but this time using
different methods like, for example, data mining, machine learning, or training a model in order to find new
insights and new knowledge that answer the questions.
At the end, the output is going to be
communicated as well to the managers as numbers
and spreadsheets. Now, both the data scientist
and the data analyst did an amazing job working on the raw data and
analyzing all that stuff. But the problem here is that the output might be hard
to understand and read, because those managers
are usually people that don't work directly
with the data every day. This could lead to a big gap between those managers
and the results. Now in order to bridge this gap and make
everything easier, we can use the power
of data visualization. The results presented by the data scientist and the data analyst should be converted from
boring numbers and spreadsheets to visuals,
graphs and charts. The visual representations
of the data will just do the magic by making
everything clear and easy. And it's going to
bring very easily the wow effect once you are
presenting your results. So it's going to help the
managers to immediately find their answers and
they're going to start making decisions using the data. This process we call business intelligence,
or, for short, BI. All right, so now I
hope you have a better understanding of what
business intelligence is, and next we will understand
why visualization is so powerful and what
data visualization is.
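The analyst's "magic" from the story (cleaning up the data and aggregating it at different levels) can be sketched with a few lines of SQL run from Python. This is only an illustration under made-up assumptions: the `sales` table, its columns, and all the numbers are hypothetical and do not come from the course database.

```python
import sqlite3

# Made-up raw sales rows: (city, product, amount). A None amount stands for dirty data.
RAW_SALES = [
    ("Stuttgart", "Shoes", 50.0),
    ("Stuttgart", "Shirts", 20.0),
    ("Berlin", "Shoes", 70.0),
    ("Berlin", "Shoes", None),
    ("Hamburg", "Shirts", 30.0),
]


def build_db():
    """Create an in-memory database holding the hypothetical raw data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (city TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", RAW_SALES)
    return conn


def sales_per_city(conn):
    """Aggregate at city level, ignoring rows without an amount (cleaning)."""
    return conn.execute(
        "SELECT city, SUM(amount) FROM sales "
        "WHERE amount IS NOT NULL GROUP BY city ORDER BY city"
    ).fetchall()


def total_sales(conn):
    """Aggregate at company level: one number for all shops."""
    return conn.execute(
        "SELECT SUM(amount) FROM sales WHERE amount IS NOT NULL"
    ).fetchone()[0]


if __name__ == "__main__":
    db = build_db()
    print(sales_per_city(db))
    print(total_sales(db))
```

The per-city result is the kind of table that would end up in the spreadsheet for the managers, and it is also the kind of input a visualization tool turns into a chart.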
45. Tableau | The Power of Data Visualization: Now the question is why
visualization is so powerful. With simple visual
communication, you can make a huge difference. Since the start of
humanity thousands of years ago, early humans used visuals
in order to tell a story. And until now, in
the modern age, humans still use visuals
in order to tell any story, because we humans
are visual creatures: we think in pictures
and in visuals. If we see a tree, our brain can store it as a
visual, as an image. Statistics about our brain say that 90% of the information transmitted to our
brain is visual. But if we read the word "tree", our brain first has to transform it to a visual
before storing it, which is way slower. In fact, the human
brain processes visuals 60,000 times
faster than text. Another fact about
our brain is that we remember most of what we
see and interact with. It's proven that humans
remember only 10% of what we hear and 20%
of what we read. And it's also proven
that we remember about 80% of what we
see and interact with. That's why we have
the famous phrases "a picture is worth 1,000 words" and "seeing is believing". Given all those facts, it's no wonder that in
digital channels the visual content is
taking over: posts, tweets, articles, news,
presentations, dashboards. You can find visuals everywhere. Now the question is, what
is data visualization, or as we sometimes call it, dataviz? Data visualization
is the process of converting boring numbers and raw data into interesting
graphical elements like bars, pies,
plots, and so on. So data visualization
brings the data to life and makes you the master of storytelling about the insights
hidden within your numbers. So it's like the art of
converting highly complex, massive datasets
into something very simple, something very easy to
understand and to interact with. Imagine yourself to be one of the managers and you
have two data analysts. One of them is
presenting the result in a spreadsheet
filled with numbers, and the other data analyst is presenting the result
with visuals filled with graphic representations
of the data, and both are presenting
the same facts. Which report would you prefer? I would go with the
right one, because the left one is just dry, boring numbers, and it's unlikely
you will be able to spot any trends and patterns. The main benefit of
data visualization is telling a story; it arms you with tools
in order to make the right decision
at the right time. There are many other benefits, like seeing the big
picture, tracking trends, making smarter and
faster decisions, discovering unknown
facts, patterns, trends. And getting as well
more engagement from the end users by asking
more and better questions. All right, so with that,
we have learned what data visualization is and why it is very powerful
and important. Next we will compare
Excel to tools like Tableau and why you need to
use Tableau instead of Excel.
46. Tableau | Tableau vs Excel: Over and over again I'm
asked the same question: why should I bother learning
and using Tableau or Power BI for data visualizations
if we have Excel? In this video, I'm going
to explain my six reasons why we should use a modern BI tool
like Tableau or Power BI and not use Excel for
data visualizations. And we start right now. Around 1 billion users globally are using
Microsoft Excel. I worked in many companies
and I can tell you people are just addicted
to Excel. They love it. They use it for everything:
as a planning tool, for data entry, data analysis,
and data visualizations. The main problem here is that
the more a company grows, the more data it generates. And because everyone is
familiar with Excel, they're going to keep using
it in big data use cases. And they're going to have
a really hard time managing those spreadsheets and dealing
with the limitations of Excel. In these situations, it's
really time to switch to a modern BI tool or
data visualization tool like Tableau or Power BI. Now let me show you how
BI is done with Excel. We usually have different
source systems and a data analyst who's
going to go and start manually exporting the data from those systems and
importing it into Excel. And then some calculations
are going to be done, and at the end a report
will be generated. The Excel files will then be accessed by different
business users. On the other hand, we can do BI with a modern
tool like Tableau. So what we're going to do,
we're going to connect Tableau directly to
those source systems. And the data analysts can start developing a report or
dashboards in Tableau. And at the end, the
business users will access Tableau in order
to see those dashboards. So far you can say, okay, both look really similar. So now let's dive in
in order to show you the real benefits
of having a modern BI tool like Tableau or Power BI, and the limitations that we have in spreadsheets like Excel. The first benefit is automation. If you are using Excel and
have made some nice reports, at some point it's time to
update the data. And how do we do that in Excel? We update the data manually. So some employees have to
sit down every day and go through the process
of extracting data from those source systems, importing it into
Excel, doing the calculations, and at the end preparing the
reports over and over again, which is very time consuming. But if you are working with a modern BI tool like Tableau, we can automate
this boring task by creating a schedule to
refresh the data. For example, we can create a schedule in
Tableau so that every day at 7:00 in the morning Tableau automatically connects
to the data sources, pulls the data, and
prepares the reports. There are two benefits
of doing that. First, we eliminate
human errors, which are a very common
thing in Excel, and sometimes those mistakes can lead to wrong decisions
and to financial loss. And the second
benefit, of course, is that we no longer need employees
who are dedicated only to the boring task of exporting and importing data
manually to Excel. Another benefit here is capacity. Say we are
working with Excel and one of our source systems
starts producing and generating massive
amounts of data. Here we have a problem
in Excel because it can handle only around
1 million records. So our Excel file is going to break, and we're going to start
getting error messages like "the dataset is too large". What we usually do in Excel is go and start
splitting the main file into multiple small files in order to manage the
huge volume of data, which is really hard to manage. On the other hand, if you
are working with Tableau, we don't have to worry
about all that stuff. We have no problem in Tableau
because Tableau is made for big data use cases and can very easily handle
massive amounts of data. We might just change
the connection type from extract to live
in order to handle it. Another benefit is security. If you are working with Excel, it's not really hard
to hack into Excel, even if you are using password-protected
spreadsheets. They can still easily
be hacked nowadays. And the users are really used to sharing their
Excel files in emails, copying them to USB, or storing them
locally on their computers, which is not secure at all. All that stuff could
cost companies a lot if sensitive and confidential data is accessed by competitors. But if you are working with a
modern BI tool like Tableau, it's going to provide us with superior security features like advanced access control, data
security, and network security. And plus, if you are
working with Tableau, we don't have to
export the data; we can just share the dashboards and reports
between employees, and only if we grant
them access rights can they see the data. Another benefit is
row-level security. Many companies have a lot of
confidential sources, and they are starting to understand how important it is to
apply the need-to-know principle. The
need-to-know principle says a user shall only have access
to the information that their job
function requires. That means we cannot go and
share all data with all users. We have to have some
data restrictions. For example, a sales
employee should not see all the data like a
manager, and finance employees should not see all the personal information
like HR, and so on. That means if you are
working with Excel, we have here again to split the main files into specific
reports for specific roles. But on the other hand, most
of the modern BI tools offer a feature called
row-level security, RLS. Row-level security refers to
restricting the rows of data a certain user can see based on the policies that we define.
Using this technique, we're going to enforce
the need-to-know principle and going
to make our life easier by just having one dashboard accessed by
different types of users. And then based on the role, they're going to
see the data and the information that
their job requires. Another benefit is
reducing chaos. Let me tell you how we
usually work with Excel. A data analyst will start
exporting data from one source system and
is going to make a report called
the version one report. And then for other requirements, they're going to make
a version two report. And eventually
we're going to have a final report. And we have another data analyst working
on a different source system, and the same thing is going to keep happening a few times
back and forth. And eventually we're
going to end up having six different
versions of the reports. If we scale this impact, you will notice that you
are slowly poisoning your business, and the
end users are going to have to access different
versions of the reports. Now if we ask how old
the data in our reports is, we will get different answers. One version is going to be ten
days old, another 184.3 days. That means we don't have a single point of
truth for our data. That's why having modern tools
can help us eliminate such chaos and can help us build a single point
of truth for our data. One last benefit
that I would like to talk about is visuals. Although Excel offers
visualizations, it is sometimes very
limited when we are producing complex visuals.
In Excel as well, creating visualizations
is very time consuming and includes a lot of manual steps. And as well, those
visuals are going to be static and
not interactive. But on the other hand,
if we are using Tableau, everything is going to be
automated and super fast. We can create new reports and views very quickly by
just dragging and dropping. And it offers way more interactive and
cooler visuals than Excel. All right, the main
reasons why I prefer working with modern BI
tools like Tableau and Power BI and not Excel for data analysis and data
visualizations are automation, security, big data use cases,
and interactive visuals. It's not about Excel
versus Tableau. It's all about using
the right tool for the right use cases and
not misusing a tool. Excel is a great tool that is used by around a billion
people because it's a very easy to use, cheap,
professional spreadsheet for data entry and
complex calculations. But when it comes to data analysis and data
visualizations, we have way better tools than Excel, like Power BI and Tableau. And you can still
use them together. For example, you can do
your complex calculations in Excel and the
final result can be imported in Tableau
in order to do better visualizations and to get more insight
about the results. The thing is the world is
changing very fast and the companies are generating
massive amounts of data. So instead of using traditional
spreadsheets like Excel, we have to use more
powerful tools in business intelligence to help
us quickly find insights, trends, patterns in order to make faster and
better decisions. All right guys. So with that, you will no longer
have to rely on Excel for data visualizations and
can start using BI tools. Next, I will quickly show you
the top three BI tools for data visualizations and which one
is my favorite BI tool.
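The row-level security idea from this video can be sketched in plain Python. This is not how Tableau implements RLS; the `POLICIES` mapping, the role names, and the sample rows are all made up for illustration. The point is only that every user works with the same dataset and a policy decides which rows come back.

```python
# Made-up dataset shared by everyone; in a BI tool this would sit behind one dashboard.
ROWS = [
    {"city": "Stuttgart", "department": "sales", "revenue": 100},
    {"city": "Berlin", "department": "sales", "revenue": 150},
    {"city": "Berlin", "department": "hr", "revenue": 0},
]

# Hypothetical policies: each role maps to a rule that says which rows it may see.
POLICIES = {
    "manager": lambda row, user: True,
    "sales": lambda row, user: row["department"] == "sales" and row["city"] == user["city"],
    "hr": lambda row, user: row["department"] == "hr",
}


def visible_rows(user, rows=ROWS):
    """Return only the rows the user's role is allowed to see (need-to-know)."""
    policy = POLICIES.get(user["role"])
    if policy is None:  # unknown role: deny by default
        return []
    return [row for row in rows if policy(row, user)]


if __name__ == "__main__":
    print(visible_rows({"role": "manager", "city": "Berlin"}))
    print(visible_rows({"role": "sales", "city": "Berlin"}))
```

Compared with splitting one Excel file into a separate report per role, there is a single dataset here and only the policy differs per user. Unknown roles see nothing, which follows the need-to-know principle.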
47. Tableau | Best 3 BI Tools: Now the question is, what are the best tools for
data visualizations? A leading research
company called Gartner publishes every year the Gartner Magic
Quadrant to show which are the leading products
in a specific domain. And if you check the Magic
Quadrant for analytics and business intelligence platforms
for the last ten years, you can almost always
see the same leaders: Tableau, Power BI, and QlikView, since 2012. I have worked with a lot
of data visualization tools, and I can say that
all three of those tools are really great tools. They have their advantages
and disadvantages. But by just checking the
data visualization aspects, I can say that Tableau is the winner here, because data
visualization is a core concept in Tableau, and it is really the best tool for data scientists
and for big data. All right, so with
that, you have learned what the top three BI tools are. And you know by now
that Tableau is my favorite data
visualization tool. Our next step is to
introduce you to Tableau. We will cover what Tableau is, its history, and its mission.
48. Tableau | What is Tableau?: The first question
is, what is Tableau? A quick answer could
be: Tableau helps to convert this to this. Without any technical
or programming skills, Tableau converts complex
and boring raw numbers into beautiful
visuals and charts, which are really
easy to understand. The key features of
Tableau are interactivity, ease of building and use,
and fast performance. We can call Tableau by many names, like a data
visualization tool, a business intelligence
or BI tool, or sometimes we call
it a reporting tool. Well, Tableau is all of them, but I choose to call Tableau a data visualization
tool because data visualization is the
core concept of Tableau. Now let's have a quick
history of Tableau. In 2003, Tableau was
founded by three guys, Pat, Christian, and Chris, as a result of a computer
science project at Stanford University. They focused on
visualization techniques to analyze data
inside databases. And then in 2019, Tableau was acquired by Salesforce in a deal
worth over 15 billion dollars. And for the last ten years, Tableau has been named
a leader in the Gartner Magic Quadrant for
business intelligence. Tableau has a clear
mission to help people to see and
understand their data. They really focus on keeping Tableau intuitive
and easy to use. That's why Tableau
does not require any technical or
programming skills in order to build amazing
dashboards and insights. That means the
target audience of Tableau is not only
technical users, like IT, data analysts, and data scientists, but also all other
non-technical users, like a business user, an end user, a
teacher, and so on. This aspect is a game changer, changing the old mindset
of having only IT and technical people working with data and building
visualizations. Now we have modern data visualization tools
like Tableau, which open the door for everybody to start
working with data. That's why tools like Tableau help organizations
to be data driven. And now Tableau is widely used. You can find Tableau in almost
all organizations, industries, sectors,
and departments, because most of those
organizations want to empower their employees with tools like Tableau in order to make better, faster, and smarter
decisions using data. All right, so with that, I
hope you now have a better understanding of what
Tableau is and its mission. And next I will show
you my top four reasons why I think Tableau is a
leader in data visualization.
49. Tableau | Why Tableau is Powerful?: Tableau is not the
only leader in the business intelligence and
data visualization market. There are many other
tools available, like Power BI,
QlikView, and so on. But now if you ask me what
makes Tableau so special, why Tableau is so widely used, I would give you four reasons. The first reason is performance. The sources now are generating
massive amounts of data, and Tableau is designed
and optimized to handle huge volumes of data without impacting the
performance of the dashboards. And that's because
Tableau is using a high-performance in-memory
data engine to help analyze large datasets,
where the data can be stored in
columns instead of rows, which can boost the
performance of dashboards. Tableau has no limitations
whatsoever on the number of data points
in the visualization. For example, in
this view we have over 1 million data points
without any problem. This allows us to analyze large datasets in
order to find trends and patterns with great
performance, while other tools still enforce row-size or data
point limitations, which is not really helpful
for data analysis. The second reason is quick and interactive
visualizations. Compared to the other
tools, with Tableau we can create rich and
beautiful visualizations in just a few seconds. I'm going to show you
now a quick example of how to cluster my data and how to
calculate a forecast. In order to do such a
complex job in Tableau, we will just use drag and drop. So let's see how simple it is. All right, so we're going
to go to the orders, take the sales and put it in the columns, profit in the rows, and take the order
ID to the details, because I want to see all
my members over here. And now we go to
the analytics pane, and then double click
on the clusters. With that, I have four very nice
clusters of my data. As the next step, I will create
a forecast of my data. I'm going to take the order date and put it on the columns. And then we're going
to take the sales. I would like to change
the visual to bars. I have now here
around five years. What we're going to
do is go to analytics and just click on
forecast, and that's it. I have a forecast of
two years of my sales. Now I'm just going to go and put them together in one dashboard. So I'm going to create
a new dashboard, drag and drop the clusters, drag and drop the forecast. I'm going to link them
together with a filter. That's it. Now we
have both of them, and if I click around, I have an
interactive dashboard for the forecast and
for the clusters. The third reason: Tableau is user friendly. As you can see, we have done very
complex analysis with just drag and drop, without
writing any code. And this is exactly
what Tableau wants. It's very intuitive
and user friendly, and this is the major
strength of Tableau. It just opens the door for all non-technical users
to have a chance to work and play with data to solve their daily problems
without the need for IT. But on the other hand,
Tableau is integrated with programming languages
like Python and R, which opens another door for advanced data
visualizations, which might be used by
data scientists. The last reason is community. If you are working with Tableau, well, you are not alone. You have a huge
Tableau community. In the community, we have around 2 million students and teachers. And in Tableau Public we have around 5 million published data
visualizations. And there are around
200,000 questions and ideas shared
in the Tableau forums. Having such a huge
community is a big plus for any tool. It's
very important because while you are
working with data, you might face some problems
or have questions. It's very important
that you have a place where you can go and ask your questions and get advice from other developers
all over the world. Not only that, you can
get inspired as well by the visualizations shared
by other developers. You can find the
important links about the Tableau community in the
video description below. All right, so my four
reasons why Tableau is one of the best tools for
data visualizations are: Tableau can handle
massive amounts of data and is very suitable for
big data use cases. It offers beautiful, quick,
interactive visualizations. Tableau is intuitive
and user friendly; no coding or technical
skills are required. And the last reason: the Tableau
community is huge. One more thing that
I would like to add is that data visualization
is really one skill that you have to master as a data
scientist or data analyst. And Tableau is an amazing
tool for data visualizations. That's why I highly recommend learning or getting
familiar with Tableau. It's going to be
a huge advantage for your career. All right guys. So with that, you
know my reasons why I think Tableau is a leader
in data visualization. And with that, we have finished the first
chapter of Tableau where we have covered a lot of important terms of
data and Tableau. And in the next chapter, we will have an overview of
the Tableau product suite, where I will introduce you to eight different
Tableau products.
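The remark about storing data in columns instead of rows can be illustrated with a small Python sketch. It is not Tableau's data engine, just a toy comparison with made-up order data: in the row layout an aggregate has to walk through every full record, while in the column layout it only touches the one column it needs.

```python
# Row-oriented layout: one record per order, all fields stored together.
ROW_STORE = [
    {"order_id": 1, "city": "Stuttgart", "sales": 100.0, "profit": 20.0},
    {"order_id": 2, "city": "Berlin", "sales": 250.0, "profit": 40.0},
    {"order_id": 3, "city": "Hamburg", "sales": 75.0, "profit": 10.0},
]


def to_column_store(rows):
    """Pivot the records into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def total_sales_rows(rows):
    """Row layout: every record is visited even though only 'sales' is needed."""
    return sum(row["sales"] for row in rows)


def total_sales_columns(columns):
    """Column layout: the aggregate reads a single list of values."""
    return sum(columns["sales"])


COLUMN_STORE = to_column_store(ROW_STORE)

if __name__ == "__main__":
    print(total_sales_rows(ROW_STORE))
    print(total_sales_columns(COLUMN_STORE))
```

Both functions return the same total. With three rows the difference does not matter, but dashboards mostly aggregate a few columns over many rows, which is the situation a column layout is designed for.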
50. Tableau | Section: Tableau Products: Tableau products. In Tableau, we have eight different products, and it's really important to understand them and the differences between them. So that's why I'm going
to go and give you a quick overview of all
eight Tableau products. And then we're going to go
and compare them side by side in order to understand the
differences between them. And at the end you will learn the decision-making
process that I usually follow to choose the right product for
your requirements. So now let's start
with the first topic, where we will have an overview of the development process and
products. So now let's go.
51. Tableau | Development Process: All right guys. In this chapter
I will introduce you to the Tableau product
suite to understand the differences between the
eight Tableau products. And we will start with the
Tableau development products. All right, if you
think Tableau is only one piece of software,
then you are wrong. If you visit the home
page of Tableau, Tableau.com, you will find many
different Tableau products like Tableau Desktop, Public,
Server, Cloud, Prep, Reader. I can say that at the start, it might be confusing having
all those Tableau products, but don't worry about it. I'm going to explain
them one by one, so you can choose the
right combination of Tableau products for you
or for your organization. It's really important
to understand the differences between them, the functionalities
and the limitations of each Tableau product. So let's dive in. The Tableau product suite contains
eight different products. We have Tableau Desktop, Tableau Public
Desktop, Prep, Server, Cloud, Public Cloud, Reader,
and Tableau Mobile. All right, the first thing
to understand is that we can split those products into
two main categories, Developer tools
and Sharing tools. Tableau Developer Tools,
as the name implies, are tools that are
going to help you build data visualizations by creating
and designing dashboards, charts, and reports, or
do data preparation or data engineering by preparing
the data for data analysis. Under this category, we can
find three Tableau products: Tableau Desktop, Public
Desktop, and Tableau Prep. And now in the other category, we have the sharing tools. Those tools can help you share and collaborate on the work that you have done and created
using the developer tools. Under this category, we can
find five Tableau products: Tableau Server,
Tableau Cloud, Public Cloud, Reader, and
Tableau Mobile. All right, so now
first let's focus on the Tableau products under
the category Developer Tools. Now we can go and split the developer tools as well into two groups based
on their purposes. We have Data Visualizations
and Data Engineering. Underneath Data Visualizations, we find two Tableau products, Tableau Desktop and
Tableau Public Desktop. And underneath Data Engineering, we have only one
Tableau product, and that's Tableau Prep. All right, so now
that we have understood the main categories and the main purposes of
Tableau products, we will go now and talk about the development
process in Tableau. All right, so basically we have three very simple steps in the development
process in Tableau. In the first step, we connect
our data to Tableau. Then in the next step, we start building our
data visualizations to do data analysis by creating
reports, charts, and dashboards. And in the third step, we share our work
by publishing it. The two products to
do these three steps are Tableau Desktop and
Tableau Public Desktop. In many cases, the quality
of our data is bad and not ready for analysis. That's why we add one
more preprocessing step to prepare our data before we
start building our visuals. And we can use for this step
the product Tableau Prep. All right, so now let's
do deep dives into the Tableau developer
products one by one in order to understand
the key features and as well the limitations
of each one of them. All right, so with that,
we have an overview of the development
process and the products. And next we will have
a quick overview of Tableau Desktop.
52. Tableau | Tableau Desktop: Tableau Desktop is software you download and
install on your PC. With Tableau Desktop, you can connect to many
different source types. There are over 90 data
connectors: you can connect to Tableau Server, to files like Excel, text, and JSON, to on-prem servers
like MySQL and Oracle, or to the cloud, like Amazon, Google, and Microsoft Azure. Once you connect
Tableau to your data, you can start building
your data visualizations. In Tableau Desktop, you
will find many tools and functions to help
you create charts and reports with just drag and drop. And then you can combine those different reports into
interactive dashboards. And after you're done building
your views and dashboards, you have three
options to share your work by publishing
it to Tableau Server, Tableau Cloud, or
Tableau Public Cloud. Or you can even store your
workbooks locally on your PC. All right, so Tableau Desktop is the backbone product of Tableau. As a Tableau developer,
you're going to spend 90% of your time
using this tool. Tableau Desktop is a
developer tool to build data visualizations, where
you connect to your data, build dashboards, and
then publish them. Sadly, Tableau Desktop is not a free
tool like Power BI Desktop. In order to work
with Tableau Desktop, you have to buy a license. I think they offer some
kind of trial phase, or if you are a
student you get like one free year. Don't
take my word for it. It's better to check
the current offering from Tableau on their home page. With Tableau Desktop, you can connect to over 90 different data sources. You can publish your work everywhere as well:
to Tableau Server, Tableau Cloud, and
Tableau Public. Since Tableau Desktop
requires a license, you don't have any
limitations whatsoever on how many rows of data
you can store and process. Tableau Desktop is meant for data analysts, data scientists, and BI developers who work professionally in companies
on data analytics projects. All right, so that
was a quick overview of Tableau Desktop. Next we will check
Tableau Public Desktop.
53. Tableau | Tableau Public Desktop: Tableau Public is the free
version of Tableau Desktop. It is very similar to it. It's a developer
tool to build and publish
data visualizations. And since it's free and
requires no license, it comes with a few limitations. In Tableau Public, we have around ten data
connectors, and you can connect only to local
files on your PC. Another limitation is that you can store and process
only 15 million rows of data, and you can publish only to
Tableau Public Cloud. That means you cannot
publish your work to Tableau Server or
the private Tableau Cloud. And the last limitation
is that you cannot store your workbooks
on your local PC. But here I have to be fair: the most important part
is that all the functions and tools needed to
build visuals and dashboards are completely
available in Tableau Public, like in Tableau Desktop, which really makes Tableau Public
a great alternative and tool for beginners
to practice and learn Tableau before
they go and buy licenses. And to be honest,
that's why I decided to go with Tableau Public
in all my tutorials, so that anyone can
follow and practice with me without having
to buy any licenses. All right, so with that, we have a quick overview of
Tableau Public Desktop, and next we will check the data engineering tool, Tableau Prep.
54. Tableau | Tableau Prep: Tableau Prep Builder
is software you download and
install on your PC, and you can use it to prepare your data before you
start analyzing it. Same as Tableau Desktop, you can connect to many
different source types. There are over 90
data connectors, like Tableau Server, files,
on-prem, cloud, and so on. Once you connect
Tableau Prep to your data, you can start building data
flows, where you have access to tools and functions to help you
transform your data: for example, combining,
cleaning, filtering, aggregating, and all other kinds
of data engineering tasks that prepare your data for
data visualizations. At the end of
your data flow, you can store the
newly prepared data in three different places: as a file on
your local PC, as a published data source
in Tableau Server or Cloud, or, as the last option,
written directly
into databases. After you are done
building the data flows, you can publish them to Tableau Server or Tableau
Cloud (formerly Tableau Online) for automation. And in Tableau Prep you
also have the option to store your data flows
locally on your PC. All right, so Tableau Prep is a data engineering tool
to prepare our data and get it ready for analysis. Sometimes the data that we are connecting to
Tableau Desktop has bad quality and we cannot use it immediately
in our dashboards. That's why we spend hours
and hours cleaning up, organizing, combining, and
preparing our data. And that can be
really time consuming. For this situation, we can use Tableau Prep to
help us with this process. Tableau Prep is
a developer tool for data engineering, where
we connect to our data, build data flows, and
then publish them. It's not a free tool; it requires a license.
In Tableau Prep, we have over 90 different
data connectors. The output of the data
flows can be stored locally on your PC, as
a Tableau data source, or directly in databases. And we can publish
the data flow either to Tableau Server or
to Tableau Cloud. Tableau Prep is not
like Tableau Desktop: we don't have any free
version of Tableau Prep, so there is no
Tableau Public Prep. All right, so that was a quick overview of Tableau Prep. Next we will compare all three Tableau development
products side by side, and I will walk you through my decision-making process to choose the right
product for you.
55. Tableau | Tableau Desktop vs Prep: All right, so now let's
go and have a summary of the three products,
where we're going to compare them side by side. The main purpose
of Tableau Desktop and Public is to generate
data visualizations, but the main task of Tableau
Prep is data engineering. Now if we are talking
about the costs, both Desktop and Prep
require licenses, but Tableau Public is free to use. Now about the security
aspect of the data: Tableau Desktop and Prep are secure since you can publish
to private servers. With Tableau Public, you have to publish your work to
a public platform where everyone can see your data, so you cannot secure your
data in Tableau Public. The next point is data limits. Since Public is free, it comes with a limitation
of 15 million rows, but with Desktop and Prep you get no limitations. The next point is connectors. In both Desktop and Prep, you have over 90 different
data connectors, like files, APIs, servers, cloud, and so on, whereas in Tableau Public you
can connect only to files. And if we talk about the
live connections aspect, the only tool that offers live connections to your data sources
is Tableau Desktop. You cannot make live connections in Tableau Public
or in Tableau Prep; you always have to work
with extracted data. The next point is about
storing your files locally. Both Tableau Desktop and
Prep allow you to do that by storing your
work locally on your PC, but in Tableau Public
you cannot do that. Instead, you always have to publish your work to
the Tableau Public cloud. The last aspect is
the target audience. Tableau Desktop is made for data scientists
and data analysts, Tableau Public
is made for anybody who wants to work with
data visualizations, and Tableau Prep is made
for data engineers. All right, so now with this, we have a good overview of the three Tableau
products for development. And now comes the question: when to use which product? Let me guide you through my decision-making process using the following flowchart. First, we ask the question:
for which purpose? If we need a product for data
engineering, then it's easy. We have only one Tableau product, and that is Tableau Prep. If we need a product
for data visualizations, then we have to ask more questions. The next question: do
we need to connect to servers, APIs, databases,
or the cloud? If the answer is yes, then we have to use
Tableau Desktop. If the answer is no, then we ask the next question:
can our data be public? If the answer is no, our
data is confidential, then we have to use
Tableau Desktop. But if the answer is yes, our data can be public, then we jump to
the next question: do our data sources contain
more than 15 million rows? If yes, then we have to
choose Tableau Desktop. But if the answer is no, our data sources have fewer
than 15 million rows, then we jump to
the last question: do we need live
connections to our data sources? If the answer is
yes, then we again have to choose Tableau Desktop. But if the answer is no, then finally we can go
and use Tableau Public. All right, so if you follow those questions and this chart, you can easily decide when to
use which Tableau product. All right, so with
that, we have covered all the Tableau products
for development. Next we will start talking about the Tableau
products for sharing. So let's first understand
the sharing process.
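If it helps to see the developer-tools flowchart written down, here is a minimal Python sketch of the same questions in the same order. The names Requirements and choose_dev_product are made up for this illustration and are not part of any Tableau product or API, and the 15 million row limit is simply the figure quoted in this tutorial.

```python
from dataclasses import dataclass

# Row limit for Tableau Public as quoted in this tutorial (an assumption
# of this sketch; check Tableau's current documentation).
PUBLIC_ROW_LIMIT = 15_000_000


@dataclass
class Requirements:
    """Hypothetical container for the answers to the flowchart questions."""
    purpose: str                 # "engineering" or "visualization"
    needs_server_or_cloud: bool  # servers, APIs, databases, or cloud sources
    data_can_be_public: bool
    row_count: int
    needs_live_connection: bool


def choose_dev_product(req: Requirements) -> str:
    """Walk the questions in the order used in the flowchart."""
    if req.purpose == "engineering":
        return "Tableau Prep"
    if req.needs_server_or_cloud:
        return "Tableau Desktop"
    if not req.data_can_be_public:
        return "Tableau Desktop"
    if req.row_count > PUBLIC_ROW_LIMIT:
        return "Tableau Desktop"
    if req.needs_live_connection:
        return "Tableau Desktop"
    return "Tableau Public"


# A beginner practicing with a small local, non-confidential file:
practice = Requirements("visualization", False, True, 50_000, False)
print(choose_dev_product(practice))  # Tableau Public
```

Every "yes" on a limitation question sends you to Tableau Desktop, so Tableau Public is only reached when all of its limitations are acceptable.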
56. Tableau | Sharing Process: All right, so in the
previous tutorial, we split the Tableau products
into two main categories: developer tools
and sharing tools. Now we're going to focus on the second category,
the sharing tools, where we have Tableau Server, Tableau Cloud, Tableau Public, Tableau Reader,
and Tableau Mobile. As the name implies, those products
help us to share our reports and
dashboards with others. In the last tutorial, we talked about
the four steps of the Tableau development process. Now we're going to
do a deep dive into step number four, where
we're going to talk about the different options
that we have in order to share our reports
and dashboards with others. If you want to
share your visuals with your colleagues
in your organization, then we have a few options here. First, you can install
the Tableau Server product on servers using the infrastructure
of your organization. Then you can
start publishing and sharing your
dashboards there. Your colleagues can
either use their web browser, or they can use
the Tableau Mobile app on their smartphones or tablets, to view and interact with your dashboards directly
from the server. The second option: we can install the Tableau
Server product at cloud service providers
like Amazon AWS, Microsoft Azure,
or Google Cloud, and then you can publish
your dashboards there. The same thing applies here: users can use web browsers or Tableau Mobile in order
to access your work. The third option: you can use the Tableau
Cloud service. Here, you don't have to install any Tableau Server or anything else.
You get everything
prepared by the Tableau team. You can immediately start publishing your dashboards there, and your users can consume
them from Tableau Cloud. Now let's say you want
to share your dashboards with everyone in the
world and make them public. Then you can use
the Tableau Public cloud. You don't have to
install anything. You can immediately publish
your dashboards there, and users all around
the world can use their web browser to access
your dashboards and data. But they cannot use the mobile app in order to access
Tableau Public. And now to the last option, which I really don't like to use. If you want to share your
reports with individual users, you can send them a Tableau
file in the .twbx format, a Tableau packaged workbook, which contains your data plus your
reports and dashboards. Then the users can
view this file using the Tableau Reader software
installed on their PC. All right, so with that,
we have an overview of the sharing process and the different options for
how to share your work. Next I will introduce you to three methods of
hosting Tableau.
57. Tableau | Hosting Tableau: On-Prem vs IaaS vs SaaS: All right everyone. So now
in order to understand the real differences between Tableau Server and
Tableau Cloud, we have to understand
the back-end details and some basic concepts
about hosting servers. Let's go. Let's say we are a startup company
and we want to host our own Tableau application and build the entire
infrastructure. For that, there is a long list of tasks
that have to be done. Of course, the
first thing that we need to do is to go and buy some hardware and
configure it, like the servers that will
run the applications. Each server also needs
storage, so we have to additionally provide storage
infrastructure like hard disk drives and SSDs. Servers also need to be connected
to the Internet, therefore we have to provide all the networking
infrastructure as well. Once we have all that stuff, we have all the
hardware needed. The next thing that we need to do is to start installing and
configuring some software. We install
an operating system, for example Windows or Linux, and any other middleware. Once the operating
system is in place, we have to install and configure the Tableau
Server application. Once we have all the software and
hardware ready and running, it's finally time to
set up our Tableau projects, and we have to manage
the following tasks. We have to start adding
users to Tableau Server and map them to the correct
licenses. We also have to create schedules
and tasks to refresh our data
inside Tableau Server, and then we have to start
monitoring the Tableau jobs. All right, so now we come to the big question that
we have to answer: who will manage what? The first option: if you decide to manage
all these layers yourself, then we are talking
about the on-premises model. It's clear ownership. You manage everything
from top to bottom: the hardware, the software,
and the project itself. But now, if you say, you know what, this is
too much to manage, we don't have the money to buy all that hardware at the start and we don't have the time to take care of
it and maintain it, then you will start
thinking about outsourcing the
hardware, where you buy a service from cloud providers like
Microsoft Azure, Amazon AWS, or Google Cloud. Now they manage
the hardware and you manage both
the software and the projects. This is what we call
infrastructure as a service, IaaS, from the first letter
of each word. But now if you say,
you know what, our IT team is very small, we don't even have the time to keep the software updated. Each time Tableau
makes a new release, we have to install a new
version of Tableau Server, which is really wasting
our time, and we are not able to focus on our
core business projects. We don't have the resources
to manage our own software. Then you start thinking about outsourcing the software layer too. To do that, you can buy
a service from Tableau. It's called Tableau Cloud, where the Tableau team is going to
manage everything for you, both hardware and software. And this is what we call
software as a service, SaaS. Okay guys, so now
let's summarize and compare the three
hosting options. The first point is about
the hosting setup. On premises, you need Tableau
Server installed on your organization's servers.
In IaaS, you also need
Tableau Server, installed
at a cloud service provider, for example Microsoft Azure. And in SaaS, you just buy
the Tableau Cloud product. And now for the question,
who manages what? On premises, you
manage everything: the hardware, the software,
and your projects. There is no
outsourcing. In IaaS, you manage both the software
and your projects, and the cloud service provider manages only the
hardware. In SaaS, you manage only your
business projects, and Tableau manages both
hardware and software. So now let's check
the advantages and disadvantages of each service
model. For on premises, the good thing is that you have full control of everything, the hardware and the software, and your data remains
behind your firewalls. This is very
important if you have critical or sensitive
information that should not be stored outside
of the company's firewall. But the drawbacks here: you need dedicated hardware
and software administrators to deal with the maintenance, patching, and many other tasks. It is very costly. At the
start of the project, you have to pay a lot for the hardware and the software. And it's not flexible: it's really hard to scale your
hardware up or down as needed. With all that to take care of, you generally have less time for your business
projects. All right. So now let's move to IaaS. The first advantage is that it
gives you flexibility. You can scale the hardware up or
down as the business needs, and there is no upfront cost for
buying hardware. But the downside of IaaS is that you still need administrators to
manage your software and to do the installations
and patching. And if you don't pay
attention to the cost, you might end up
paying big bills. Now let's move to SaaS.
The main advantage of SaaS is that it allows
your IT team to focus only on the core
business projects and allows you to implement
projects in a very short time. The other good thing is that your software will
always be up to date; the Tableau team is going
to deal with that. But the downside of SaaS
is the loss of control. You will be at the
mercy of the Tableau team. If anything bad happens,
like security problems, all your organization's
data might be compromised. And the other disadvantage is that you might have
bad performance or networking issues
connecting Tableau to your source systems. My advice here is that you should avoid reinventing the wheel. Always take advantage
of services that do the things that are not part
of your core business. Every hour you spend
patching an OS, installing an update for your software, or
replacing hardware is an hour not spent enhancing and refining your
dashboards in Tableau. All right, so with that, we
have learned the differences between those three methods
of hosting Tableau. Next we will have an overview of Tableau Server
and Tableau Cloud.
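The "who manages what" comparison can be written as a small lookup table. This is only a sketch in Python to summarize the three hosting models described above; the names LAYERS, MANAGED_BY, and your_layers are invented for this illustration and do not come from Tableau.

```python
# The three layers discussed above, from bottom to top.
LAYERS = ("hardware", "software", "projects")

# Who manages each layer in each hosting model, as described in this tutorial.
MANAGED_BY = {
    "on-premises": {"hardware": "you", "software": "you", "projects": "you"},
    "IaaS": {"hardware": "cloud provider", "software": "you", "projects": "you"},
    "SaaS": {"hardware": "Tableau", "software": "Tableau", "projects": "you"},
}


def your_layers(model: str) -> list[str]:
    """Return the layers that stay with your own team for a hosting model."""
    return [layer for layer in LAYERS if MANAGED_BY[model][layer] == "you"]


for model in MANAGED_BY:
    print(model, "->", your_layers(model))
# on-premises -> ['hardware', 'software', 'projects']
# IaaS -> ['software', 'projects']
# SaaS -> ['projects']
```

Reading the output from top to bottom shows the trade-off: each step toward SaaS removes one layer from your team's plate and hands over the matching amount of control.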
58. Tableau | Tableau Server & Cloud: All right everyone. So
now we're going to do deep dives into the Tableau
sharing products one by one, in order to understand the key features and the limitations
of each one of them. We start with Tableau
Server and Tableau Cloud. As Tableau developers
in organizations, we need to share our reports and dashboards with other
colleagues in our organization. So we need to put
those dashboards in a trusted environment or
platform in our organization. And we usually have four requirements. The
first requirement: it should be safe and secure. We want to control who is accessing our data
and dashboards. Second, it should
be easy to scale. Third, it should
be robust, so that it can handle a huge number
of users and a large amount of data. And the last requirement: it should be powerful and
deliver high performance. No one wants slow
dashboards and reports. Now in order to build this trusted environment
with these requirements, we have two Tableau products, Tableau Server and
Tableau Cloud, and we have three hosting
options: on premises, IaaS, and SaaS. Don't worry about the terms,
I'm going to explain them. Tableau Server and Cloud are very similar. At the user interface level, you will not notice
any differences. But if you check
the back-end level, there are big
differences between them. So now first let's talk about the user interface level of Tableau Server
and Tableau Cloud. Once you publish your dashboards to Tableau Server or Cloud, you can share them by
providing links to users across all departments
in your organization. Then the users can
access your dashboards using their web browser without installing any
software on their end. And if you give them access, they can start exploring your data in Tableau
Server or Cloud. You can manage your users by
adding and removing them and by giving them specific
roles like admin, creator, explorer, or viewer. You can also manage your users by adding them to groups. Another important thing
you can do in Tableau Server or Cloud is to
automate your tasks. For example, you can
create a refresh schedule to refresh your data
sources on a regular basis, like once a day. In
Tableau Server and Cloud, you can monitor the tasks and schedules to
check the status, whether a job failed or succeeded, and you can find many other statistics, like the run time, the average, error
messages, and so on. Not only can users view the dashboards in
Tableau Server or Cloud, but they can also
create new ones. If you give the
users enough rights, they can even start creating
their own insights and views directly in their
web browser without having to install
Tableau Desktop. It's something we
call self-service BI. All right, so that
was a quick overview of Tableau Server and Cloud. Next we will talk about the free option, Tableau Public.
59. Tableau | Tableau Public: All right everybody. So
now with this we have a clear picture of Tableau
Server and Tableau Cloud. So now let's talk about the other
Tableau sharing products. The Tableau Public cloud is a free cloud service
managed by the Tableau team. Everyone in the world can share visualizations on this platform. If you publish your
dashboards to Tableau Public, everyone can access them, interact with them, and
even download them. Tableau Public is
like social media: you can edit your
profile and add your personal information.
In Tableau Public, you have a huge gallery of vizzes built by people all
around the world. It currently hosts
over 5 million visualizations. If you are browsing
and you find some interesting dashboard, like this amazing
dashboard from Ajias, you can add it to
your favorites, and then you can check
what other vizzes Ajias has created
and published to Public. And like on any
other social media, if you like her content, you can go and follow her
to see her new updates. And if you are inspired
by one of her dashboards, you can go and download
the whole workbook to see how she built these amazing dashboards
and see all the details. With that, you are expanding your knowledge of
Tableau development. So using Tableau Public, you can get inspired by
others and you can get connected to other
Tableau developers from all around the world. And one more cool thing
about Tableau Public: if you are searching
for a new job and you want to show off your data
visualization skills, you can publish a lot of work to Tableau Public and link it in your CV so that
companies can see how skilled
you are in Tableau. So all these nice features
make the Tableau Public cloud a very attractive platform
for sharing visualizations. But now if we are talking
about the security aspects, it is very limited. The only things that
you can control are to disallow downloading of your visualizations or to completely hide
them from others. But you don't have any
user access control like we have in Tableau
Server or Cloud. To summarize, the Tableau Public cloud is a free cloud service
from Tableau. It hosts a lot of reports and dashboards built by people
all around the world. It's a great platform to get inspired by the Tableau community, build connections to
other Tableau developers, and share your skills. But since it's free, it comes
with a few limitations. The total size available for each account is
only 10 gigabytes. Your dashboards and reports are not connected to
the source systems. That means you cannot automatically refresh your
data in Tableau Public.
You always have
to do it manually: you open the report, refresh the data, and
publish it again to the Tableau Public cloud. And the third limitation of Tableau Public is that,
as the name implies, everyone in the world can
see and share your data. That means you cannot use it in organizations, since you
cannot protect your data. All right, so that's all for now about Tableau Public. Next we will cover Tableau
Reader and Tableau Mobile.
60. Tableau | Tableau Reader & Mobile: Tableau Reader is software you download and install on your PC. You can use it only to view
reports and dashboards, but you cannot use
Tableau Reader to create any data visualizations
or even edit them. As you can see, we don't have any tools or functions
to create charts. You can't even connect to any data sources or
refresh your data. Tableau Reader is a very
old tool from Tableau. It was created in the
early days of Tableau in order to share content
built using Tableau Desktop. This was even before
Tableau Server and Tableau Cloud were made
available. At that time, Tableau Reader was the
only option you had in order to share dashboards and
reports with other users. So how does it work? You build
data visualizations using Tableau Desktop and then you
send a file to someone else. Then they use
Tableau Reader in order to view and interact with the
dashboards that you built. To summarize, Tableau
Reader is a free tool. It is just for viewing
and interacting with reports and dashboards
built using Tableau Desktop. You cannot create or edit
anything in Tableau Reader. You cannot refresh the data inside your dashboards
using Tableau Reader. Each time you want fresh data, you have to
ask for a new copy. And there are no
security features, password protection, or login options. This
is a big problem. If the file lands
in the wrong hands, your organization's data
could be exposed. I don't recommend
using this tool at all
in organizations; the
risk is just too big. But if you want to take
the risk and share your visuals with one, two, or three people, then use it, but
try to avoid it. Tableau Mobile is a free
mobile app that you can download to your
smartphone or your tablet. You can use it to view
and interact with Tableau reports and dashboards published to Tableau
Server and Cloud. You can use it only
to view the reports. You cannot use it to create new reports or to
edit reports. While Tableau Mobile
is free to download, it requires a license to use, and it can only access Tableau
Server and Tableau Cloud, so you cannot use it to access
Tableau Public. Tableau Mobile
can automatically cache your reports and
dashboards in memory. That means you can access
them even if you are offline. All right, so with that,
we have an overview of all five Tableau
sharing products. Next we will compare all five Tableau
products side by side, and I will walk you through my decision-making process to choose the right
products for you.
61. Tableau | Tableau Server vs Cloud vs Public vs Reader vs Mobile: All right everybody. So now
let's summarize and compare all the Tableau sharing
products side by side. The first point is hosting. Tableau Server
can be hosted in your organization or at cloud service providers
like Azure or Amazon. Both Tableau Cloud and the Tableau Public cloud are
hosted by the Tableau team. Tableau Reader is just software installed on your PC; you can't even host it. Now if we are talking about
the cost: for Tableau Server, you have to pay for licenses, hardware, and maintenance, but for Tableau Cloud you
only have to pay for the licenses. Tableau Public and Tableau
Reader are free to use. Now if you check the
data security aspects, both Tableau Server and Tableau
Cloud are highly secure. Tableau Public and
Reader are not. The next point is about the storage limitations.
In Tableau Server, it really depends on
the server's disk space. In Tableau Cloud and Reader
there are no limitations. But in the Tableau Public cloud, the total size available for each account is
only 10 gigabytes. The next point is
the connectors. Tableau Server and Cloud
can be connected to different types of
sources, like cloud, APIs, services, files,
databases, and so on. But the Tableau Public cloud
and Tableau Reader
cannot be connected directly to any of
your source systems. Let's jump to the next point,
automation. In Tableau
Server and Cloud, you can schedule tasks
to refresh the data
inside your dashboards
automatically from the source systems. But the data inside
the Tableau Public cloud and Reader cannot be refreshed automatically. You have to do it manually: you have to republish the workbook or resend the file. The next point is
Tableau Mobile. You can connect
your smartphones or tablets only to Tableau
Server or Tableau Cloud. Now to the last point: we can use Tableau
Server and Cloud to share dashboards
inside organizations, Tableau Public is used to share dashboards with
the whole world, and Tableau Reader
is used to share dashboards directly
with individuals. All right, now with this, we have an overview of all
Tableau sharing products. Now the question is: when
to use which product? Let me guide you through my
decision-making process following this chart. All right. First we ask all the questions about the limitations of
the Tableau Public cloud. The first question:
can the data be public? If the answer is yes, then
we ask the next question: should the data be frequently refreshed in the
reports and dashboards? If the answer is no, then you can go and use
the Tableau Public cloud. But if the data should not be public or should be
refreshed automatically, then we have to think
about private hosting. The next question: do you
want to manage the hardware? If yes, then you can
use Tableau Server
on premises at
your organization. If you don't want to do that and you want to outsource it, then you ask the next question: do you want to manage the
software on your own? If the answer is yes, then you can again use
Tableau Server,
but this time it's
going to be hosted at a cloud service provider like Microsoft Azure in
an IaaS model. But if the answer is no, you don't want to
manage the software by yourself and you want
to outsource it, then you can go and use Tableau
Cloud as a SaaS service. As you can see, Tableau
Reader is not in my decision-making process, since I don't
recommend it at all. Now if you combine this
flowchart with the one that we built previously
for the developer tools, you will get the whole
decision-making process that I usually use when I
start a new Tableau project. So if somebody asks you when to use which
Tableau product, you can go through it and find the right combination for
you or for your company. You can find all those materials
on my website. All
right everyone. So with that, we have covered all eight Tableau products and we have understood the
differences between them. In the next chapter,
we will learn the Tableau architecture
to understand how Tableau works internally and what the main
components of Tableau are.
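The sharing flowchart can be sketched the same way as the one for the developer tools. This is a minimal Python illustration of the questions above, in the same order; the function choose_sharing_product and its parameter names are made up for this example and are not part of any Tableau API. As in the chart, Tableau Reader is left out on purpose.

```python
def choose_sharing_product(
    data_can_be_public: bool,
    needs_auto_refresh: bool,
    manage_hardware: bool,
    manage_software: bool,
) -> str:
    """Follow the sharing flowchart from this tutorial, top to bottom."""
    # Tableau Public only fits if the data may be public AND manual
    # refreshes are acceptable.
    if data_can_be_public and not needs_auto_refresh:
        return "Tableau Public"
    # Otherwise we need private hosting: decide how much to outsource.
    if manage_hardware:
        return "Tableau Server (on-premises)"
    if manage_software:
        return "Tableau Server (IaaS)"
    return "Tableau Cloud (SaaS)"


# Confidential data, small IT team that wants to outsource everything:
print(choose_sharing_product(False, True, False, False))  # Tableau Cloud (SaaS)
```

Combined with the earlier sketch for the developer tools, the two functions cover the whole decision-making process: one picks the tool to build with, the other picks where to publish.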
62. Tableau | Section: Tableau Architecture: Tableau architecture. Now
we're going to go and understand how Tableau
works internally, its components, and
its limitations. We're going
to cover many important Tableau concepts, like what live and
extract connections are and what the different
file types in Tableau are. Then we can start drawing the Tableau Desktop
architecture. After that we're going to jump to Tableau Server in order to understand different scenarios, like the publishing process, the authentication process, and
the process of accessing a view. After that, we're
going to complete the big picture by drawing the server
architecture and its components. And at the end, we're
going to cover the architecture of
Tableau Public as well. So now let's start with
the first concept, the live and extract data
connections. So now let's go.
63. Tableau | Live vs Extract: In this section, you will learn the Tableau architecture
to understand how Tableau works internally and what
its main components are. You will learn some
important concepts, and we will start
with the data source connection types:
live and extract. Now we come to the most
important decision that we're going
to make inside the data source: do you want to store an extra copy of your
data inside Tableau? Here we have two designs
for the data source. Either you say no, we don't need a
copy inside Tableau; the data should stay where
it is in the source systems. Then what happens? Each time a visualization needs
data, it sends queries directly to
the external database. Then the database
sends the results back to
your visualizations. The data always comes fresh from the sources directly
to your dashboards. This type of connection
we call a live connection.
Or you say yes, let's have a copy of our
data inside Tableau. A snapshot or subset of
the data is going to be copied from the external
database to Tableau. This copy we call
an extract. Now, each time our
visualization needs data, it sends queries, this time to the extract instead of the
external database. Then the extract
returns the results back to
your visualizations. Since the extract is inside Tableau and very close
to the visualizations, we get great response times and very fast performance. This type of connection
we call an
extract connection. All right, now the question is: which connection type should
I use in my data sources? The typical answer
to this question is, well, it depends, because here we have a trade-off between performance
and data freshness. For example, if for
you the performance is way more important
than the data freshness, then you have to go
with the extract. Since the data is going
to be stored inside Tableau in memory using the
column-store technique, you will get just
great performance. But if you say, you know what, the data freshness is more important to me
than the performance, then you have to go with
live connections in your data sources, because you will always get fresh data directly from the sources
in your dashboards. All right, so that
was a quick overview of the two data connection types in Tableau, live and extract. Next we will learn
the different types of files that you can
generate in Tableau.
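To make the trade-off between freshness and performance a bit more tangible, here is a toy model in Python. It is not how Tableau is implemented and it uses no Tableau API; the classes LiveConnection and ExtractConnection are invented for this illustration. A plain Python list plays the role of the external database.

```python
import copy


class LiveConnection:
    """Toy model: every query goes to the source, so results are always fresh."""

    def __init__(self, source):
        self._source = source

    def total_sales(self):
        return sum(row["sales"] for row in self._source)


class ExtractConnection:
    """Toy model: queries run against a local snapshot taken at refresh time."""

    def __init__(self, source):
        self._source = source
        self.refresh()

    def refresh(self):
        # Copy the source data "into Tableau"; this is the extract.
        self._snapshot = copy.deepcopy(self._source)

    def total_sales(self):
        return sum(row["sales"] for row in self._snapshot)


source_table = [{"sales": 100}, {"sales": 50}]
live = LiveConnection(source_table)
extract = ExtractConnection(source_table)

source_table.append({"sales": 25})  # new data arrives in the source system

print(live.total_sales())     # 175, the live connection sees the new row
print(extract.total_sales())  # 150, the extract still holds the old snapshot
extract.refresh()
print(extract.total_sales())  # 175, fresh again after a refresh
```

The extract never touches the source between refreshes, which is where its speed comes from in practice, and it is also exactly why its data can be stale.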
64. Tableau | Tableau File Types: All right, so now
if you want to send Tableau files directly
to the users, we have to ask the question, which type of files
we're going to send? Because in Tableau, so we can
generate not only one file, we can generate five different
types of files in Tableau. So now we're going
to have like quick overview of those types of files to understand them and
to know when to use them. All right. As we learned, the Tableau workbook
contains three things. The extract, the data source,
and the visualizations. There is a file type for each. Combinations depend on
your requirements example. If you want to share
only your data without anything
else, no data source, no visualizations, then you can send an extract in
the Hyper format (.hyper). But now if you say,
you know what, I've done a lot of work
in the data source. I built a data model,
I renamed stuff, I did aggregations, I created
a lot of new columns. So I would like to share
that with my team, with my colleagues, but I'm not allowed to share
my data with them. In this situation,
you say, okay, I'm going to share
only the data source with my colleagues. We call this file type Tableau Data Source
(TDS), and it contains no data. Or you might be in
another situation where you say, you know what? My colleagues don't have
access to the source systems, so they cannot use the
live connection, and you don't mind sharing
your data as well. Now you can send them a package of an extract
and the data source. The file type here is called Tableau Packaged
Data Source (TDSX). This type of file contains both your data and
your data source. We might be in another situation
where our colleagues or users are interested as
well in the visualizations. We can send them a file with the visualizations
and the data source. Here again, we have
the same situation. You decide whether
you're going to send the data with it or not. If you don't want to
send the data inside it, you can send a file called
Tableau Workbook (TWB). And the last scenario, I think you already guessed: if you want to send everything, the whole package, the
extract, the data source, and your visualizations, then you can go and
send your colleagues a Tableau format called Tableau
Packaged Workbook (TWBX). All right, so as you can see, Tableau offers different
types of files for different purposes. Depending on the situation or the
scenario that you have, you can choose how to share your work
with your colleagues. All right, so now
generally speaking, we have two different
types of workbooks: a workbook with data
using an extract connection, and a workbook without data using a live
connection. On the one hand, with the workbook with data, you can send three
different types of files. You can send only the data
using the Hyper format, or send the data source with
the data using the TDSX format, or send the whole package
with the TWBX format. On the other hand, with
the workbook without data, you can send only two file types: the data source without data
(TDS) or the workbook (TWB). Now you might have the
question and you say, okay, which Tableau products should I use in order to open
these Tableau files? Well, we have three
Tableau products: Tableau Desktop, Tableau Public,
and Tableau Reader. With Tableau Desktop,
you can open everything. You can open all these different Tableau formats and files. But with Tableau
Reader and Tableau Public, you can open only the Tableau
Packaged Workbook (TWBX), since Tableau Reader and
Tableau Public cannot connect directly to the data
sources and cannot use live connections. All right, one more thing to understand about the Tableau
workbook is that Tableau uses two different types of data to store the workbook. The first one is the
metadata, which is stored in XML files. Metadata is data
about your data. It describes your data. It contains all the information about what you have done
in the workbook. Anything you click, drag and drop, or do while working
with Tableau Desktop will be reflected in some
way in the metadata. There you can find information such as column names, data types, the data
model, and so on. The second type is the data
itself, the actual data. If you load data inside Tableau, Tableau stores it in
a Hyper file, where the data is going
to be stored using the column store method in
the memory of Tableau. It is a special format
for fast data retrieval. All right everyone.
So with that, we have learned the purpose of the different types of files in Tableau and when to use them. And next we will do a deep dive into the Tableau architecture to understand the
desktop components.
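The five file types from this video can be summarized as a small decision function. This is just a sketch in Python; the function names are made up for illustration, and the rules only encode what was said above (which contents go into which file, and which product opens which file).

```python
def tableau_file_type(data: bool, data_source: bool, visualizations: bool) -> str:
    """Return the file type for a given combination of contents."""
    if visualizations:
        # A workbook file always carries its data source definition.
        return "twbx" if data else "twb"
    if data_source:
        return "tdsx" if data else "tds"
    if data:
        return "hyper"
    raise ValueError("nothing selected to share")


def can_open(product: str, file_type: str) -> bool:
    """Which product opens which file type, as described in the video."""
    if product == "Tableau Desktop":
        return True
    if product in ("Tableau Reader", "Tableau Public"):
        return file_type == "twbx"
    raise ValueError(f"unknown product: {product}")


# Only the data, nothing else
print(tableau_file_type(data=True, data_source=False, visualizations=False))
# Data source with its data
print(tableau_file_type(data=True, data_source=True, visualizations=False))
# The whole package: extract, data source, and visualizations
print(tableau_file_type(data=True, data_source=True, visualizations=True))
print(can_open("Tableau Reader", "twb"))
```

Reading the function top to bottom gives the same order as the scenarios in the video: first decide what you want to share, then decide whether the data travels with it.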
65. Tableau | Tableau Architecture: Desktop Components: All right, if you understand
the Tableau architecture and how the components are
connected to each other, everything is going to make sense to you as you work with Tableau, and it's also going to make you a
better Tableau developer. I will be sketching
the concepts in order to make it easier
for you to understand. So let's go. The Tableau architecture contains
four different layers: the source layer, the desktop layer, the server layer, and
the consumer layer. We will start
unboxing each layer one by one to understand
their components. And we're going
to work with this architecture from left to right. So we will start by
the source layer and we're going to enda
by the consumer layer. All right, so now we
have the source layer. The source layer is outside of Tableau and it contains
the source of our data. Our data could be in databases
like MySQL or Oracle, or the data could be in
files like Excel and JSON, or even in the
cloud, like Amazon AWS or Microsoft Azure, or even behind APIs. Our data could be anywhere. All right, so now back
to the big picture. Let's jump to the next layer. We're going to unpack
the desktop layer. The first component in Tableau Desktop is the data source. Before you start building
your visualizations, you must set up the data source. The first thing that
we're going to do inside the data source is to
connect Tableau to our data. Tableau offers around 90
different data connectors, so we can connect Tableau
to almost anything. Once you build the
connection between Tableau and your source of data, the access information
is going to be stored inside the data source. For example, the path of the
file, the location of the servers, usernames, passwords, or
access tokens, and so on. All this information
is going to be stored inside
the data source. All right, so the two types
of data connections in data sources are extract
and live connections. Now we are connected to the data and we have decided on the type
of connection. The next thing that
we have to do in the data source is to start
building our data model. And we can do that by
combining tables together, using relationships,
joins, and unions. And you can do
many other things, like setting the right data
types, doing aggregations, renaming tables and columns, creating new calculations
and filters, and so on. All right. Now to summarize, the
data source component in Tableau contains the
following information. We have the data connectors to connect Tableau to our data. We have the access information, where the locations
of our sources are going to be stored. As well, we can decide whether
we're going to load an extra copy of our
data inside Tableau, which we call an
extract connection, or we're going to leave it as a live connection in
the data source. And the last thing: we
have the data model inside the data source, where we can combine tables together, do aggregations, or
do some other customizations. All right, so once we are done with the setup of
the data source, we have the connection
whether it's extract or live. We have our data model
and everything is ready. Now we're going to go and start building our visualizations. And Tableau organizes the
visualizations in three levels. The first one is the worksheets. So we can use the
data available in our data sources to build a
single view, only one visual. It could be a bar chart, a pie chart, or a table view. And as you can see,
each worksheet is connected directly
to a data source. But in Tableau, you can
build a worksheet from two different data
sources by using a very powerful combining
method called data blending. This is a very unique
feature in Tableau. You cannot find it
in any other tool: the data in one visual can come from different sources. Once we have these
different worksheets, we can go to the
next level where we start combining
these worksheets into one dashboard to show the different visuals
in only one view. But keep in mind, if you want to make any changes to the visuals, you have to go back to the worksheets and make
the adjustments there. Now we come to the last level: the stories. As you know, the
main goal of data visualization
is to tell a story. So you can build a sequence of worksheets
or dashboards that work together in order to tell the users a story
based on your data. All right, now you might ask me which visualization level
is the right one for you. Well, if you have
only one visual, then go with a worksheet. But if you want to build
some KPIs to monitor a process, then build a dashboard. If you want to present your data and tell a story from it, then go and build a story. All right, now we have
in Tableau Desktop both of the data sources
and the visualizations, and these two components
are contained in something called
a Tableau workbook. Now the question is,
after you're done building your data sources
and visualizations, what can you do with the workbook? Well, you can share it with your colleagues in your
team or department. And there are two
ways to do that. Either you're going
to go and send a Tableau file
directly to the users, or you're going
to go and publish the workbook to a
Tableau server or cloud. And from there your users and your team can access
your workbook. All right, back to the big picture,
the Tableau architecture. Let's talk about the layer on the right side, the
consumer layer. There are different ways to consume Tableau visualizations, depending on the users' clients and on the tasks the users do. We start with a very
small group of users who might use
Tableau Reader to view and interact with Tableau
visualizations, and they usually don't want to edit
or create something new. For this group of users, we're going to send
them a Tableau file. As we learned,
they're going to need a Tableau Packaged
Workbook (TWBX). We might have another
group of users; usually they are your
team colleagues. They want to build analyses
on top of your work. They're going to use Tableau
Desktop to do that. To them, we can send any kind
of Tableau file, depending on their requirements
and their tasks. And now we have a big group of users or consumers who can access Tableau
server or cloud to view and interact
with Tableau visuals. They can use their web browsers
like Google Chrome and Firefox to access the
content of Tableau server. And from there they
can view, interact with, and even edit the visualizations if they have enough permissions. Or they can use the Tableau Mobile
app on their smartphones or tablets to view and interact
with your workbooks, but they cannot use it to edit a Tableau visualization. For this group of users, you will not send
them any files. First, you have to publish
your work to the server. And here we have two options. Either you're going to
publish only the data source, or you can publish
the whole workbook to the Tableau server or cloud. After that, you're
going to share the link to your workbooks with the users. Now to the last group of users
that's worth mentioning: the static users. You can always export your
data and visuals from Tableau Desktop and
send them directly to the users as PDF or Excel files. Of course, the result is static and they cannot
interact with it. All right, so far in
the Tableau architecture, we talked about
the source layer. We did a deep dive into Tableau
Desktop and its components, and we understood
the different types of consumers and clients. And in the next step, we will start talking about the Tableau server architecture. But first, in order to make
it easier to understand, we will go through three
different scenarios. And we will start with
the publish process.
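To make the desktop layer a bit more concrete, here is a rough Python sketch of how the pieces fit together: a workbook holds data sources plus the three visualization levels. The dataclasses and the example names (Sales, Targets, and so on) are invented for illustration; they are not how Tableau stores a workbook internally.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class DataSource:
    name: str
    connector: str          # e.g. "MySQL", "Excel"
    connection_type: str    # "live" or "extract"


@dataclass
class Worksheet:
    name: str
    data_sources: List[DataSource]  # one source, or several when blending


@dataclass
class Dashboard:
    name: str
    worksheets: List[Worksheet]


@dataclass
class Story:
    name: str
    points: List[Union[Worksheet, Dashboard]]


@dataclass
class Workbook:
    data_sources: List[DataSource] = field(default_factory=list)
    worksheets: List[Worksheet] = field(default_factory=list)
    dashboards: List[Dashboard] = field(default_factory=list)
    stories: List[Story] = field(default_factory=list)


sales_db = DataSource("Sales", connector="MySQL", connection_type="extract")
targets = DataSource("Targets", connector="Excel", connection_type="live")

by_region = Worksheet("Sales by region", [sales_db])
vs_target = Worksheet("Sales vs target", [sales_db, targets])  # blended sources
overview = Dashboard("Sales overview", [by_region, vs_target])
review = Story("Quarterly review", [overview, by_region])

workbook = Workbook([sales_db, targets], [by_region, vs_target], [overview], [review])
print([w.name for w in overview.worksheets])
```

Notice that the dashboard only references worksheets. That matches the rule from the video: to change a visual, you go back to the worksheet.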
66. Tableau | Publish Process: All right, previously
we started sketching the Tableau architecture, where we learned about
the source layer, the desktop layer, and
the consumer layer. Now we're going to unpack
the server layer in Tableau architecture in order to better understand Tableau
server components. I'm going to walk you through three scenarios from
the user point of view, what's going to
happen exactly in Tableau server once we publish a workbook or when we log into the server and
access a workbook. Let's go. Let's say that you want to publish
a Tableau workbook with an extract. What's
going to happen? Tableau Desktop is going to send the server a request to upload
the workbook (TWBX). And the first component
in Tableau Server that receives the request
is the gateway. The gateway knows how to forward the request to the right
server component. In this situation, the
right component to process the publishing is
the application server, so the gateway is going to
forward the request to it. As we learned, the
Tableau workbook holds two different
types of information: the metadata stored in
XML files and the data itself stored in Hyper
files. In Tableau Server, those two different
types of files are going to be stored in two
different places. The application server is going
to send the XML file to be stored in the server
component called the repository, and the Hyper file
is going to be stored in another component
called the file store. What have we learned so far? The gateway is responsible for forwarding the request to
the right component. The application
server is the one that handles the
publish process. The repository is going to
store the XML files, the metadata of the workbook, and the actual data, the Hyper file, is going to be stored
inside the file store. All right, so that's
all for this scenario. Next we will start talking about the authentication workflow
in Tableau server.
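The publish scenario above can be sketched as a tiny Python simulation. The classes and method names are hypothetical stand-ins for the server components; the only thing taken from the video is the flow: gateway, then application server, then XML into the repository and the Hyper file into the file store.

```python
class Repository:
    """Holds workbook metadata (the XML part)."""

    def __init__(self):
        self.workbook_xml = {}


class FileStore:
    """Holds the actual data (the Hyper extract part)."""

    def __init__(self):
        self.extracts = {}


class ApplicationServer:
    def __init__(self, repository, file_store):
        self.repository = repository
        self.file_store = file_store

    def publish(self, name, xml, hyper_bytes):
        # Split the workbook: metadata and data go to different places.
        self.repository.workbook_xml[name] = xml
        self.file_store.extracts[name] = hyper_bytes
        return f"published {name}"


class Gateway:
    """Forwards each request type to the component that handles it."""

    def __init__(self, app_server):
        self.routes = {"publish": app_server.publish}

    def handle(self, request_type, **payload):
        handler = self.routes.get(request_type)
        if handler is None:
            raise ValueError(f"no component handles {request_type!r}")
        return handler(**payload)


repository = Repository()
file_store = FileStore()
gateway = Gateway(ApplicationServer(repository, file_store))

print(gateway.handle("publish", name="sales", xml="<workbook/>", hyper_bytes=b"rows"))
print(list(repository.workbook_xml), list(file_store.extracts))
```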
67. Tableau | Authentication Process: All right, so now
our workbook and our data are published
to Tableau server. It's time now for our
users to log into the Tableau server and start interacting
with our dashboards. So let's see how
this is going to work. Let's say your manager
is Michael Scott. And Michael wants to check your sales dashboards
in Tableau server. And I'm going to do it, I need a username and
I have a great one. Once Michael gives
this information, a request is going to be sent to
the server as an HTTP request. The first thing that it's
going to hit is the gateway. The gateway knows that
the application server is the right component to handle
the authentication process, so the gateway is going
to forward the request to it. And then the application
server is going to ask the repository to check
if the credentials, username and password,
are correct and if Michael has permission
to access our server. And then the repository is going to
check, and if everything matches and Michael is allowed to access our server, it will respond to the application server
and say, yeah, we know the guy, he is in our records. Then the application server
is going to start building the server UI and send
it back to the gateway. And then the gateway
is going to send it back to Michael's browser. Now he is inside
our Tableau Server. So what have we just learned
from this process? Again, the gateway is responsible for forwarding the request
to the right component. The application server
is the one that handles the
authentication process. The repository stores the user credentials and whether
the users have access and permissions to our server.
And the application server is the one that renders the
web interface of the server. All right, so that's
all for this process. Next we will talk
about what happens in Tableau once we access a
workbook to view the data.
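Here is the login scenario as a short Python sketch. Everything in it is an assumption made for illustration: the class names, the status strings, and the simple SHA-256 check (a real system would use salted, slow password hashing). It only mirrors the order of steps from the video: gateway, application server, repository check, then the server UI comes back.

```python
import hashlib
import hmac


def _digest(password: str) -> bytes:
    # Illustration only: not a recommendation for real password storage.
    return hashlib.sha256(password.encode("utf-8")).digest()


class Repository:
    """Stores user credentials and whether a user may access the server."""

    def __init__(self):
        self._users = {}

    def add_user(self, username, password, can_access_server=True):
        self._users[username] = (_digest(password), can_access_server)

    def check(self, username, password) -> bool:
        record = self._users.get(username)
        if record is None:
            return False
        stored, allowed = record
        return allowed and hmac.compare_digest(stored, _digest(password))


class ApplicationServer:
    def __init__(self, repository):
        self.repository = repository

    def login(self, username, password) -> str:
        if not self.repository.check(username, password):
            return "401 login failed"
        # On success, the application server renders the server UI.
        return f"200 server UI for {username}"


class Gateway:
    def __init__(self, app_server):
        self.app_server = app_server

    def handle_login(self, username, password) -> str:
        # The gateway only forwards; it does not check credentials itself.
        return self.app_server.login(username, password)


repository = Repository()
repository.add_user("michael", "a-great-one")
gateway = Gateway(ApplicationServer(repository))

print(gateway.handle_login("michael", "a-great-one"))
print(gateway.handle_login("michael", "wrong"))
```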
68. Tableau | Access View Process: All right, so now Michael is
inside our Tableau server and he's going to
start browsing and searching for your
sales dashboard. And once he finds it,
he's going to click on it and try to access
your dashboard. So now let's see what's going to happen in Tableau Server. As usual, the HTTP request for accessing the dashboard is going to be generated
and sent to the server. And we know by now
that the gateway is going to receive the request and forward it to the right component,
the application server. Then the application
server is going to start rendering the chrome around the viz, all those icons and images that are not inside
the dashboard itself. And then the application
server is going to say, okay, now we are talking
about visualizations. This is completely
out of my league. We have to forward this request to the master, to the brain. It is the VizQL Server. It is the one that deals
with visualizations. From here, the
VizQL Server is going to take over. It's going to say, okay, first things first, let's
check if this guy, Michael, is allowed to see
the sales dashboard. The VizQL Server is going to ask
the repository. In the repository, there is
a list of users and reports, so it's going to search
there to find a match. If there is one, then it's going
to send back, yeah, Michael is a boss and he's allowed to see the
sales dashboard. And now the VizQL Server is going to say, all right, now we need data. So first we need the metadata
of the dashboard. And as you know, after
we publish the workbook, the metadata is going to be
stored inside the repository. So the VizQL Server is going to make one more request
to the repository: to send the
XML file of the dashboard. The repository is then going
to send back the XML to the VizQL Server, and the server will start
building the dashboard. All right, so now the
VizQL Server is going to say, okay, now we have the dashboard. But the problem is it is empty. We need the data to fill it. And it's better to ask our data specialist,
the data server. The data server is the one that knows everything
about the data. It's going to say, all
right, for this dashboard, part of the data we already have inside Tableau Server. But the other part is
sadly outside of Tableau. To get the data inside Tableau
Server from the extract, the data server is going to send the query request
to the data engine. And the data engine
knows how to query and extract the needed data
from the file store. The data engine is going to get the data from the file store and it's going to send it
back to the data server. And now we come
to the part where the data is living outside
of Tableau server. Here, the data server is
going to act as a proxy. It's going to use
the data connectors to connect to the
external databases. Once the connection
is established, it's going to send a query that matches the language that
the database speaks. And then the
database is going to return the needed
data as a raw table. Now once we have
all the needed data inside the data server, it's going to combine it and
do another security check. The data server is going to check: is Michael allowed to see all the data, or should
we filter the data? The data server is going
to filter the data depending on the data security
setup that you have made. And then it's going to send the raw data back to
the VizQL Server. Now once the VizQL Server has the
raw data for the dashboard, it's going to do
the magic by turning all those numbers and raw
data into images and visuals, and it's going to put
them inside the workbook. So now finally, the VizQL Server
has everything it needs. The sales dashboard is
complete and ready. The VizQL Server is going to send
it back to the gateway. And the gateway is going to send it back to the web
browser of Michael. Michael can start
interacting with the dashboard now. Well, hmm. Does Michael have any idea what to do with the
sales dashboard? I declare bankruptcy. All right. I know there was a lot of stuff going around in this scenario, but we have covered most of the Tableau server components. So let's have a summary and understand what we
have learned so far. As usual, the gateway
is responsible for forwarding the request
to the right component. The application server is not responsible for the
visualization process; the VizQL Server is the one that is responsible for
building the visualizations. The repository stores information about
permissions and security: which users are allowed to access
which dashboards. And the data server is
going to manage both the extract and
live data sources. And the data engine
is responsible for retrieving the data from
the extract inside Tableau. And the data connector
is going to help the data server connect
to the external sources. And the VizQL Server
does the magic of transforming the
raw data into visuals. All right, so far with
those three scenarios, we covered the most important components
of Tableau Server. Now we're going to go and
put all the pieces together into the Tableau
Architecture and start explaining them one
by one. Let's go.
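Before moving on, the access view scenario can be condensed into a Python sketch. All class names, the region based row filter, and the returned strings are assumptions for illustration only. The sketch follows the steps from the video: the VizQL Server checks permissions and metadata, the data server collects data from the extract (through the data engine) and from the external source (through a connector), filters it per user, and hands raw rows back for rendering.

```python
class DataEngine:
    """Reads rows from extracts kept in the file store."""

    def __init__(self, file_store):
        self.file_store = file_store

    def query(self, extract_name):
        return list(self.file_store[extract_name])


class Connector:
    """Stand-in for a connector to an external (live) database."""

    def __init__(self, external_rows):
        self.external_rows = external_rows

    def query(self):
        return list(self.external_rows)


class DataServer:
    def __init__(self, data_engine, connector, row_filters):
        self.data_engine = data_engine
        self.connector = connector
        self.row_filters = row_filters  # user -> allowed regions

    def fetch(self, user, extract_name):
        # Combine the data inside Tableau with the data outside Tableau.
        rows = self.data_engine.query(extract_name) + self.connector.query()
        allowed = self.row_filters.get(user)  # None means no filtering
        if allowed is None:
            return rows
        return [row for row in rows if row["region"] in allowed]


class VizQLServer:
    def __init__(self, permissions, metadata, data_server):
        self.permissions = permissions  # user -> dashboards (from the repository)
        self.metadata = metadata        # dashboard -> layout (from the repository)
        self.data_server = data_server

    def render(self, user, dashboard):
        if dashboard not in self.permissions.get(user, set()):
            return "403 not allowed"
        layout = self.metadata[dashboard]
        rows = self.data_server.fetch(user, dashboard)
        total = sum(row["sales"] for row in rows)
        # A real server would draw images; a text summary stands in here.
        return f"{layout}: {len(rows)} rows, total sales {total}"


file_store = {"sales": [{"region": "East", "sales": 10}]}
connector = Connector([{"region": "West", "sales": 5}])
data_server = DataServer(DataEngine(file_store), connector, {"dwight": {"East"}})
vizql = VizQLServer(
    permissions={"michael": {"sales"}, "dwight": {"sales"}},
    metadata={"sales": "Sales dashboard"},
    data_server=data_server,
)

print(vizql.render("michael", "sales"))
print(vizql.render("dwight", "sales"))
print(vizql.render("jim", "sales"))
```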
69. Tableau | Tableau Server Architecture: In this video, you will learn about the Tableau
server architecture. And then we're going
to do a deep dive into each server component of the architecture to understand how it works and what it does. And we start right now. The server layer consists
mainly of three things: two interfaces, left and right, and in the middle, a
bunch of server components. The left interface is
the data connectors. They're going to connect
the external source systems to the Tableau Server components. On the right side,
we have the gateway. It's going to receive requests
from different clients and connect them to
the Tableau Server components. All right, so now
let's go into more detail about the
gateway component. On one hand, we have requests coming from different clients, like a login request
from a web browser, or a publish request
from Tableau Desktop. And on the other hand, we have different Tableau Server
components like the application server, the VizQL Server, and so on. And the gateway
is going to be in the middle; it
knows how to forward the requests from
different clients to the right server components. And the other task
of the gateway is load balancing. Let's say that you
are working in a multi-node environment
where you have two nodes. When the gateway receives
the first request, it's going to forward it
to node number one, since both nodes are free. But now, if the gateway
gets a second request, it's going to say,
oh, node one is full. Let's process this request on node number two, since
it's free, and so on. All right, so the gateway
in Tableau server is like a distributor that
knows everything. You know someone like that. Let's just say I
know a guy who knows a guy who knows another guy. So the Gateway has two tasks. First, it routes the client requests to
the right component. And second, it does
load balancing if you are running Tableau Server
in a distributed environment. All right, so now we're
going to start talking about those Tableau components in the middle. In Tableau Server, there are
different kinds of components. We have servers, we have
engines, and we have storage. And we're going to
start with the servers. As you learned, in
Tableau Server there are
different processes: the login process, publishing, accessing a workbook, and so on. And in Tableau
Server, they designed different servers for
different processes. Let's start now with
the application server. The application server is responsible for
different processes. Like, as we learned,
a user login request is going to be forwarded
to the application server. Then the application server
is going to check with the repository or
Active Directory, depending on your configuration,
to find out if the user is allowed to
access the server or not. And the other process
the application server handles is the publish process, where the application server is going to get the publish request and split the
workbook into two files: the XML file to be stored in the repository and the Hyper file to be stored in the file store. One more task for the
application server is to render the
server interface. All those little things
that you find in Tableau Server, like icons, images, projects, menus: it is the application server
that renders them. So the application server
is responsible for different processes like the authentication and
authorization process, the publish process, and
rendering the server UI. But one process that the
application server will never do is the
visualization process. All right, now we're going to
jump to the next server. We have the VizQL Server. This one's going
to be interesting. All right, so
previously we talked about the power of visuals and how the human brain transforms
text into visuals and images. VizQL is like our brain. It adds the
magic by converting numbers and text into
visuals and images. VizQL stands for Visual
Query Language for databases. The founders of Tableau, Chris and Pat,
invented this language. Let's say that you drag and
drop something in Tableau. VizQL is going to
convert this action into an SQL query and then send it to the data
server to get the data. Then the data server
is going to send the results back to VizQL as raw data. Now VizQL is going to do
the magic by converting that raw data into visuals and images presented
in your client. All right, so
VizQL is the brain. It is a very important Tableau
component, responsible mainly for the visualization process.
It does two things. It's going to generate queries from user actions, and it's going to convert and transform the raw data into
visuals and images. All right everyone, so now we're going to talk about
the third one. We have the data server. The data server is the one that knows everything
about the data. It knows where to find the data, how to connect to it, and
how to speak to it. The first task of the
data server is to manage both extract
and live data sources. If the data is inside Tableau, it can send query requests
to the data engine. But if the data is
outside Tableau, it can use the
data connectors to send query requests to
the external sources. And the data server knows
how to speak to the sources. It acts like a proxy
to the data sources and can speak many different
database languages, so it sends query requests in a language that the
database understands. Another task of the data server is to
handle data security. It checks if a
user is allowed to see the data and does
filtering if needed. And the data server
manages driver deployment as well. So
the data server is the central data
management component in Tableau server and the one that knows how to get data
from the sources. All right, so now let's
jump to the next component. We have the data engine. If we decide to store our data inside Tableau as an extract, then the data engine is going to be the one dealing with it. Different components can send requests to the data engine. Like, for example, the data
engine can receive a request from the application server
to publish a new extract. Then the data engine
can execute a create operation to create a new extract and
store data inside it. The data engine can also receive
a query request from the data server asking for
data. What happens here? The data engine is going to
find the correct extract. It's going to connect
to the hard drive and then pull the
needed extract from it. And at the end, the data is going to be sent
back to the data server. And finally, the data engine
can receive a request from the backgrounder to update
the content of an extract. The data engine can execute
an update operation by opening the extract and updating its content
with the new data. The data engine in Tableau is like any other database engine. It does different operations: it queries the data, it performs insert and
update operations, and it creates new extracts, but only for the data
inside Tableau Server, inside the extracts. Okay, the next component
is the repository. As you might have already noticed, the repository was involved
in every Tableau process. So let's talk about it. The repository stores many
different types of data. For example, it stores the workbooks that we
published to the server, but only the metadata part, not the data itself. The XML files from the workbooks are stored inside
the repository. In the repository we also find
the usage data. It's data that's going to
help you understand the performance and the
traffic of your project. For example, you can find the total number of active
users inside Tableau Server and the total view counts by day, and you can find out the most used data
sources in your project. Another type of data
that you can find inside the repository is the
security information. For example, which users
are allowed to access your content or which users are allowed to access
our Tableau Server. All right, so as you can
see, in the repository there are different
types of data, and it holds huge amounts
of data in Tableau Server. But it's very important
to understand that the data inside our dashboards and reports is not stored inside the repository. We have many other
Tableau Server components that are worth mentioning. For example,
the cache server stores almost
everything: images, icons, results of queries,
dashboards, and so on. So if you open a dashboard that has already been accessed before, the data is going to be pulled
from the cache server. Another component is
the backgrounder. In Tableau Server, you
can create a schedule to refresh the data
inside your extract. And the task of the backgrounder is to check this schedule every 10 seconds and then trigger the process of refreshing the
extract when the time comes. And the last component
that I would like to mention here is
search and browse. The users of Tableau Server can search for content. This component is
responsible for searching inside the repository and returning
the results to the users. All right everyone,
finally we have the last puzzle piece, the
server components. If we put it in
the architecture, we will get the
whole big picture of the Tableau architecture. Now let's go and do
a very quick summary. The source layer is the one that is
outside Tableau and contains our data, which could be anywhere, like
databases or files. In the desktop layer,
the developers can start connecting Tableau Desktop
to the data sources, either by copying the
data inside Tableau using an extract connection or with live connections
to the sources. Then they're going to start building visualizations using worksheets,
dashboards, and stories. Both the data source
and the visualizations together we call a workbook,
and we can either send it as a file or
publish it to the server. The server layer is going to host our workbooks, and there we can find many components, like
the data connectors to connect our sources
to Tableau Server, and the gateway to connect the client requests to
Tableau Server. And we have the
application server responsible for the login
and publishing processes, the VizQL Server responsible for the visualization process, and the data server responsible for
data management. We have another component, the data engine, that's going
to handle the extracts. In Tableau Server, we have three places where
the data is going to be. We have the repository that
contains many different kinds of data, like the XML of the workbooks
and the security objects, but not the data itself, because our data
is going to be stored inside the file
store as an extract. And we have the cache
server that contains many different types of data to increase Tableau's
performance. And the last one is
the consumer layer. Here we find the different
groups of users and clients, like the Tableau
Reader users who need only the TWBX files
directly from the Tableau developers, and
another group of users who are going to use Tableau Desktop
to develop new views. And we have the static
readers who are going to receive files
like PDF and Excel. And then we have a big group of users who are going to
access Tableau Server using either the web
or Tableau Mobile to interact with the
published workbook. All right everyone, one more thing that I
would like to show you is this amazing
dashboard from the Tableau team. It's going to show you the
different components inside Tableau Server and how they
interact to do a task. For example, if we go to the
workflow or the process, we can select, for
example, access view. And then we're going
to select whether it's a published
extract or live. Over here we have a slider. If you drag it to the end, you're going to see
how the components interact with each
other to do the task. And on the right side you will see a description for each step. This is a really great way to learn how Tableau Server works. I learned a lot from it
for this tutorial, so make sure to check
that if you want to see more details about other
processes in Tableau server. I'm going to leave the link
in the tutorial materials. All right guys,
so that's all for the Tableau server architecture
and its components. Next we will learn the
Tableau Public architecture and what are the limitations
of Tableau Public.
70. Tableau | Tableau Public Architecture: Let's start with the
source of our data. In Tableau Public,
you can only connect to files like CSV, JSON, Microsoft Access,
and Google Sheets. The next component is
Tableau Public Desktop. It is the free version
of Tableau Desktop. It's software that you can download and install on your PC. So here we start by
connecting Tableau Public to our files by
creating a data source. In the data source, we have
only one type of connection: the extract. The data has to be
copied from our files and loaded into
Tableau Public Desktop. There is no live
connection option. And then after that,
we're going to start building our
visualizations, or as we call them, vizzes. Now once we are done
building the views and the dashboards using
Tableau Public Desktop, we have only one
option to share them. That is to share
the whole workbook, your data and the vizzes,
to Tableau Public. Tableau Public is a free
platform hosted by the Tableau team to share visualizations
with the whole world. Once our vizzes are published
to Tableau Public, they can be consumed by
users all around the world. And here we have a few options. The users can use
a web browser to view and interact with
your visualizations, or users can download
the whole workbook, your data and the vizzes, in different formats
like the Tableau file (TWBX), PDF,
images and so on. The last option is that
your vizzes can be embedded into your
websites and blogs. Okay, now since Tableau
Public is free, it comes with a few limitations. At the source level, we can connect Tableau
Public only to files. The data connectors
are very limited, and we cannot connect, for example, to servers. At the next level, the Tableau Public Desktop
level, there are limitations too. In the data source, we have only one type of connection,
and that is the extract, so we cannot have
a live connection to the sources. The
workbook itself can contain a
maximum of 15 million rows, and we cannot save the workbook
locally on our computer. The only option to share it is to publish it to
Tableau Public. But there is a
workaround for that, which I'm going to show
in the next tutorial. All right, so now let's move to the sharing level,
Tableau Public. Here we have a
few limitations as well. For example, the
total available size for each account is
only ten gigabytes. And there is no way to refresh
your data automatically. Each time you need new data, you have to manually republish the workbook with the new data. And the third one: it's
going to be public, so there is no way
to make it private and share it
with only a few people. You always have to publish
it to the whole world. Now let's move to
the final level. We have the consumers. The only limitation here
is that you cannot use Tableau Mobile to access and interact with
the visualizations. All right everyone,
I decided to use Tableau Public in this Tableau
course since it's free, and all of you can
follow along with the examples without having to pay for extra licenses. The limitations that
we have in Tableau Public are not really relevant
for the learning process. The main features of Tableau, the data visualizations that
we have in Tableau Desktop, are all
available in Tableau Public as well, without
any limitations, so don't worry about it. All right everyone.
So with that, we have learned the Tableau architecture and its components, and we learned how
Tableau works internally. And with that, we have covered the theory part of Tableau. In the next section, we will start preparing
your environment so you can practice Tableau with me during the course.
So let's jump in.
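As a rough summary of the Tableau Public limits mentioned in this lesson, here is a minimal Python sketch that checks a planned workbook against them. The numbers (15 million rows per workbook, ten gigabytes per account) and the file types are the ones quoted above and may change over time; the class, function, and field names are made up for illustration and are not part of any Tableau API.

```python
from dataclasses import dataclass

# Limits as quoted in this lesson; treat them as assumptions, not official values.
MAX_ROWS_PER_WORKBOOK = 15_000_000
MAX_ACCOUNT_STORAGE_GB = 10.0
# File types named in the lesson; the real list of connectors may be longer.
ALLOWED_SOURCE_TYPES = {"csv", "json", "microsoft access", "google sheets"}


@dataclass
class WorkbookPlan:
    """Hypothetical description of a workbook we would like to publish."""
    source_type: str
    row_count: int
    size_gb: float
    needs_live_connection: bool = False
    must_stay_private: bool = False


def tableau_public_issues(plan, account_used_gb=0.0):
    """Return the limits from this lesson that the plan would run into."""
    issues = []
    if plan.source_type.lower() not in ALLOWED_SOURCE_TYPES:
        issues.append("source is not one of the file types named in the lesson")
    if plan.needs_live_connection:
        issues.append("only extracts are available, no live connection")
    if plan.row_count > MAX_ROWS_PER_WORKBOOK:
        issues.append("workbook exceeds 15 million rows")
    if account_used_gb + plan.size_gb > MAX_ACCOUNT_STORAGE_GB:
        issues.append("account storage would exceed 10 GB")
    if plan.must_stay_private:
        issues.append("published workbooks are always public")
    return issues


small = WorkbookPlan(source_type="csv", row_count=10, size_gb=0.001)
big = WorkbookPlan(source_type="sql server", row_count=20_000_000, size_gb=1.0,
                   needs_live_connection=True)
print(tableau_public_issues(small))
print(tableau_public_issues(big))
```

A small CSV like the training files passes every check, which is why these limits do not matter for the course.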
71. Tableau | Section: Prepare Your PC: We're going to prepare your Tableau
training environment. In order to learn Tableau, you should not only
watch the videos, you have to practice with me. That's why we're now
going to prepare your environment so
you can work along with me. And of course, don't worry about it. Everything is free. We'll start by downloading
and installing Tableau, then we're going
to create a Tableau Public account. After that, to make sure that
everything is working, we're going to create
our first visualization, and then we're going
to publish it to your Tableau Public account. And at the end, since
it may be your first
time starting Tableau, I'm going to take you on a quick tour of the
Tableau interface. So now let's start
with the first step,
downloading and installing
Tableau. So now let's go.
72. Tableau | Download & Install Tableau: All right, let's start
with the first step. We're going to download
Tableau Public Desktop. In order to do that, we're
going to go to the website public.tableau.com. I'm going to leave the link
in the description. From there, we're going
to find the menu Create and click on it. Then we have Download Tableau
Desktop Public Edition. Let's click on that. And
then we're going to go to the middle and click
on Download Tableau Public. Now before the download starts, we have to fill out this
registration form. This is not for creating a Tableau Public account, it's
just a form to fill in before the download starts. We're going to enter
the first name, last name, email, and country. And then we're going to
click Download the App. Then the download is going to start. It is just 500 megabytes, so it should not
take a long time. Now
the download is done. Let's click on the
executable file to start the
installation process. Okay, at the start
of the installation, we are on the welcome page. As usual, we have
to read and accept the terms, so you
have to do that. And here we have a second box. You can check it if
you don't want to send product usage
data to the Tableau team. It's like cookies. I don't mind, so I'm just
going to leave it. Now we click Install. Once you do that, the
installation is going to start. It should not take a long time. Okay, so now the
installation is done and Tableau is going to be
launched automatically. All right, so with that, we have completed the first
step, where we have successfully downloaded
and installed Tableau Public on your PC. Next we're going to create
a Tableau Public account, where you can share
and publish your work.
73. Tableau | Create Tableau Public Account: Okay, so let's go
back to the website public.tableau.com and on
the right side at the top, we're going to click on Sign In. And then we have to click
on Join Now for Free. Now we have to fill out this registration form in order to create a new Tableau
Public account. We have to enter the name, the email, the password,
and the country. Then we have to read
and agree to the terms. And let's check here: I am not a robot.
At the end, you're going to click
on Create My Account. And now we got the message
to verify our account. That means we have to check our email in order to
activate our account. So let's do that. Okay,
so now after checking, I got an email from Tableau. I'm going to open it, and then I'm going to click on Verify Now in order to
activate our account. I'm going to
click on that, and then it's going to
send me to my account. And with that we have a brand new, active
Tableau Public account. Well, it's like any other
social media account. You can add your personal
information. For example, we can add our photo or avatar. So let me check what
I can do over here. I have this photo from the
Stuttgart Television Tower. It's a meeting there. And
then I'm going to click Save. We can add many other things. Let's click on Edit Profile. As you can see over here, you can link your
social media accounts or add your websites and so on. So let's click Save now. All right, so with
that, you now have a Tableau Public account, but it's still empty; we don't have
anything inside it. Next we will get the
training datasets, and I'm going to explain to you the data model behind them.
74. Tableau | Get Training Datasets: If you want to learn
any new tool like Tableau or Power BI, or any
programming language, you always need a good dataset for training and practicing. I started searching for good training datasets, and
after a lot of research, I downloaded
many, many datasets. But I was not happy with them. I didn't like them
because they don't cover all the scenarios that
we need for training. Let me tell you why
this is an issue. In real projects, your data is typically going to be stored in data warehouses or data lakes, inside many, many
different tables. The first step in any
visualization tool like Tableau or Power BI is to connect those tables and combine them in one
big data model. Training with only one
table is not going to help you or prepare
you for real projects. That's why I decided to make
my own datasets to cover all the training scenarios
and to have multiple tables in order to learn how to
combine them in one data model. And of course, you
can use my dataset in order to learn anything
else like SQL, Python, Power BI, and so on. So let's see what I've
prepared for you. All right. The first
thing we're going to do is go to the link
in the description. You're going to land
on my website, where I've collected all the
course downloads and materials on one page. So for example, you're
going to download the training datasets here. We also have some
important links, the free cheat sheets, and many sketch notes that I have
prepared for this course. You're also
going to find, for each section, the
important links and sketches, as well as the Tableau files. This link is going to be available for you after the
course as well, so you can always
come back here and download the stuff that you
need, and of course for free. But now what we're
going to do is download the training datasets that
we need for our course. Here, as you can see,
we have two zip files, one for non-EU
and one for the EU. So if you are
currently in Europe,
you're going to download the EU datasets. But for all other countries, you're going to
download the first one, the non-EU training datasets. And now you might ask, what is the difference between them? Well, it's about the
decimal numbers. In our datasets we have
decimal numbers, like the sales, and
different countries have different representations
of decimal numbers. Many European countries,
for example, use the comma to separate the
decimal from the whole number. But in many other
countries, like the USA and in Asia, the dot (.) is used to separate the decimal
from the whole number. If you are using
the wrong format, what's going to happen? Tableau will not understand that this field is a decimal number, and it's going to
convert it to a string. Now, depending on your location, go and download the
datasets. For me, I'm in Germany, so I'm going
to go with the second one. As I said, it
depends on your location. Let's click on that. Next
I'm going to grab the zip file and put
it somewhere safe. I don't want to leave it
in the Downloads folder, so I'm just going to
create a safe path for it and then start
extracting the data. Okay, now let's go
and unzip the file. I'm going to
extract all of them. Okay, so now let's go inside
it and check the data. Here we have three
different datasets. The first dataset is the Tableau project,
the sales dashboards. We're going to use it
in the last section, once we start building
our projects. Then we have two other datasets, the big dataset and
the small dataset. We're going to use these two datasets throughout the whole course. The small data source
and the big data source are very similar.
Okay, so now let's open both of them and see what
we have inside them. As you can see, we have
almost the same tables: customers,
orders, products and so on. So they are
almost identical. And now you might ask me,
why do we have two datasets? Well, we have many different types of calculations and functions. For example, some
calculations are going to change the data
at the row level, and it's better to
have a small dataset in order to understand
their results easily. On the other hand,
we have calculations like aggregations
at the table level, where it's better to have a lot of data in order to understand
how they work. That's why I
decided to have two datasets, in order to
cover all those scenarios. Another thing about
the datasets is that the file type is CSV. We have only one
JSON file over here, so you can use either
Tableau Public or Tableau Desktop in order to
follow me in the course. All right, so now I'm
going to walk you through the data model of our datasets. Here we have three
typical tables. Our datasets contain information about the superstore use case. It is simply the sales
transactions of customers ordering
products from a company. It's classic and very
easy to understand. The first table
in our data model is the customers table. It contains all
customer information, such as the names
of the customers, their locations,
and their scores. In the small dataset, we have five customers, and in the big one we have
around 800 customers. The second table in our
data model is orders. It contains all the orders
placed by the customers. So we have information
like the order date, sales, quantity,
and profit. In the small dataset,
we have ten orders, and in the big dataset we have around five years of data. That's really helpful once we start building clusters. The third table in our
data model is products. It contains all
the products that we find inside our superstore. So we have information
like the product name, category, and subcategory. In the small dataset, we have only five products in the category monitor
and accessories. But in the big dataset, we have more than 2000 products with categories
and subcategories. All right, so now we
have those three tables, but we also have
relationships between them. For example, there is a relationship between
orders and customers: they can be connected
using the customer ID. And if you check
orders and products, you can find another
relationship between them, because you can find the
product IDs in both tables. With that we can make a relationship between
orders and products. All right. Okay, so I left all this information
on my website. You can find there
all the links to the datasets that I found
during my research, so you can go there and
check them if you want. All right, so now with
that, we have everything. We have the tools, we have the data, we have the accounts. Next we will build our first visualization
in Tableau, and we're going to publish it to our
new Tableau public account.
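To make two points from this lesson concrete, here is a minimal Python sketch. First it reads a sales value with either the comma or the dot as the decimal separator and leaves it as text when the format does not match, which mimics the behaviour described above rather than Tableau's actual parsing. Then it connects orders to customers and products through the ID columns. The column names and sample rows are made up for illustration and do not come from the course files.

```python
def to_number(text, decimal_sep="."):
    """Convert text to a float, or return it unchanged if it does not parse."""
    normalized = text.replace(decimal_sep, ".") if decimal_sep != "." else text
    try:
        return float(normalized)
    except ValueError:
        # Wrong separator setting: the value stays a string.
        return text


# Invented sample rows shaped like the three tables of the data model.
customers = [
    {"customer_id": 1, "name": "Maria", "country": "Germany"},
    {"customer_id": 2, "name": "John", "country": "USA"},
]
products = [
    {"product_id": 101, "name": "Monitor", "category": "Monitors"},
    {"product_id": 102, "name": "Mouse", "category": "Accessories"},
]
orders = [
    {"order_id": 1001, "customer_id": 1, "product_id": 101, "sales": "12,5"},
    {"order_id": 1002, "customer_id": 2, "product_id": 102, "sales": "20,25"},
]

# Index the lookup tables by their ID so each order can find its match.
customers_by_id = {c["customer_id"]: c for c in customers}
products_by_id = {p["product_id"]: p for p in products}

model = []
for order in orders:
    customer = customers_by_id[order["customer_id"]]
    product = products_by_id[order["product_id"]]
    model.append({
        "order_id": order["order_id"],
        "customer": customer["name"],
        "product": product["name"],
        # The sample file uses the comma convention, so we say so explicitly.
        "sales": to_number(order["sales"], decimal_sep=","),
    })

for row in model:
    print(row)
print(to_number("12,5", decimal_sep="."))  # wrong setting: stays text
```

With the matching separator the sales column becomes numeric and can be summed; with the wrong one it stays text, which is why the course offers two versions of the files.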
75. Tableau | Publish First Viz: Okay, so
let's start Tableau Public Desktop, if you
don't have it open already. Then on the start page, we're going to go
to the left menu to connect Tableau to our data. Click on Text File, and now we're going to
find our file, the customers CSV that
we just downloaded. And now we can see the
customers data inside Tableau. Let's move to the worksheet. I'm going to click on the
orange tab over here, Sheet 1, to create
a new worksheet. And now we're going to build our visualization in Tableau. We only have to drag and
drop from the left side. Let's drag and drop the
country onto the columns. Let's get another one. Let's move the
count to the rows. All right, so that was it. We have our first viz. Here you can see in this
visual how many customers we have in each country. With that, we are done building the workbook, and now
it's time to share it. Sadly, in Tableau Public we cannot save it
locally on our PC, but I'm going to show
you a workaround later. For now, the only option
that we have is to publish it to our new
Tableau Public account. Okay, now in order to do that, let's go to the main
menu over here. Then click on File, and then we're going to click
on Save to Tableau Public. The first time, you
have to sign in with the Tableau Public account
that we just created. All right, now let's
click on Sign In. And now we have to
give it a name; I'll call it My First Viz. Once you click Save, Tableau Public Desktop is going to start publishing our workbook
to Tableau Public. Once it's done with
the publishing, a web page is going to open
automatically, directly showing your viz
in your public account. Here's our viz. Let's go
back now to our home page. As you can see over here, we have our first viz
published to Tableau Public. Let's go inside it again. Now everyone in the
world can see your viz, interact with it, and
even download it. Let's see how we
can download it. There is a download icon over
here, so click on that. Now you can select the
file format that you want. Let's select the last
one, Tableau Workbook, so click on that and
then click Download. And now we will get
the Tableau file, the TWBX, where we have our data and
our visualizations inside it. If you open it, you
can see our work again. This is the workaround that we can use in order to save our work locally on our
PC in Tableau Public. All right, so with that,
you have published your first viz to your new
Tableau Public account. Next I'm going
to take you on a quick tour of the
Tableau interface and the three main pages
of Tableau, and we're going to learn how to
navigate through Tableau.
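What this first viz computes can be sketched in a few lines of Python: group the customers by country and count them. The CSV content and column names below are invented sample data, not the actual course file.

```python
import csv
import io
from collections import Counter

# Invented sample data shaped like a small customers file (an assumption).
CUSTOMERS_CSV = """customer_id,first_name,country,score
1,Maria,Germany,350
2,John,USA,900
3,Georg,Germany,750
4,Martin,France,500
5,Peter,USA,0
"""


def customers_per_country(csv_text):
    """Count rows per country, like country on Columns and count on Rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["country"] for row in reader)


counts = customers_per_country(CUSTOMERS_CSV)
for country, n in sorted(counts.items()):
    # A text bar chart: one '#' per customer.
    print(f"{country}: {'#' * n} ({n})")
```

Each printed line plays the role of one bar in the chart, with the country as the header and the count as the bar length.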
76. Tableau | Tour of the Interface: I remember in 2014, the first time I opened Tableau, I was overwhelmed by all the icons and parts that we
have in the Tableau interface, and navigating
through the Tableau pages was very confusing
for me at the start. That's why I'm
going to take you on a short tour of the Tableau interface. So let's go. Okay, so now
let's start Tableau. The first thing
that I want to show you is that the whole thing, the whole file, is
called a workbook. The workbook is
like any other book: it contains different sheets. And the Tableau workbook
contains three main pages. We have the start page. It is the main page, where you can connect your data to Tableau. Then we have the
data source page. It is the place where you can connect and combine your tables and make changes to the metadata, like renaming
columns and so on. And the third page, where
you're going to spend most of your time, is the
workspace page. It is the place where
you're going to build your data visualizations. All right, so now we're going to
learn how to navigate through those pages and how
to switch between them. Okay, once you start Tableau, you will be on the welcome
page, the start page. Now if we want to go
to the data source page, we have to connect to something. Let's go again to the
left side over here, connect to a text file, then select our file
customers and open it. Once we do that,
we're going to land automatically on the
data source page. Now if we want to go
back to the start page,
we're going to go to this Tableau icon over
here on the left side. If we click on that, we're going to go back
to the start page. If we want to go back to
the data source page, we're going to click
on the same icon. Click on it again, and we are back on the data source
page. With this icon, we can always go back to
the start page of Tableau. All right, now let's see how we can go to the workspace page. In order to do that, we're
going to go to the bottom. Over here you will
find different tabs. The first one is always
the Data Source tab. This is exactly where we
are now, at the data source. But now if we select
the sheet, Tableau is going to take us to
the workspace page. If you want to go back
to the data source page, there are two ways to do that. First, we can stay at
the bottom over here and select
the Data Source tab. By clicking on that, we go
back to the data source. The second option is
in the Data pane: if you go to the left
side over here, you can see our data source, customers. If you double-click on it, we're going to go back
to the data source page. Okay guys, that was it. This is how you can navigate
through the Tableau pages. Let's now have a quick
overview of each page. Okay, let's start with the
first page, the start page. We can see here three panes:
Connect, Open, and Discover. In Connect we can find all the different types
of data connectors. In Tableau Public
we have around ten. That's enough for the training. But in Tableau Desktop we have
over 90 data connectors. Now in the middle, we have Open. When you start Tableau
for the first time, this section is going to be empty. But as you start
creating new workbooks, Tableau is going to
start showing you the most recently
opened workbooks. It is really nice to have quick access to our workbooks. Here, we have only one, the first viz that
we published before. And on the right side
you will find Discover, with
different stuff from the Tableau team like blogs, news, training
tutorials, and so on. And now at the
bottom, you can see information about the Tableau
software. For example, right now it shows that we
can upgrade to Tableau Desktop. Later, once Tableau releases a new
version of Tableau, you will find information
here to update your Tableau. But since we just installed the most recent version of
Tableau, it doesn't show it. Okay, so that was it
for the start page. Let's jump now to the next one, the data source page. By now, you should
know how to go there by clicking
on the Tableau icon. Okay, what do we have here on the data source page?
On the left side, you can find all the
information about our data. In Connections, you can find
the connection information, and in Files you can find all the tables that are
inside our data. Then in the middle we
have the data source name. And over here we have the area where we're going
to build our data model. It contains two layers, the logical layer and
the physical layer. I'm going to explain that
in the next tutorials, so don't worry about it. Beneath that, we
have the data grid. It's going to show us
a sample of our data, and by default,
it's going to show the first 1,000 rows. And on the left side
we have another grid. This is the metadata grid. It shows us more details
about the table's fields. All right, so
that's all for now. We're going to move
now to the next page, the workspace page. We can do that by
selecting the sheet tab. Okay, on the workspace page, we're going to spend most of our time building our
visualizations. That's why we have a lot
of icons and stuff around. So let me quickly guide you
here in this interface. Okay, so we're going
to start at the top. We have the
toolbar. It contains a lot of icons, and
those icons are the most frequently used
functions in Tableau. As you are building
your visualizations, you have quick access
to those functions. As you might have already noticed, there are some functions
that are not selectable. Well, you have to understand
here that in Tableau, if something is grayed out, that doesn't mean
that this feature is not available
in Tableau Public; it means it is not
relevant for the visual. For example,
this one over here is going to sort the visual, and since I don't have anything yet, there is nothing to sort. Let's check the other icons. We have the Tableau icon, which is going to take us
to the start page; you know that already.
We have undo and redo for the last
action in the visual. And as you can see, as I'm hovering over an icon,
Tableau is going to give me a short description
of the function. Here we can create
a new data source, or over here we can create
a new worksheet, and so on. So just hover over all the icons and you will see their functions. All right, now let's
move to the left side. We have here two panes, the Data pane and the Analytics pane. By default, Tableau
is going to show us the Data pane. But if you want to go
to the Analytics pane, just simply click on it. You can switch between them
by just selecting them. Let's see what we have
here in the Data pane. The first thing is the data
source that contains our data, and below that we can find the tables inside
this data source. We currently have only
one table, customers. And we can see over here the fields, or columns,
inside our table. Here we have a
search field as well. Sometimes our data
source gets really big and we're going to
have a lot of fields, so this is a really nice way to
search for a specific field. Okay, so now let's go
to the Analytics pane. You can find over
here predefined functions that you can add to your visual, like adding an
average line or doing clustering, or you can even create your own reference line. Really nice stuff. Okay, so now I'm going to switch
back to the data pane. All right, so now let's
move to the middle. And you can find over here
different shelves and cards. We're going to use them in order to build our visualizations. And everything works
here with drag and drop. So let's start with
the first one, the Rows and Columns shelves. The visuals in Tableau have two dimensions, rows and columns, like any other table. If you put fields on
the Columns shelf, it's going to create the
columns of the table, while if you put fields
on the Rows shelf, it's going to create
the rows of the table. Easy stuff. So now
let's have an example. Okay, so let's go to the
left side, and we're going to drag and drop the
country onto the columns. With that we define the columns of the
visual over here. Now we need
something on the rows. Let's take the count and
drag and drop it onto the rows. With that we have defined the
visual's columns and rows. If you want to
swap them, you can go to the toolbar over here and click
on this icon, and you can switch
between them very easily, which helps if you have a lot of columns.
I'm going to switch back. Now we can add more
columns or more rows. For example, let's take the city and drag and drop it
onto the columns over here. You can have multiple fields. Now if you want to remove
one of those columns, you can do that by dragging and
dropping it onto the empty space. Okay, let's move to
the Pages shelf. You can use it to split the current visual into
a series of pages, if you want to analyze
something step by step and take it slowly.
Let's have an example. Okay, let's take
the customer count again and drag and drop
it onto Pages. You can see on the
right side we have a new window to
control the pages. Now we are on the
first page, where we have the countries with
only one customer. If we click over here
on the right side, you will get the countries
with two customers, and so on. Now for the next example,
I'm going to remove it, so I'm just going to drag
and drop it onto the empty space. All right, so let's
move to the next shelf, the Filters shelf. You can use it in order
to filter our visual. For example, let's
take the country and drag and drop it onto Filters. Here you can now
decide which countries are going to stay and which countries are going
to leave the visual. For example, let's remove France
and click Apply. You can see our visual now no longer contains
the country France. Now I'm going to
remove it again from the shelf by dragging and
dropping it onto the empty space. Then we have the Marks card. You can use it in order
to design the visual. For example, we can
add new colors. If we drag and drop the
country on top of Color, we will get a color
for each country. Or we can change the
size of the bars, making them either smaller or bigger, or we can add labels and so on. Okay, now let's
move to the middle. Here, of course,
we have our view. It contains the visualizations,
or as we call them, vizzes. First we have the title, and you can change it
by double-clicking on it. Let's give it a
name, for example, Customers by Country,
and then click OK. Okay. Below that, we
have our visualization, and it contains different things. For example, we
have the headers, where we have the countries,
and we have the axis as well. The intersections between
those fields are the marks. Those marks could
be bars, like in this example, or they could be lines, circles,
or any other shape. Now if we check the bottom
of the Tableau interface, you can find the status bar. It contains a lot of
details about our visual. For example, it says
we have three marks, and of course we have three bars. We have one row
and three columns. The total number of
customers is five. Now let's add more stuff to the visual to see how
those stats change. Let's take the score and drag
and drop it onto the rows. You can see here we
now have six marks, so six bars, and we have
two rows and three columns. Those stats are really important once your visualizations
get complicated. Right now we have a very simple one; we can count and
see we have six bars. But if we have a lot of
dots and a lot of points, it's really hard to count them. It's really nice to check the status bar to see
details about our visual. All right, now let's move
to the right side, and we're going to go to the Show
Me icon. Select that. Now you will get the different
visualizations that Tableau offers. By just
clicking on them, you're going to switch
the whole visualization in our view. Here we can switch it to a table, to a pie chart, to a
treemap, and so on. Now just go and explore those
different visualizations. You might have already noticed that some of them are grayed out, so we cannot use them here. Again, they are available, but we don't meet the
requirements to use them. For example, if you go
to the line chart here, Tableau tells you what
the requirements are, or what Tableau needs in order
to build this visualization. It needs one date, it
doesn't need any dimensions, and it needs at
least one measure. Currently,
Tableau cannot create it because we don't have any
date field in our view. All right everyone. Those were the main components
of the worksheet. Now, before we go
to the dashboard, I'm going to do a few
things. You can follow along. Okay, I'm going to undo those visualizations
and go back to the bars. Then I'm going to
create a new sheet, so I'm going to click over here to create a new worksheet. Then I'm going to
take the country, and this time I'm going to
take the score over here. Then I'm going to use
the pie chart over here, and I'm going to put
some labels on it. Okay, that's enough. Let's
go now to the dashboard. We can do that by creating a new dashboard with
the icon over here. Now we are in the interface
of the dashboard. I'm not going to explain
everything over here. It's just important
to understand that in the dashboard we can start combining different
sheets in one place. We can drag and drop
sheet number one, where we have the
customers by country. Then we can take
sheet number two and just place it
somewhere over here. Now I have two visuals in one
place, sheet number one
and sheet number two. This is the main job of the dashboard. All
right everyone. Now I'm going to
show you the last type of sheets we have, the story in order
to create a new one, we're going to go
to the bottom over here and click in this icon. And with that we have created a new story, stories in Tableau. They are like sequence of visuals and we use
it usually for presentations if
you want to tell a story from our data. All
right, what do we have? Over here in the left side, we have the visuals
that we created. We can see the worksheets
and as well the dashboard. And then over here we can
add new story points. In the middle, we have
this section, like a navigator, to go
through our story. And then here we're
going to present the story or the views. What we're going to
do now: in the first one, we can drag and
drop the dashboard. Let's do that now. We can add a next step by
adding a blank one over here. And then we're going to
take sheet number one, and then we can add a new blank one and
then sheet number two. So now we have a story. It starts with the big
picture, with the dashboard. And as we go through
the story step by step, we go into more detail
in each visual. It's a really nice
way to present or to tell a story
using our visuals. All right, so now we have the
Tableau software installed. We have the two
training datasets, the public account
to share your work, and everything is ready to
start learning Tableau. So with that, we have
finished this section where we have prepared your
environment to practice Tableau. And in the next section, we will do a deep dive into the
Tableau data source to learn how to build a data model in
Tableau by combining tables.
77. Tableau | Section: Data Modeling: Data modeling in Tableau. Each successful
dashboard or chart in Tableau is based
on a solid data model, and having data modeling skills is essential for each Tableau project or business
intelligence projects. So that's why we're
going to start learning the fundamentals
of data modeling, including the star schema
and the snowflake schema. And then I'm going
to introduce you to the Tableau Data Modeling, where you can learn the physical
and the logical layers. And then we can
learn the different methods for combining tables in data modeling using
joins, unions, relationships, and data blending. And of course, in order to understand the
differences between them, we're going to compare
them side by side. And of course, I'm
going to guide you on when to use which method. And at the end, you're
going to go and build two data sources based on
our training datasets. So let's start with the first topic where
we can understand the fundamentals of data
modeling. Now let's go.
78. Tableau | Concept of Data Modeling: In real projects, your
data is going to be stored, typically in data warehouses or data lakes, inside many,
many different tables. The first step in any
visualization tool like Tableau or Power BI is to connect those tables and combine them in one
big data model. Let's start with the question: what is data modeling? Data modeling is the
process of organizing and representing data in a clear
and understandable way. Each data model has
entities. Entities are things like customers and
products, or events like orders. And inside those entities,
we have information, and we call it attributes, like the first name and the last name inside the entity Customers. And we describe in the data
model how those entities are connected or related to each other and we call
it relationships. This data model, this
visual representation of the data makes it easier for us and for
programs to understand the data, which is really
important for making decisions and improving
performance of the business. All right, so we have
three different types of data models at different
levels of abstraction. First we have the
conceptual data model. This type is a high-level
representation of the data model without going into detail on how the data model
is implemented. It's like a map that shows the important entities
and the relationships. And we usually use this type to explain the data model to business analysts
and stakeholders so they understand the big
picture of the data. The second type is
the logical data model. In this data model,
we go into more detail on how the data is
structured and organized. We define in this model the
attributes of each entity, and it includes as
well constraints and more details about the relationships between
the entities. This data model is usually
used by database designers and developers as a blueprint
for the implementations. And the third type is
the physical data model. This type represents the actual implementations
of the data model. It includes all the
technical details about how to store the data, like the data types
of the attributes, the primary and foreign keys, indexes, and so on. This data model is used by developers to create and
manage the databases. All right, so let's summarize. The conceptual data model shows the big
picture of the data. The logical data model provides a blueprint for
the implementations. And the physical
data model shows how the data is implemented
in the databases. And Tableau adopted both the logical and physical data models in its data sources. But we don't have a conceptual
data model in Tableau. Don't worry about it. I will
show you more details later. All right, so now
for analytics, and especially for data warehousing
and business intelligence, we need special data
models that are optimized for queries
and for analytics. They should be flexible
and easy to understand. And for that, we have two
special data models. The first one is the star schema. The star schema has a
central fact table surrounded by
dimension tables. The fact table
contains events and the dimensions hold
descriptive information. The relationships
between the fact and the dimension tables
form a star shape, and that's why we call it
a star schema data model. The second one we call the snowflake schema. It is very similar
to the star schema, but the dimensions here are broken down into
sub-dimensions. Normalized tables or dimensions means that those
tables are broken down into small pieces to avoid having big tables
or big dimensions, which lead to a lot of
data duplication and slow performance. The shape of this
data model looks like a snowflake. The star
schema is a simple and easy-to-understand
data model, and we usually use it if our
dataset is small or medium. On the other hand, the snowflake
schema is more complex, but it eliminates the duplicates and reduces the storage space. We usually use it if we
have large datasets. All right, so the
datasets that I've prepared for this
Tableau course are using the star schema data model just to keep it simple
and easy to follow. All right, our data model has a name and we call
it Star schema. If you're going to
work on real projects, you're going to hear about
the star schema a lot. The star schema has mainly two types of tables, facts and dimensions. For example, we have
the table Customers. It describes each customer
by their first name, last name, country, and so on. So Customers is a
dimension table. And we have another dimension
table in our data model. It is Products. The Products table likewise describes each product by its name and category. It is a dimension as well. All right, so now
let's talk about the second type of tables
in the star schema. We have the facts. For example, let's have a look at the
big table in the middle. We can see three things. First, you can see a lot of
keys to the other dimensions. We have the order
ID, customer ID, product ID. Second,
we can see dates. So we have the order date, the shipping date.
And the third thing, we can see a lot of numbers. We have sales, quantities, profits. We also call them
measures. If you see those three things, that means we have
an event, or fact, table. Facts connect
dimensions together. They have dates and
measures as well. Okay, so to summarize, how do we decide if a table
is a dimension or a fact? If you have a table that
contains information about a physical
person or an object, like employees,
customers, products, then this table is a dimension. And usually they
are small tables. And on the other
hand, if you have a table that contains
events, for example, sales orders,
logs, ATM transactions, any table that has events or transactions and has time in it, it is a fact. And usually they
are really huge tables, okay? So in our data model, in the datasets, we
have two dimensions. We have the customers
and products, and in the middle we have
our fact, the orders. All right, So now if you
hear in your project someone talking about
star schemas and so on, you know exactly what they mean. It's a very important
concept in the analytics and BI world, whether you are using
Tableau or Power BI. All right. So with that, you have learned some important concepts
in data modeling. Next we will learn the Tableau data model
and the two layers, physical and logical layers.
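To make the fact-versus-dimension rule of thumb concrete, here is a minimal Python sketch. It is not part of Tableau; the `Table` class, the `classify` function, and the snake_case column names are assumptions made only for illustration, based on the Customers, Products, and Orders tables described above.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """A hypothetical description of a table: its keys, dates, measures, attributes."""
    name: str
    keys: list
    dates: list = field(default_factory=list)
    measures: list = field(default_factory=list)
    attributes: list = field(default_factory=list)


def classify(table: Table) -> str:
    """Rule of thumb from the lesson: keys to other tables + dates + measures => fact."""
    points_to_other_tables = len(table.keys) > 1
    if points_to_other_tables and table.dates and table.measures:
        return "fact"
    return "dimension"


customers = Table("Customers", keys=["customer_id"],
                  attributes=["first_name", "last_name", "country"])
products = Table("Products", keys=["product_id"],
                 attributes=["name", "category"])
orders = Table("Orders", keys=["order_id", "customer_id", "product_id"],
               dates=["order_date", "shipping_date"],
               measures=["sales", "quantity", "profit"])

# The star schema of the training dataset: one fact in the middle, two dimensions around it.
star_schema = {t.name: classify(t) for t in (customers, products, orders)}
print(star_schema)
```

The rule is only a heuristic. In a real project you would also look at table size and at what the business calls an event.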
79. Tableau | Tableau Data Modeling: Okay, once we connect
our data to Tableau, we have to create a data
model in our data source. If your data contains
only one table, then your data model
is very simple. You have single table
in your data model. But in real life projects, things get more complicated where you have multiple tables. And Tableau here offers four different methods of how to combine and
connect your tables. We have relationships, joins, union, and data blending. Now, before we start doing
deep dive and those methods, let's first understand that
data moduling in Tableau, In Tableau data model,
we have two layers. We have the physical
layer and on top of it we have
the logical layer. In the physical layer, we
might have some couple of physical tables and we can combine them in Tableau
using two methods, either joining the tables or
using union between them. Now let's move to
the logical layer. It is the top level layer
and provide us like an abstract to hide all the details in
the physical layer. This is especially
nice if we have a lot of tables in
the physical layer. Once we are building
our visualizations, we don't want to see all those tables in the
physical layer. The logical layer is
going to provide us with an abstraction, or it's going to
hide all those details. The result of merging the
tables using a join or a union in the physical layer
is going to be presented in the logical
layer as a single flat table, and we call
it a logical table. That means we're going to
have two logical tables. The first one is going to represent three tables after
doing the join. And the second one is going to represent two tables
using the union. But in data modeling, we still have
to connect those two logical
tables. In Tableau, we have only one
method to do that, and we call it relationships. It's very important
to understand that in the logical layer, we cannot merge tables
into one table. After connecting them
using a relationship between the two logical tables, the tables are going to stay as they are and nothing
is going to be merged. We just describe
the relationship between the two logical tables. Now back to those two layers: both the physical layer
and the logical layer can be found inside
the Tableau data source. And as you know, on top
of the data source, we have our visualizations. And you can see in this example only the tables from
the logical layer. And you can start building
your visualizations using the data available
from the logical layer. But sometimes as you are
working with the projects, you build another data source
with another data model. Here in this example, it's
important to understand that not all logical tables come
from physical tables. They could come directly
from your source system. Now, in order to build
one visualization from both of the data models
and data sources, we have to somehow connect those two data models
or data sources. And we can do that at
the visualization level, where Tableau offers us the last and very unique method of connecting and
combining tables, something called data blending. By looking at this,
you can see that Tableau offers us four
different methods to combine and connect tables at different layers
and different levels. In the physical layer, we
have joins and unions. In the logical
layer, we have relationships, and at the visualization
level we have data blending. All right, so now let's
see in Tableau how we can navigate through the physical
and the logical layers. We are currently at
the data source page, and by default,
we're going to be at the logical layer in
the data model. So that means anything
that we drag and drop into our data model is going to be considered a logical table. Customers is a logical
table. Let's take another one. Let's take Orders, drag and drop it over here. So this is our second
logical table. And as you can see, Tableau created
a relationship between them, because at the logical layer we
can only do relationships. So now we are at
the logical layer. How can we go to
the physical layer? In order to do that,
we're going to go inside a logical table. Let's go to the customers
and double click on it. Once we do that, we're going
to go to the second layer. We are inside the
physical layer now. Tableau is going to
tell you over here that Customers is made of one table, because we have
only one physical table. Now, anything that we
drag and drop into the data model is going to be considered a physical table. For example, we can take
Customer Details. Let's drag and
drop it over here. And by default, Tableau is going
to create between them not a relationship;
it's going to create a join between those
two physical tables. And of course, we can also do
a union between them. In the physical layer, we
can do joins and unions. As you can read over here, it says the logical table Customers is made of two physical tables. If you hover over this icon, you will see exactly
that we have two physical tables that define
the logical table Customers. Now if you want to go
back up to the logical layer, we can do that by just closing the physical layer.
Let's click on that. Now you can see that
Customers has a new icon. It says that in the physical
layer there is a join, and we get
more information if we hover over the table. It says the logical table Customers is made of two
physical tables, Customers and
Customer Details. That means the data in the logical table comes
from the physical layer. But if we go to the
orders over here, you will see no physical tables. The data comes directly
from the original tables. And with that, we have
learned how to navigate through the physical
and logical layers. All right, so with that, we have learned data modeling in Tableau and what the
physical and logical layers are. Next, we will start
learning how to combine tables in Tableau and we
will start with joins.
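As a rough mental model of the two layers, the following Python sketch represents a data source as a list of logical tables, each backed by one or more physical tables. This is only an illustration of the idea described above, not how Tableau stores a data source; `LogicalTable`, `DataSource`, and `relate` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalTable:
    """One box on the logical layer, backed by one or more physical tables."""
    name: str
    physical_tables: list
    combined_by: str = "none"  # "join", "union", or "none" for a single table

    def describe(self) -> str:
        count = len(self.physical_tables)
        if count == 1:
            return f"{self.name} comes directly from one table"
        return f"{self.name} is made of {count} physical tables ({self.combined_by})"


@dataclass
class DataSource:
    """Logical tables plus the relationships that connect them."""
    logical_tables: list
    relationships: list = field(default_factory=list)

    def relate(self, left: str, right: str, key: str) -> None:
        # A relationship only records how two logical tables are connected;
        # nothing is merged here.
        self.relationships.append((left, right, key))


customers = LogicalTable("Customers", ["Customers", "Customer Details"], "join")
orders = LogicalTable("Orders", ["Orders"])

source = DataSource([customers, orders])
source.relate("Customers", "Orders", key="Customer ID")

for table in source.logical_tables:
    print(table.describe())
print(source.relationships)
```

Joins and unions live inside a logical table, while relationships live between logical tables. That is the same split the demo shows when you double-click a table to open its physical layer.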
80. Tableau | Joins: All right, so let's start
talking about joining tables. We usually have two tables, table A and table B. If we want to combine
them in one big table, then we can use
a join between them. The first thing to
understand is that once we use a join
between two tables, then we have two sides. Table A is going to
be the left table and table B is going to
be the right table. Now what's going to happen
after we join the tables? All the fields from the left
table will be at the output. And then all the fields from the right table will
be added next to them. Joins combine the fields, or
the columns, of two tables. Now, in order to
do joins, we need two things. First, we need the key field. It is a field that you can
find in both tables. And after that, we have to
define the type of join. And we have to
choose between four different types of joins. We have the inner join, the left join, the right
join, and the full join. If you know SQL, then
you know those types. It's exactly the same logic. But let's have a
quick example to understand the four
types of joins. All right, now we
have this example where we have two simple tables. We have the customer's names
and the customer's age. And we want to combine
them in one table because it makes no sense to have two
tables about the customers. We want to make
one customer table and we want to combine them. In the first table, we have
the IDs and the names. And in the second table, we have the IDs as well, and the ages. It's really easy. The key for this join is the customer ID. Now let's see the
different outputs using those different
types of joins. Let's start with the first
type of join, the inner join. The inner join says the
output is going to show only the matching rows from
the left and from the right. That means any non-matching rows will not be presented
at the output. Let's see how this works. The first thing that's going
to happen is that we're going to combine
the fields first. We're going to start
with the left side, then the right side. Now we're going to start
matching the rows. We're going to start
from the left side. Do we have customer ID one
on the right side as well? We have a match in both tables. We have customer
ID one, so we're going to see it at the output, and then we
proceed on the left side. Do we have customer
ID number two on the right side as well? You see, we don't
have it. We have only customer number three. That means two is not matching
on the right side and customer three
is not matching on the left side. That was it. If you use an inner join
in this example, you will get only
customer ID number one, since we find it in both tables. Let's
go to the next one. We have the left
join. The left join says we're going to
have everything from the left table without
checking anything, but from the right table
we're going to have only the matching rows. If we do a left join
between those two tables, we're going to have
the following output. First we're going to
have the fields from the left table and the fields from the right
table near each other. And then we're going to
have all the customers from the left table
without checking anything. Everything is going to be presented over here, those two customers. And then from the right side, we're going to have
only the matching rows. That means: do we have customer ID number
one in the right table? Yes, we have it. Then we're going to have it at the output. But customer ID number two, we don't have it in
the right table, which means it's
going to be empty. Empty means nulls. Here we're going to have
null values in both the ID field
and the age. And that's it, this is
the output of the left join. All right, so now we're going
to move to the next one. We have the right join. You might already
understand how it works. We're going to have
all the rows from the right table and only the matching rows
from the left table. Let's see how the output
is going to be if we do a right join between
those two tables. As usual, we're going
to have all the fields from the left and all the fields from the right, and we're going to
have all the rows from the right table without
checking anything. We're going to have
those two customers, and then we start matching
from the left side. Do we have
customer number one? Yes, we have it. We're going to add it over here. Do we have
customer number three? As you can see, we
have only number two. That means we don't
have the information, and we're going to
have nulls. Those are going to be empty. That's it. It is exactly the opposite
of the left join. Now to the final type of join: we have the full join. The full join means
everything from left and everything from right
without missing anything. Let's see what's going
to happen if we have a full join between
those two tables. As usual, we start with the fields from the left
and from the right, then we take everything
from the left side. We take those two
customers over here. From the right side,
we're going to have the matching rows for
those two customers. For ID number one,
we have this one, but for number two, we don't
have any matching rows, so we're going to have
nulls over here. But as you see, we don't have everything from
the right side. Customer ID number
three is missing. That's why, using the full
join, we're going to have that information over here, and then we're going to match it
as well from the left side. Do we have any customer number
three on the left side? We don't, which means we're
going to have nulls as well. Now by checking the output, you can see we have everything: all the data from the left, all the data from the right, and where there is no match,
we're going to have nulls. As you can see, you need to be really careful with the type of join you are using,
because using the wrong one could cause you to lose data. If you want to be safe and you don't want to lose any data, then you have to
use the full join. But sadly, full joins can be very slow, and you're going to end
up having very big tables, especially if both tables have
a lot of unmatched rows. And now I want you to understand how joins work in Tableau, what is going to happen in the
background once we join tables. We have the data source, we have the visualizations, and inside the data
source we have the physical layer and
the logical layer. In the physical
layer, we're going to join both of the tables A and B. Once we do that,
Tableau is going to create one new combined table, AB. In the logical
layer, we call this table a logical table, which contains data from both tables. Then in the visualization layer, let's say we want to select the fields F2 and F4. Tableau is going to query
the data source, and the data source is going
to get the data from the new combined logical table AB and then send the data back
to the visualizations. You can see the interaction between the visualizations and the data source is going to
be at the logical layer. The physical layer
is going to be completely out of the picture. That's simply how joins
work in Tableau. All right, now how can we
do joins in Tableau? Let's say that we want to join the table Customers
with Orders. First we're going to go to
the left side over here. Drag and drop Customers. The join is going to be done at the physical layer,
so we have to go there. Let's go inside Customers. And now we are at
the physical layer. We're going to take
Orders and just drag and drop it over
here in the empty space. With that, Tableau by
default is going to create an inner join between
Customers and Orders. If we want to
customize the join, we're going to go over here
at the icon and click on it. And we have here
two things to do. First, we're going to
define the type of join. As we learned, we
have the inner, left, right, and full outer join. You can just click between
them and see which data is going to be missing and which data is going to be presented, as in the example
that I showed you. So I'm going to stay
with the inner join, and the next thing that
we're going to define is the key for the join. Tableau understood that there's
customer ID on the left, there's customer
ID on the right, and this is the perfect
match, which is correct. But let's say it was
wrong and you want to choose the correct
key for the join. What you're going to do:
you're going to go to the left side over here,
click on the arrow, and you will get all the fields from the left table and
select the correct one. In this example,
customer ID is correct, so I'm going to stay with it. Then you go to the right side. You have the
same icon over here as well. And you will get
all the fields from the right table, and you select
the one that suits you. One more thing. Your key for the join could be
not only one field, it could be multiple fields. You can add more
fields over here. You go to the next row and select the next
field for the join. But in this example, we have only one key. I'm
going to close this. We have set up the join. I'm going to stay
with the inner join. We can go back to the
logical data model. And as you can see, the table over here
has a join icon. It tells us that this
logical table is the result of joining
two tables. That's it. This is how you can
do joins in Tableau. All right, that's
all for joins. Next, we will learn the second method, how to combine
tables using union.
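The four join types from the customer example can be reproduced in a few lines of plain Python. This is a minimal sketch, not what Tableau runs internally. The names and ages are made-up sample values (the lesson only fixes the IDs: 1 and 2 on the left, 1 and 3 on the right), the `join` function is hypothetical, and it assumes the key is unique in each table.

```python
# Hypothetical sample data; only the customer IDs come from the lesson.
names = {1: "Maria", 2: "John"}  # left table: customer ID -> name
ages = {1: 30, 3: 45}            # right table: customer ID -> age


def join(left, right, how):
    """Return rows as (left_id, name, right_id, age); None stands in for null."""
    if how == "inner":
        ids = [i for i in left if i in right]    # only IDs found on both sides
    elif how == "left":
        ids = list(left)                         # everything from the left
    elif how == "right":
        ids = list(right)                        # everything from the right
    elif how == "full":
        ids = list(left) + [i for i in right if i not in left]
    else:
        raise ValueError(f"unknown join type: {how}")
    # Left fields come first, then the right fields are added next to them.
    return [
        (i if i in left else None, left.get(i),
         i if i in right else None, right.get(i))
        for i in ids
    ]


for how in ("inner", "left", "right", "full"):
    print(how, join(names, ages, how))
```

The inner join keeps only customer 1. The left join adds customer 2 with nulls on the right, the right join adds customer 3 with nulls on the left, and the full join returns all three.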
81. Tableau | Union: All right, so now let's
talk about union. Let's say that we
have two tables and both of them have exactly
the same columns. Sometimes it makes sense to combine them in one big table, and we can do that
using the union. Once we do a union,
what is going to happen? The columns and the rows of
the left table are going to be presented at the output.
From the right table, only the rows are going to be appended at the output beneath
the first one. Union is going to
combine the rows of two tables. To do the
union correctly, we have two requirements. First, both of the tables should have exactly the same
number of fields, and second, the fields should have exactly the
same data types. So as you can see, we
don't need a key between those two tables.
It's not like the join. All right, so now
let's have a quick and very simple example
about the union. We have here two very
simple tables, the orders of 2022 and the orders of 2023, and as you can see, both of the tables have exactly
the same structure. So we have two columns, the ID and the date, in both tables. And it makes sense to
merge them in one table. We call it Orders. So if we do a union between them, what is going to happen at the output? It's going to start
from the left table, and it's going to take
the fields first, the ID and the date. And then it's going to take all the rows from the left side and put them in the result.
Now from the right table, we will not take
the fields again, because we have them already
from the left table. It's going to take only the rows and append them at the
end of the table. It's going to take
the two orders, three and four, and just put them beneath
the table over here. And that's it. It's
very simple and easy. It just needs exactly
the same number of columns or fields and
exactly the same data types. Now let's understand
how union works in Tableau and what's going to
happen in the background once we do a union. We have here again our layers. And union is very similar to
join. In the physical layer, we have our tables A and B. Once we do a union between them, Tableau is going to create a new combined
logical table where it's going to combine
the rows of both tables. Then at the visualization level, let's say that we
take the field F1. Tableau is going to send a
query to the data source. And the data source is going to ask the logical table
to get the data. Once Tableau gets the data
from the data source, it's going to be presented
in the visualization. As you see again here, the interaction is between the visualizations and
the logical layer. All right, now let's see how
we can do a union in Tableau. We're going to work
with two tables, Orders and Orders Archive. Both of them have exactly
the same number of fields and exactly
the same data types. In order to do that,
we're going to take Orders, drag and drop
it onto the logical layer. But you know, we can do a union
only in the physical layer. We have to go inside Orders. Double-click on it, and now
we are at the physical layer. Let's take the second table, Orders Archive. Don't drop it
in the white space, because then Tableau is
going to create a join. We don't want to do
that. We want to create a union, so just drag and drop
it beneath the table. And as you can see,
Tableau is going to say to drag the table here to do a union. Just place it beneath it. Tableau is going to do a union
between those two tables. And as you can see,
there are two lines. The gray lines indicate
that there is a union. If you want to
check that, you can check the result over here in the data. We get a new
field called Table Name. And you see some
records come from Orders and other records come from
Orders Archive, which indicates that we have one combined table of
both Orders and Orders Archive. Let's go back to
the logical layer. So I'm going to
press the X here. As you can see, we have
a new icon over here. It indicates that
we have a union. As you can see, the Tableau tooltip explains everything. We have a logical
table called Orders. It is the result of a union of the tables Orders and
Orders Archive. This is one way of doing a union between two tables in Tableau. There is another way to do that. So let me show you
how to do it. First, I'm just going to remove it: drag and drop it
somewhere over here. As you can see, on the left
side we have something called New Union. Double-click on it, and you can see we
have two options here, the manual and
the automatic. With the manual one, we're going to get exactly the same result
as we just did. What we can do is just drag and drop the
tables over here, Orders and
Orders Archive, and then click OK. With that, we get
exactly the same result without going to
the physical layer and dragging and dropping the two tables exactly
underneath each other. This is a nice way to do a
union between two tables. You can check that by just
going to the physical layer. Double-click on it.
As you can see, we got exactly the
same result here. We can check the table name. We have Orders and
Orders Archive. All right, so now let's
check the second option where we can do
union automatically. I will go back to
the logical layer and just remove the
union over here. Let's start a new one
from scratch. And now we're going to
go to the automatic option. What do we have over here?
Imagine that we have around 100 tables
about the orders. And this is very
common if you are not working with databases but with files, and files have limitations. So what we're going to do,
we're going to go and split the files by day, by
month, by year, and so on, so we end up having
a lot of files. And it is very painful if
we have to go and drag and drop all those files
into Tableau to do a union. Instead of that,
we're going to define a rule for Tableau. Tableau is going to go and search for
all files that follow the rule and do a union between
them. What does that mean? For example, we have
two tables here, Orders and
Orders Archive. What is the naming
convention over here? Both of them start
with Orders. I could have a third table called Orders underscore 2022, then Orders underscore 2023, and so on. So there is a rule I'm following here in my
naming convention, and I can specify
that in Tableau. Let's see how we can
do that over here. The first option is
to include or exclude. I'm going to leave
it as Include. Now, I'm going to
specify the rule. It starts exactly with
Orders. After this word, it doesn't matter what follows: it could be underscore 2022, 2023, or nothing, and so on. Since anything after that doesn't matter, what we're
going to specify after that is a star. The star means
anything after Orders. Then we have some
options to tell Tableau where exactly to search, either in the subfolders
or in the parent folders. I'm going to leave it as it is, and then click OK.
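The naming rule can be tried out with Python's standard `fnmatch` module, which understands the same kind of star wildcard. This is only a sketch of the matching idea; it does not reproduce Tableau's own search logic or its folder options, and the file names and the `select` helper below are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical file names following the naming convention from the demo.
files = [
    "Orders.csv",
    "Orders_Archive.csv",
    "Orders_2022.csv",
    "Orders_2023.csv",
    "Customers.csv",
]

pattern = "Orders*"  # "Orders" followed by anything, including nothing


def select(file_names, rule, include=True):
    """Keep the files that match the rule (include) or that do not match it (exclude)."""
    return [name for name in file_names if fnmatchcase(name, rule) == include]


matched = select(files, pattern)                   # files that would go into the union
skipped = select(files, pattern, include=False)    # files left out of the union

print(matched)
print(skipped)
```

With the Include setting, every file whose name starts with Orders is picked up and Customers.csv is left out. Switching to exclude flips the selection.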
Now we have a union. Let's see what Tableau has to say. It says we have a logical
table called Union. And it says we have
many unioned tables, because we used the
automatic way of doing that. Now let's check whether
Tableau did that correctly. If you go to the right side
here in the overview, you find we have a new
field called Path. It is the path of the files. Let's see that. I'm going
to go to Sheet 1 here and just drag and drop the
path to see just the files. So, as you can see,
Tableau did it correctly. We have Orders
Archive and Orders. It's a really nice
way, if you have a lot of CSV and Excel files, to do it automatically instead of dragging and dropping all those tables. In my projects,
I usually don't use this, because all the data is prepared in the data warehouse
or in the data lake. So with that, we have learned all the different options for doing a union in Tableau. All right, so that's
all for union. And next we will learn
a very important method, the relationships in Tableau, which we also call noodles.
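Here is a small Python sketch of the union from the 2022 and 2023 orders example, including an extra column that records which table each row came from, similar in spirit to the Table Name field seen in the demo. The dates are made-up sample values and the `union` and `schema` helpers are hypothetical. The sketch is also stricter than the lesson's two requirements, because it demands identical column names as well as identical types.

```python
# Hypothetical sample rows; only the order IDs 1 to 4 come from the lesson.
orders_2022 = [{"id": 1, "date": "2022-03-01"}, {"id": 2, "date": "2022-07-15"}]
orders_2023 = [{"id": 3, "date": "2023-01-10"}, {"id": 4, "date": "2023-02-20"}]


def schema(rows):
    """Column name -> Python type, read from the first row of a table."""
    return {column: type(value) for column, value in rows[0].items()}


def union(tables):
    """Append the rows of every table beneath the first one.

    `tables` maps a table name to its list of rows. Every table must have
    the same columns with the same data types as the first table.
    """
    table_names = list(tables)
    expected = schema(tables[table_names[0]])
    result = []
    for name in table_names:
        if schema(tables[name]) != expected:
            raise ValueError(f"{name} does not match the structure of {table_names[0]}")
        for row in tables[name]:
            # Keep the source table name next to each row.
            result.append({**row, "table_name": name})
    return result


orders = union({"orders_2022": orders_2022, "orders_2023": orders_2023})
for row in orders:
    print(row)
```

No key is involved: the rows of the second table are simply appended beneath the rows of the first, and the fields are taken once.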
82. Tableau | Relationships: All right, so now let's
talk about relationships. In 2020, Tableau introduced a new method to combine and connect
tables together, and they called
it relationships. They even made it
the default method to connect tables, since it is very
fast and flexible. What are relationships and
how do they work in Tableau? They are completely different
from joins and unions. If we have in the logical
layer two logical tables, A and B, we can connect them at this layer using
the relationships. Think of the relationships as a contract between two tables. When Tableau uses the
data from those tables, it has first to
check the contract in order to understand how
to generate the queries. And now it's very important
to understand that once we connect the tables
using relationships, the tables are going to stay
separate from each other, and Tableau will not create
a new logical table, so everything is going to stay
as it is without any changes. And here we just describe the relationship
between two tables. Now at the visualization level, if we take the field F1 from table A and F4 from table B, what's going to happen? First, Tableau is going to
check the contract in order to understand how
to generate the queries. And then it's going to send
a query to the first table. And then it's going to
send another query to table B in order to
get the data for F4. And then the data is going
to be combined at the visualization level
and not at the logical level. All right, so now let's
see how we can create relationships in Tableau.
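Before we click through it, the flow just described (check the contract, query each table on its own, combine only in the view) can be pictured with a toy Python model. Everything here is an assumption for illustration: the table contents, the key name id, and the helpers query and build_view. It only mirrors the explanation above and says nothing about how Tableau's engine really generates its queries.

```python
# Toy model of the explanation above, not Tableau's real engine.
TABLES = {
    "A": [{"id": 1, "F1": "x"}, {"id": 2, "F1": "y"}],
    "B": [{"id": 1, "F4": 10}, {"id": 2, "F4": 20}],
}

# The "contract": which two tables are related and on which key.
RELATIONSHIP = {"left": "A", "right": "B", "key": "id"}


def query(table, fields):
    """One separate query per table: the key plus only the requested fields."""
    key = RELATIONSHIP["key"]
    return [{key: row[key], **{f: row[f] for f in fields}} for row in TABLES[table]]


def build_view(requested):
    """Combine the per-table results by key, at the 'visualization level'."""
    key = RELATIONSHIP["key"]
    combined = {}
    for table, fields in requested.items():
        for row in query(table, fields):
            combined.setdefault(row[key], {}).update(row)
    return [combined[k] for k in sorted(combined)]


view = build_view({"A": ["F1"], "B": ["F4"]})
print(view)
```

Note that TABLES itself is never modified: the two tables stay separate, and only the view holds the combined rows.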
It's really easy. So we're going to stay at the data source page, and as well at the logical layer.
We will not go to the physical layer, and all we need is two tables.
So let's take the orders, drag and drop it over here in the data model.
And then let's take the customers. Now, as you can see, as I'm moving it there is like a noodle,
or relationship. Let's drop it here. Tableau is going to automatically create a relationship
between the orders and the customers.
Now how are we going to configure and set up the relationship?
Let's go to the noodle over here and just click on it.
There will be no new window or anything like that for the setup.
We're going to go to the metadata over here. If you don't see the information like this,
then you can go over here and you will see the relationships and the logical tables.
So make sure you are selecting the relationship.
There are three things that we're going to set up for the relationship.
First, it's going to be the key. It's like the join key. It is the common field between the two tables.
Now, as you can see over here, from the left table we have the Customer ID,
and from the right table we have the Customer ID.
Tableau automatically understood that this field could be used as a key, which is correct,
but if you want to change it, you can go over here.
So we will get a list of all fields in the left table. And as well, if you go over here,
you will get all the fields from the right table, and you can add more fields to the key.
Currently it is correct, so I'm going to leave it as it is.
Next we're going to go to the Performance Options.
We're going to expand the performance options over here. And we have here two things.
We have the cardinality and the integrity.
If you leave it here as it is, as the default, nothing is going to go wrong.
You will not lose any data. So you don't have to change anything here
unless you want to optimize the performance.
What do we have over here? We have cardinality as Many or One on the left side.
And on the right side you can define the same thing.
For the integrity, we have Some records match and All records match.
In order to understand these settings, let's have an example.
All right, so now we can have an example for the cardinality.
In relationships, we have two tables, orders and customers.
There is a relationship between them, and the key for the relationship is the customer ID.
For the cardinality, there are two options: either we're going to use Many or One.
In order to decide which one is the correct one, we have to do data profiling.
Data profiling means we're going to do a deep dive into the data
to understand the values inside our tables.
Once we do data profiling, it's very easy to select whether it's Many or One.
Now what do those values mean, Many and One? There is a simple rule for that.
We use Many if there are duplicates in the key,
and we use One if the key is unique and does not have any duplicates inside it.
Now let's check the example in order to determine whether it is Many or One.
So let's go to the orders over here.
In the customer ID, you see that there are duplicates in those values.
We have the customer ID 1 once here and once here as well, and the customer ID 2 is there twice.
So those values are not unique and contain duplicates, and that's why we call it Many.
Let's go to the customers over here. You can see we have the customers 1, 2, 3 and that's it.
So those values are unique and there are no duplicates inside.
We don't have the customer ID 1 again in the table, so that means we can
specify One here. So now let's go through
all scenarios in order to understand what can happen in Tableau once you configure this.
All right, so now let's run the first scenario, where Tableau defines it by default
as a many-to-many relationship. We have Many on the left side,
and on the right side we have Many as well.
Let's say at the visualization level we take the customer ID from the orders
and the sum of all sales, and then the name of the customer.
All right, now let's see how Tableau is going to work.
Tableau is first going to check the relationship. It's going to say,
okay, it's many to many, so it's better to check the whole tables on the left and on the right.
So we're going to start on the left side. We have the customer 1.
It's going to take it over here and it's going to sum all the sales.
Since it's Many, Tableau understands: I have to check the whole table.
Tableau scans the whole table row by row. It's going to say, okay, we have the sales 50.
The next one is not the customer 1, so it's going to skip it and go to the next.
And then we have again the customer ID number 1, and it's going to do the sum 50 + 30.
That means we're going to have the value of 80. It is the sum of the two sales.
Now we're going to go to the right side to find the name of the customer.
It's going to check: okay, it is Many.
So it's going to scan the whole table for the customer ID 1.
Now the first record, it's fine. Okay, we have the customer ID 1.
It's going to take Maria over here. But now Tableau will not stop.
It's going to scan the whole table, since in the relationship it's Many,
but it doesn't make sense because the customer ID here is unique.
Tableau is going to check whether there is a customer ID 1 over here, then go to the next,
and it doesn't find anything, so the output is going to stay like this.
Now Tableau is going to proceed with the next customer. We have the customer ID number 2.
We're going to have it at the output, and then we're going to have the sum of all sales.
So Tableau is going to scan the whole orders table in order to do the sum.
We have over here the 20, and then we have here 10. So the sum of that is 30.
Tableau is going to have 30 at the output. So that's it for the left table.
We're going to go to the right table. Tableau is going to scan the records one by one.
The first one is not the customer ID number 2.
Then we have here a match, so John is going to be at the output.
Tableau is going to scan the whole table, so it's going to go on to the 3 and so on.
As you can see, the output is correct using the default method of many to many.
But we have here a
problem with that. On the right table, Tableau is doing a full scan,
so with that we are losing performance on the right side.
So it's better to optimize it, where we're going to tell Tableau: if you find a customer,
then that's it, you don't have to scan the whole table,
because we have at most one record for each customer.
There are no duplicates and it is unique.
Now we have to somehow pass this information to Tableau.
We can do that in the cardinality. On the left side it's going to stay as Many,
but on the right side we're going to say it is One.
With that, Tableau is going to understand: okay, it is unique,
we don't have to scan the whole table, and we're going to win a lot of performance.
All right, so now let's see how Tableau is going to work once we have it as many to one.
On the left side, nothing is going to change because we have Many.
So Tableau is going to scan the whole table for the customer 1,
and the result is going to be the same.
Now on the right side, things are going to change.
Tableau is going to say, okay, customer ID number 1, there is a match.
It's going to take Maria as the output.
But now Tableau will not keep searching for the customer ID 1 and scan the whole table.
With that, Tableau will not be doing any unnecessary work
and we're going to win some performance.
We're going to go now to the customer number 2 over here. Same thing.
So Tableau scans: do we have the customer number 2 over here? No, we jump to the next one.
Yes, we have a match. We're going to take John,
but Tableau stops as well and will not scan the next record.
As you can see, we have exactly the same output
whether you are using many to many or many to one.
With many to one, we win on performance, because Tableau is going to stop the scan
on the right side. All right, so now let's jump to the next scenario,
where we're going to do something wrong.
We're going to say, okay, the customer ID on the left side is unique,
and we're going to put the value One there.
On the right side, it doesn't matter. Let's have Many, for example.
Now we are telling Tableau that on the left side the customer ID is unique,
so you don't have to scan the whole table.
We're going to have the same example over here, so let's see what's going to happen.
On the left side, Tableau is going to start with the first customer, customer ID 1.
The sum of sales is now 50. Because it doesn't have to scan the whole table,
it's going to stop at the first record, and the output is going to be 50.
Now on the right side, since we are saying Many here, it doesn't matter:
the result is going to be correct.
We're going to have Maria, but Tableau is going to scan the whole table,
so the performance is going to be bad.
Now we're going to jump to the next customer. We have the customer number 2,
and Tableau is going to have it at the output here. Again, the same problem.
Tableau is going to say, okay, we have the sale 20. The customer ID is unique,
so we will not find it again in the same table, and I don't have to scan the whole table.
Tableau is going to take the value 20 and put it at the output
without checking the other values.
Here on the right side, it doesn't matter. We have John, which is correct,
but Tableau is going to scan the whole table.
As you can see, if you make a mistake here in the cardinality,
you might have some problems at the output,
where we're going to have some missing data and wrong information.
All right, now let's run the last scenario,
where we have One on the left side and One on the right side as well.
We're going to get exactly the same output, because it's wrong on the left side.
The only good thing here is that on the right side Tableau is going to stop the scan.
Once it finds a match, it will not scan the whole table.
So at the output we're going to get exactly the same information.
And here we have one to one. All right, so now let's
quickly summarize. On the left side, we have two criteria,
the correctness and the performance.
Correctness is always way more important than the performance.
Let's start with the first scenario. We have a many-to-many relationship.
As you can see, the output was correct, but the performance was bad,
since Tableau is doing an unnecessary full table scan on the right side.
So that's why I'm going to give it okay for the correctness
and not okay for the performance.
For the next scenario, we have a many-to-one relationship.
The output was okay. It was correct, so we're going to give it okay.
And the performance was okay, since Tableau stops the scan once it finds a match.
So we win a lot of performance, and we're going to give it an okay.
Let's jump to the third one. We have a one-to-many relationship.
As you can see, the output was not okay. It was not correct.
We are missing data, so we're going to give it not correct.
And the performance was bad, because on the right side we are doing unnecessary scans,
so that means it was the worst scenario over here.
And then the last one, we have a one-to-one relationship.
The output was not correct, so not okay, but the performance was okay,
since on the right side we are not doing any unnecessary scans.
But to be honest, correctness is way more important than the performance.
That's why Tableau always recommends staying at a many-to-many relationship
if you are not sure, because you're always going to get correct answers at the output.
But if your data is big, you will get some bad performance.
If you want to have good performance, you have to invest time in analyzing your data,
doing data profiling to understand whether it is Many or One, and then change it.
But you have to be sure about your data, otherwise you will get
wrong information in your visualizations, and that's really bad.
So that means for this example, the safe way is to stay at many to many,
but the professional one is to have a many-to-one relationship to get good performance.
But this is not always the case. Just imagine we switch the tables between
customers and orders, so customers is on the left and orders on the right.
Then a one-to-many relationship is going to be the correct one.
So be careful here with the sides. All right everyone. So now let's understand the
integrity options in Tableau. Each relationship has two sides,
the left table and the right table.
When we change the integrity settings, we limit which joins can happen in the visualization.
Here we have two options, Some records match and All records match.
With that we have four scenarios.
First, we can choose Some records match for both the left and the right table.
If we do that, then all types of joins are possible in the visualization:
inner, left, right and full join.
But what happens if we choose All records match on the left
and Some records match on the right?
We are limiting the types of joins to only two, inner and right join.
The next one can be the opposite, so we have Some records match on the left
and All records match on the right. What happens here?
Again we limit the types of joins to only two, the inner and the left join.
In the last scenario, if we choose All records match on both sides, the left and the right,
then we limit Tableau to only one type of join, the inner join.
As you can see, it's very similar to joins. We are just defining how Tableau should work.
When we use Some records match, we allow more types of joins.
And when we use the option All records match,
we are limiting the types of joins Tableau can use.
Here it's very important to understand that we have a trade-off.
If you use All records match and go down this path,
you will likely experience better performance,
but you will increase the risk of losing data.
If you choose Some records match and you go up,
you will ensure completeness and flexibility,
but you are sacrificing some resources and performance.
The Tableau team decided to go with the first scenario as the default,
where we have Some records match on the left and the right.
I can understand that, because it's more important to have
completeness and flexibility than performance. Let's have a look
at our data here. We have customers that didn't order anything.
The customer number 3 didn't order anything, and we don't have a match for it.
We can say some records match, like 1 and 2, which have matches on the left side,
but some other records do not match.
We don't have an order from the customer ID number 3.
That means in our database we could have customers in the customers table
that didn't order anything. The correct option over here is Some records match.
Now let's analyze the orders. As you can see, we have the customer ID number 1,
and we find it in the customers, 2 as well, and so on.
So we can see that all the records, all the customer IDs in the orders,
have a match in the customers. That means we can select All records match.
We don't have, for example, a customer ID 4 over here
which does not have a match on the right side.
That means in our database all orders should come from our customers,
and we should not have any order without a known customer.
After the analysis, we can say that on the left side, on the orders,
we always have matching records. So we're going to select All records match.
But on the right side, we might have customers that didn't order anything,
so there we can say Some records match.
If we do it like this, we can prevent Tableau from doing extra work analyzing the nulls.
It's like in SQL: if you have a full outer join, you will get huge amounts of data,
and if you're using an inner join or left join and so on, you will get better performance.
So if you know exactly what is going on in your data, then select the correct integrity.
Otherwise just leave it as the default, Some records match on the left and on the right.
You will be safe, and you will get correct answers.
All right, so back to Tableau. Relationships are really easy.
We just have to drag those two tables and Tableau goes and creates
the relationship between them.
Just get the key of the relationship correct and everything is going to be fine,
and leave the other settings as the default.
But if you want to be more professional and get better performance in Tableau,
you have to do data profiling and then select the correct option,
if you are 100% sure.
So in this example, the orders over here get Many for the customer ID,
and on the right side we have One for the customers.
Then for the integrity, on the orders it's All records match,
because all orders have a customer ID in the customers table.
But we might have some customers that didn't order anything,
so for the customers I'm going to leave it as Some records match, and that's it.
That is relationships in Tableau.
All right, so that's all about the very important concept of relationships and how it works.
Next we will learn a very unique method, data blending in Tableau.
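The data profiling behind those performance options can be sketched in a few lines of Python. This is only a sketch under assumptions: the rows mirror the small example from this lesson, the order_id column is added just to make the rows complete, and the helper names cardinality and integrity are invented here. The rules are the ones stated above: duplicates in the key mean Many, a unique key means One, and All records match applies only when every key value has a partner in the other table.

```python
from collections import Counter

# Sample rows modeled on the lesson's example; order_id is an assumed extra column.
orders = [
    {"order_id": 1, "customer_id": 1, "sales": 50},
    {"order_id": 2, "customer_id": 2, "sales": 20},
    {"order_id": 3, "customer_id": 1, "sales": 30},
    {"order_id": 4, "customer_id": 2, "sales": 10},
]
customers = [
    {"customer_id": 1, "name": "Maria"},
    {"customer_id": 2, "name": "John"},
    {"customer_id": 3, "name": "George"},
]


def cardinality(rows, key):
    """Many if any key value repeats, One if every key value is unique."""
    counts = Counter(row[key] for row in rows)
    return "Many" if any(n > 1 for n in counts.values()) else "One"


def integrity(rows, other_rows, key):
    """All records match only if every key value also exists in the other table."""
    other_keys = {row[key] for row in other_rows}
    if all(row[key] in other_keys for row in rows):
        return "All records match"
    return "Some records match"


settings = {
    "orders": (cardinality(orders, "customer_id"),
               integrity(orders, customers, "customer_id")),
    "customers": (cardinality(customers, "customer_id"),
                  integrity(customers, orders, "customer_id")),
}
print(settings)
```

A check like this only describes the data you have today. If a later load could bring duplicate customers or orders without a known customer, the safe choice is still the default, Many and Some records match.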
83. Tableau | Data Blending: All right, so now let's talk about data blending in Tableau. But first some coffee. Let's go.
All right, so now let's have this example where we have table A in the data source.
At the visualization level we want to use the data from the field F1.
And you know by now that Tableau is going to send a query to the data source
in order to get the data of F1 from the table and show it in the visualization.
Now, since this data source was the first one to be queried and used,
Tableau is going to call it the primary data source.
In Tableau, anything that is primary gets the blue color.
That's why you will see a blue icon indicating that this data source is the primary one.
Now sometimes you are in a situation where you want to get data from another data source.
For example, we have another data source with the table B,
and we want to add the data of F4 to the visualization. What's going to happen?
Tableau is going to send another query to the second data source
in order to get the data of F4, and then the data is forwarded to the visualization.
Tableau is going to call this data source the secondary data source,
and it will mark it with an orange icon.
Now in order for this to work, where we get data from two different data sources,
we have to connect them somehow.
Here we're going to use a very unique way in Tableau
to connect data sources together, data blending.
Data blending can only be done at the visualization level,
on the worksheet page, not in the data source.
Now you might ask, how is Tableau joining those tables at the visualization level?
Well, Tableau is using a left join. We cannot change that. Sadly, it is fixed.
As in a left join, Tableau is going to get all the data from the primary data source
and only the matching records from the secondary data source.
Now to summarize, data blending is the method of combining data at the visualization level
from two different data sources using a left join.
This is a very unique feature in Tableau.
You don't find it in other BI tools like Microsoft Power BI.
There you cannot, for example, combine data from two different published datasets.
All right, now let's see how we can do data blending in Tableau. And for this we need
two data sources. The first one is going to be from the CSV files that we have,
from the small datasets. We're going to go to the text files.
Let's take the products over here. This is our first data source.
Now let's go and create the second data source.
In order to do that, you can go to this icon over here and then click on New Data Source.
Let's go there. It's going to be from the JSON file that I prepared for you.
So let's go to JSON, and we have the product prices.
Let's open that. Since it's JSON, we have to select the schema.
Let's go to the data over here, click Yes, and then click OK.
Now we have two data sources. In order to switch between them,
we go again to this icon over here, and you can see we now have two data sources.
By just selecting a data source, you will switch to it.
Now in order to do the data blending and connect those two data sources,
we cannot do it at the data source page.
We have to go to the visualization level, to the worksheet page. Let's do that.
I'm going to go to Sheet 1 over here.
As you can see in the data pane on the left side, we have two data sources,
and by just clicking on them you can switch in order to see the tables inside them.
Now we have to decide which data source is the primary and which one is the secondary.
For this example, I will say that the products data source is the primary one.
And how are we going to do that? By just using its data in the visualization first.
So I'm just going to take the product ID, drag and drop it on the rows,
and immediately Tableau is going to understand: okay, this is the primary data source.
It's going to mark it with a blue icon over here,
indicating that this is our primary data source.
We still don't have a secondary data source, so you see there is no orange icon over here,
because in our view we have data from only one data source.
Now, in order to get the data from the second data source,
we're going to switch to the product prices.
And you can see Tableau immediately turns this data source into a secondary data source.
You can see over here we have the orange icon indicating that this is a secondary data source,
and any field that we are using from it is going to be marked with orange.
So you can see over here the price has an orange icon.
It's very simple. Now let's say that the
product ID is not the right key to join those two data sources,
and you want to change that. In order to do that,
we're going to go to Data over here in the menu,
and then go to Edit Blend Relationships. Let's click on that.
We will get a new window over here. Here we have two options, Automatic and Custom.
If you leave it as Automatic, Tableau is going to figure out which key to use
to join those data sources. Here in this example it is the product ID.
If you want to change that, you can go to Custom over here. It's like a join.
You have to specify from the left and from the right which fields are the key for the join.
If you want to change a key, just double-click on it.
Then you have on the left side the primary data source
and on the right side the secondary data source,
and you select the fields that are the key for the join.
I'm going to leave it as it is. Let's add another key.
I will go over here and take, for example, the category from the left side
and from the right side the data index, which is really wrong.
Let's click OK, and then again OK.
You will see on the left side that we now have another chain on the data index.
And you can see it's like a broken chain, which means it is not yet used in the join.
If you want to activate it, just click on it and you will see we have an active chain.
Now as you can see, the result is wrong, because it doesn't make sense to use this key.
I just want to show you how you can deactivate and activate the keys of the join
between two data sources by just clicking on them.
Now let's correct this. I want to have only the product ID as the key for the join.
So that means I'm going to deactivate the data index over here. And that's it.
This is how you can define the key for the data blending.
One thing that is very important to understand is that everything we've done
in the data blending is only relevant for this worksheet.
If I go to another worksheet, let's go over here and create a new one,
you can see that it's completely reset.
We have the two data sources again, but we don't have them
as the primary and secondary data sources.
That means in each worksheet we can make a new decision.
On Sheet 1, the products were the primary. I can change my mind here
and say, okay, the product prices are now the primary data source.
If I take anything from over here, you can see product prices is the primary.
And if I go to the products and, let's say, take the product name over here,
products becomes the secondary.
So I just switched between them depending on the requirements.
If we go back to Sheet 1, we see that products is the primary.
But if we go to Sheet 2, product prices is now the primary.
This is really nice because it gives us real flexibility:
we can decide in each worksheet which one is the primary and which one is the secondary,
depending on our requirements.
Data blending is a very unique and great way to connect and combine data.
All right, so with that, you now have an overview of all four methods of combining tables.
Next we will go and compare them side by side, and we will start
with the differences between joins and unions.
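To make the primary and secondary roles a bit more concrete, here is a minimal Python sketch of the left-join behavior described in this lesson. The product and price rows and the helper blend are assumptions made up for the illustration. The sketch assumes one price row per product, so the aggregation that blending performs on the secondary side is left out here.

```python
# Minimal sketch: all rows of the primary, only matching values from the secondary.
def blend(primary, secondary, key, field):
    """Left-join style lookup of one secondary field onto every primary row."""
    lookup = {row[key]: row[field] for row in secondary}
    return [{**row, field: lookup.get(row[key])} for row in primary]


# Assumed sample data for the two data sources.
products = [
    {"product_id": 1, "product_name": "Tire"},
    {"product_id": 2, "product_name": "Helmet"},
    {"product_id": 3, "product_name": "Gloves"},
]
product_prices = [
    {"product_id": 1, "price": 25},
    {"product_id": 2, "price": 40},
    {"product_id": 4, "price": 15},
]

# Sheet 1: products is the primary, so product 3 stays in the view without a price.
sheet1 = blend(products, product_prices, "product_id", "price")

# Sheet 2: product prices is the primary, so product 4 stays without a name.
sheet2 = blend(product_prices, products, "product_id", "product_name")

print(sheet1)
print(sheet2)
```

Swapping the arguments is the code version of choosing a different primary on a new worksheet: the rows that survive always come from whichever source is used first.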
84. Tableau | Join vs Union: All right, so now what is the main difference
between joins and unions? Both of them are very similar.
They're going to combine two tables into one big table.
But the difference here is how the data is going to be combined.
In joins, the fields of both tables are going to be combined.
So we're going to take all the fields from the left side and, beside them,
all the fields from the right side.
As a result, we're going to get one big wide table.
On the other hand, in unions, two tables are going to be combined,
but instead of combining the fields, we're going to combine the rows of both tables.
So we will get all the rows from the first table and, beneath them,
all the rows from the second table. Both of them have exactly the same columns.
So joins combine the fields and unions combine the rows.
All right, so that was the main difference between join and union.
Next we will learn the differences between joins and
data blending.
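A quick way to see "joins combine the fields, unions combine the rows" is to run both on tiny tables. The sketch below uses plain Python lists of dictionaries. The table contents and the helper names join and union are assumptions for illustration, and join here is a simple inner join.

```python
def join(left, right, key):
    """Inner join: place the fields of matching rows side by side (wider rows)."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def union(first, second):
    """Union: stack the rows of two tables that share the same columns (more rows)."""
    return first + second


# Assumed sample tables.
customers = [
    {"customer_id": 1, "name": "Maria"},
    {"customer_id": 2, "name": "John"},
]
orders = [
    {"customer_id": 1, "sales": 50},
    {"customer_id": 2, "sales": 20},
]
orders_archive = [
    {"customer_id": 1, "sales": 30},
]

joined = join(customers, orders, "customer_id")
unioned = union(orders, orders_archive)

print(joined)   # each row now carries customer fields and order fields
print(unioned)  # same columns as orders, but more rows
```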
85. Tableau | Join vs Data Blending: All right, so now
the question is, what is the main difference between joins and data blending?
Data blending is like a left join.
But the main difference is when the aggregation is performed.
In joins, the data is combined first and then the aggregation can happen.
In data blending it's the opposite: the aggregation is going to happen first
and then the data is going to be combined.
So now let's have a simple example in order to understand what this means.
Okay, so again, we have our tables, customers and orders.
First we're going to do the left join, and afterward we're going to do
the data blending between them, in order to understand the differences
between them in the output.
All right, so now we're going to start with the left join. You know the left join:
all the data from the left side and only the matching data from the right side.
We start as usual by combining the fields from the left and the fields from the right.
We go record by record. We're going to take the customer number 1
and search for the matches. We have two rows in the orders.
That means Maria is going to be twice in the output, because there are two orders.
Then we're going to go to the next one, customer ID number 2.
We have only one order for that, so we're going to have it at the output.
And George doesn't have any orders, so that means we're going to have null
here, here, and here.
So as you can see, with the left join we first combine the data, the raw data,
without doing any aggregations.
Afterward, in the visualizations, we can find, for example, the sum of sales
or the average and so on.
Now let's check data blending and how it works. All right, now let's say
we have all the fields from the primary data source and, beside them,
all the fields from the secondary data source. This is like a left join.
We're going to take all the data from the primary data source,
so we're going to get all three customers over here.
But the main difference here is that there will be no duplicates.
As you can see, in the join we have Maria twice.
But in data blending, you will not get any duplicates.
Now here comes the difference. Before we get the data from the orders,
from the secondary data source, an aggregation happens.
For example, for the customer ID number 1 we have two rows.
The two rows will not be presented at the output as they are.
There is going to be an aggregation first.
Now it's very important to understand that the fields in Tableau are split into
dimensions and measures. In the next tutorials I'm going to explain that in detail.
For now: the measures can be aggregated, and the dimensions will not be aggregated.
For example, the customer ID is not a measure, it is a dimension.
Tableau cannot aggregate it, but since we have the same value twice,
Tableau can write 1 here.
Next we have the sales. It is a measure, so Tableau can aggregate the sales
and then combine them. The sum is going to be 80.
Next we have the date here. It is a dimension, so it cannot be aggregated,
and since we have two different values, Tableau is going to write a star at the output.
Tableau is going to provide only one value at the output, and we have two values here.
Tableau will not decide which one of them it is going to be,
so Tableau is going to add a star. The output is going to be a star.
I know this is really not nice, but this is how data blending works.
As you can see, Tableau always tries to aggregate the data before combining it.
Now let's move to the next customer. We have John.
In the orders, we have only one record. That means nothing is going to be aggregated,
and the output is going to be exactly the same.
Then for the customer George, there is no information over here,
so we will get nulls as well. This is the output of data blending.
This is exactly what I mean: the main difference between joins and blending
is when we do the aggregations.
In the left join, as you can see, we first combine the raw data together.
Afterwards, we can do aggregations in the visualizations.
But in data blending, the data is aggregated first,
especially from the secondary data source.
Afterwards, the data is going to be combined in Tableau.
All right, with that, we have learned the main differences between
joins and data blending. Next, and it's an important one, we will learn the
main differences between joins and
relationships.
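The two outputs from this lesson can be reproduced with a short Python sketch. It is a simplified model under assumptions: the order dates and John's sales value are invented, and the helpers left_join and blend are illustrative names. The blend function follows the rule described above: measures from the secondary side are summed per key, and a dimension with more than one distinct value is shown as a star.

```python
# Assumed sample data modeled on the lesson (dates and John's sales are made up).
customers = [
    {"customer_id": 1, "name": "Maria"},
    {"customer_id": 2, "name": "John"},
    {"customer_id": 3, "name": "George"},
]
orders = [
    {"customer_id": 1, "order_date": "2024-01-05", "sales": 50},
    {"customer_id": 2, "order_date": "2024-01-10", "sales": 20},
    {"customer_id": 1, "order_date": "2024-02-01", "sales": 30},
]


def left_join(left, right, key):
    """Combine raw rows first; any aggregation would happen afterwards."""
    out = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if not matches:
            out.append({**l, "order_date": None, "sales": None})
        for r in matches:
            out.append({**l, "order_date": r["order_date"], "sales": r["sales"]})
    return out


def blend(primary, secondary, key):
    """Aggregate the secondary rows per key first, then attach them to the primary."""
    grouped = {}
    for r in secondary:
        grouped.setdefault(r[key], []).append(r)
    out = []
    for p in primary:
        rows = grouped.get(p[key], [])
        if not rows:
            out.append({**p, "order_date": None, "sales": None})
            continue
        dates = {r["order_date"] for r in rows}
        out.append({
            **p,
            # Dimension: a single distinct value is kept, several become a star.
            "order_date": dates.pop() if len(dates) == 1 else "*",
            # Measure: aggregated (summed) before combining.
            "sales": sum(r["sales"] for r in rows),
        })
    return out


joined = left_join(customers, orders, "customer_id")
blended = blend(customers, orders, "customer_id")
print(len(joined), len(blended))
```

The join keeps Maria twice, once per order, while the blend returns exactly one row per customer with the sales already summed.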
86. Tableau | Join vs Relationship: All right, so now what are the main differences between
joins and relationships? If you are using joins, things can get really static and we might lose
a lot of data as well. But if you are using
relationships in our data model, then we will get
more flexibility and we will not lose any data. Now, in order to
understand this, let's check this example. We have prepared
two data sources, one with joins and the
other with relationships. In the first one, with the orders, if I go to the physical layer, you can see we have a left join between orders and customers. Let's check the second one. Here we have the relationships, with
the same tables. We have orders and customers, and between them there
is a relationship. Now, if you check our data, we can find that there are
five customers in the customers table, but there are only four
customers that did order. If you check
the customer ID over here in the orders, you will not find
the ID number five. That means this customer
didn't order anything. This is no problem for
the relationships, but if you go to the joins over here and you check the data, you will see that we don't have customer ID number
five at all in our data. So you can check: okay, we have 1, 2, 3, 4 and so on. The customer ID number five
has completely disappeared. That's because we have a left join between the
orders and the customers. Only the matching rows from the right side are
presented in the final table. That means we lost
this customer. And if we are at the
visualizations, let's go over here. Let's say we want to count how many customers we
have in our database. Let's drag and drop
the customer ID. Let's turn it into a measure
with count distinct. Our data says, okay, we have four customers. If we go to the relationships, let's open another sheet and
switch to the relationships. And let's take the customer
ID again over here, switch it to a measure
and count distinct. You will see we
didn't lose the data. We have five customers
in our database, so the relationship is going to give us the correct answer. Now you might say,
okay, we can fix this if we change the type
of join. That's right. If I go to the data source, then I go to the joins, go to the orders, and I just
switch this to a right join, that means we're going
to get all the data from customers and only the
matching rows from the orders. Let's close this and go back
to our sheet number one. Let me close this, and we'll see that we
have five customers. So with that we have the
correct answer with the
join as well. But here we come to the next point: things
are really not flexible. That means if I'm
building visualizations where sometimes I'm asking how many customers we have and sometimes how many orders we have, I cannot go to the data source each time and
change the type of join, because once I decide
it's a left join, it's going to stay a left join for all the
worksheets, unless I'm doing a full outer
join between the two tables. And if you're working
with big tables, then you will get a
very big merged table, which can slow everything down. And this is exactly what I mean. If you are using joins, you will lose data if you are using a left join or a right join, and as well, things are really static. With
the relationships, if we go to sheet
number two here, things are more flexible because we didn't
merge anything. The data stays separated
from each other, and we just describe the
relationships between them. If in one worksheet I'm doing
analysis about the customers, it will not affect the next
visualization, where I'm doing analysis about the orders, because we didn't lose any data. And I don't have to worry: do we have a left join
or a right join? Should we change it, and so on. So it's more flexible and we will always get the
correct answers. So that's why joins are static
and you might lose data, but relationships are more flexible and you will
not lose any data. All right, there is another
issue with the joins if you compare them to
the relationships. Sometimes with joins we might get wrong answers if we are doing calculations
on the measures. Let's take this example
from the customers table. We have the score
for each customer, and we
have those five customers. The average of the
score is going to be 625. Now let's check in Tableau the results from joins
and relationships. All right, now we are
at the relationships. Let's take the score and drop it over
here on the text. Then let's find the average. So we're going to go over here to measure and pick the average.
In relationships, we got the correct answer. We have 625. Now let's
check the joins. We are at the data
source with joins. I'm going to drag the score
and drop it on the text. And now we're going to switch
to average here as well. We got the wrong result, 585. What happened here? Well, the answer is that sometimes, if we merge
two tables together, we might get duplicates.
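To see how duplicates can shift an average, here is a small Python sketch. The scores and orders are made up for the illustration (they are not the course data), and the code only mimics the effect of a join by repeating the customer's score once per order.

```python
# Hypothetical rows, not the course dataset.
customers = [(1, 300), (2, 900), (3, 600)]  # (customer_id, score)
orders = [(1,), (1,), (1,), (2,)]           # one row per order: (customer_id,)


def average(values):
    return sum(values) / len(values)


# Score read from the customers table on its own: one value per customer.
avg_from_customers = average([score for _, score in customers])

# Score read from the merged table: the score is repeated for every order,
# and customer 3 (no orders) does not appear at all.
joined_scores = [
    score
    for (order_customer_id,) in orders
    for customer_id, score in customers
    if customer_id == order_customer_id
]
avg_from_join = average(joined_scores)

print(avg_from_customers)  # 600.0
print(avg_from_join)       # 450.0
```

Customer 1 has three orders, so that score is counted three times in the merged rows, which pulls the average away from the per-customer value.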
Let's check the data. If you go to the data
source again, in the joins, and go to the score, we will have duplicates. Because some customers
have more than one order, that is going to result in a lot of duplicates if we merge the
customers and orders, and if you do the average, you will get the wrong answer,
as we saw in the results. If you switch to
the relationships and go to the customers, we see the score over
here on the right side. There are no duplicates and we will get the correct answer. So
using relationships, we will get correct answers if we
are doing calculations. And that's way better than
having duplicates in our data, where we might never get correct
answers from joins. And that's why
Tableau introduced relationships in version 2020.2, to fix all those problems with the
joins, and they made them the default method for
connecting tables. All right, so
that's all for now. And next we will compare
all the four methods side by side in order to
understand the big picture.
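Going back to the first point of this lesson, the lost customer, here is a minimal sketch using Python's built-in `sqlite3` module. The table contents are hypothetical: five customers, of which only four have orders. It compares a count distinct when orders is the left table with the same count when customers is the left table, which is what switching to a right join achieves in the lesson.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2), (3), (4), (5);
    INSERT INTO orders VALUES (1001, 1), (1002, 2), (1003, 3), (1004, 4), (1005, 1);
""")


def count_customers(sql):
    """Run a query that returns a single number and return that number."""
    return conn.execute(sql).fetchone()[0]


# Orders on the left: customer 5 has no order, so it never reaches the result.
orders_on_left = count_customers("""
    SELECT COUNT(DISTINCT c.customer_id)
    FROM orders AS o
    LEFT JOIN customers AS c ON c.customer_id = o.customer_id
""")

# Customers on the left: every customer is kept, matched or not.
customers_on_left = count_customers("""
    SELECT COUNT(DISTINCT c.customer_id)
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
""")

print(orders_on_left)     # 4
print(customers_on_left)  # 5
```

The answer depends on which table was chosen as the left side when the join was built, and that choice is fixed for every worksheet that uses the merged table.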
87. Tableau | Join vs Relationship vs Union vs Blending: All right, so now we're
going to go and compare the four methods for combining data in Tableau, unions, joins, relationships, and data blending, side
by side. So let's go. The first point is on which page and in which layer
we can use the method. Both unions and joins, we can create them on
the data source page, in the physical layer. The
relationship we can use on the
data source page as well, but in the logical layer. And finally, the data
blending can be used at the visualization level,
on the worksheet page. The next point: can
we use the method in order to connect tables from
different data sources? Well, for unions, joins, and relationships, we
cannot do that. It should be done in
the same data source. Only the data blending
can be used in order to connect tables from
different data sources. The next point: after
using the methods, are the tables going to be
merged? Unions and joins
are going to merge
the tables and they're going to create
completely new tables. But if we are using
relationships and data blending, they will not create anything. The next point is
about the flexibility. If you are going to
use unions and joins, the decisions that you are
making at the data source can affect all the worksheets
and the visualizations. But if you are using
relationships and data blending, you have way more flexibility. For example, in
the data blending, you can decide on
each worksheet page. Now if we are talking about
the join types, in joins we have inner, left, right, and full. In the relationships
we can have
exactly the same
behavior as joins, but in data blending it is
fixed. We have only the left
join. The next point: if you ask me to rank these methods, I would
say, and Tableau is going to say as well,
always
use relationships. After that comes
the data blending. It is a really great way
to combine tables from different data sources, with
the flexibility that we have. And then in third place
I'm going to put
the joins. I would
not rank the union, because it's completely different
from the methods of joins, relationships, and data blending. Always try to go with
the relationships. Now let's see the big picture of how those four methods work. Let's start with joins. They're going to
connect two tables at the physical layer and
they're going to create a completely new logical table in the logical layer,
where the fields
of both tables are combined. Then at the
visualization layer, Tableau is going to send a
query to the data source, and the data source is going to get the data from the logical table. Same thing for the union. You can create it at the
physical layer from two tables, and it's going to create
a completely new table as well, where the rows of both
tables are combined. At the visualizations, Tableau
is going to send a query to the data source, and
the data source is going to get the data
from the logical layer. Now to the third method,
the relationships. We have two tables at
the logical layer, and Tableau will not
combine or create anything. We are just describing the
relationship between A and B. At the visualization level, Tableau asks the
data source, and the data source is going to get the data from the
separate tables. And finally, the data blending. We have two data sources. The first one is going to be
called the primary data source. The second one is the
secondary data source. So first Tableau is going
to send a query to the primary data source and then another query to the
secondary data source. Here it's important
that the aggregation is going to happen before
the data is combined, and we are combining the data at the visualization level
using data blending. So as you can see, joins and unions happen in the
physical layer. In the logical layer we
can do relationships, and at the visualization level
we can do data blending. All right, okay, so with that, you have learned
everything that you need about combining
tables in Tableau. Next we're going to practice: we're
going to create two data sources using the new skills that
you have just learned.
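As a compact recap, the side-by-side comparison from this lesson can be written down as plain data. This is only a note-taking sketch in Python: the values restate what was said in this lesson, and the key names (`layer`, `page`, `cross_source`, `merges_tables`, `join_types`) and the helper `methods_where` are made up for the example.

```python
# Summary of the comparison as stated in this lesson (key names are hypothetical).
ALL_JOIN_TYPES = ["inner", "left", "right", "full"]

METHODS = {
    "union": {
        "layer": "physical", "page": "data source",
        "cross_source": False, "merges_tables": True, "join_types": None,
    },
    "join": {
        "layer": "physical", "page": "data source",
        "cross_source": False, "merges_tables": True, "join_types": ALL_JOIN_TYPES,
    },
    "relationship": {
        "layer": "logical", "page": "data source",
        "cross_source": False, "merges_tables": False, "join_types": ALL_JOIN_TYPES,
    },
    "data blending": {
        "layer": "visualization", "page": "worksheet",
        "cross_source": True, "merges_tables": False, "join_types": ["left"],
    },
}


def methods_where(**conditions):
    """Return the names of the methods whose properties match all conditions."""
    return [
        name
        for name, properties in METHODS.items()
        if all(properties[key] == value for key, value in conditions.items())
    ]


print(methods_where(merges_tables=True))  # ['union', 'join']
print(methods_where(cross_source=True))   # ['data blending']
```

Having the comparison as a lookup makes it easy to answer questions such as "which methods create a new merged table?" without rereading the whole list.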
88. Tableau | Build 2x Data Sources: All right. Okay, so now we're
going to create two data sources together, because
we have two datasets, the big one and the small one. While doing that, I want to show
you how I usually decide when to use
which method. Let's go. Okay guys, now let's close
everything and start from scratch in order to get the data source
created correctly. Let's start Tableau Public. We're now going to create
the small data source on top of our small dataset. Let's go to the connectors on the left side and
click on Text File. It doesn't matter
which file you're going to pick. Let's take the orders and open it. I will delete it anyway, in order to explain how I start. Previously, I showed you the
data model of our datasets. We have a star schema where we
have facts and dimensions. I always start with
the fact table. It doesn't matter
whether you are using a star schema or a snowflake schema. Always start with
the fact table. Our fact table is orders. Let's just drag and drop it
here on the logical layer. Then I continue
with the dimensions, so we have customers
and products. Let's start with the customers. Just drag and drop it
somewhere over here, and Tableau is going to create a relationship between
the orders and customers. Since we are talking about
two different entities, orders and customers, I always use relationships
between them. Let's check whether everything in the relationship is correct. So we go over here
on the metadata. We see the customer ID from the
left and the customer ID from the
right, which is correct. And now let's go to the
performance options. I will change only
the cardinality. If the quality of
our data is bad and we haven't done
any data profiling, then the best is to leave
it at the default: many-to-many, with some records matching on the
left and on the right. But in our datasets we
already checked that. We have a clean star schema,
so on the fact side, on the left side over
here, it's going to stay as many, and all the
dimensions on the right side, like customers, are going to be one, because
we usually have, for example, unique customers
or unique products. So I will go and change the right side to one because it is the dimension side, and the fact side is
going to stay as many. I will not touch the
referential integrity settings, so we're going to
leave them as they are. And that's it. We now have the customers and the orders
connected to each other. Now before we continue
building our data model, we have to check
something very important. Are we working on
the correct datasets in the correct format? If you go to the
orders over here, we have a few fields
like the sales, quantity, discount, and profit. All this information should
be numeric. You can check that
by checking the icons, the data type icons. And if they are like this
green hash symbol over here, and you click on it, Tableau is going to say it is number (decimal). If you see it like this, number (decimal) or number (whole), then
everything is fine. But if you see it as a
string, for example
if I go over here and
switch it to a string, if you see this field as a string, there is
something wrong. If your data is shown like ABC, then you are working
with the wrong dataset. It's not correct; you should
see it as a number. Now the question is, why is it
wrong? Why is it not correct? Why didn't Tableau
recognize it as a number? Well, there are different
representations of the decimal separator
in decimal numbers. In some countries, like in Europe, we have a comma, but in
many other countries, like in the USA or in Asia, we have a dot between the whole number
and the decimal part. So for example,
I'm now in Germany and my data is
separated with a dot. What can happen is that Tableau
will not understand that this is a decimal number, and it's
going to show it as a string. That's why in the
download link I have prepared two datasets,
depending on your location: the Europe training datasets and the non-Europe
training datasets. In the Europe training datasets,
all decimal numbers
are separated with a comma, and for all
other countries, they are separated with a dot.
So now the question
is how to fix it. Well, go and download the correct training dataset
in order to fix it. For example, I now have
the non-Europe dataset. And as you can see,
the discount, sales, profit, everything is wrong, everything is ABC, a string. Now some of you may think,
okay, it's a really easy fix. I can go to the data
type over here and switch it from string
to number (decimal). Once I do that, what's
going to happen? Everything is going to be null. It will not work because Tableau doesn't know how to convert
those numbers correctly. Let's move it back to a string
in order to see the data again. There is a fix for that. If you go to the orders over here and then right-click, let's go to the
text file properties. Here we have
different properties of the file,
like the separator. Here we have a semicolon;
Tableau detected it correctly. But what's more
important than this is the format of the decimal
number, the locale. Here we have to
choose a locale which matches the
current format. The current format is a
dot here in this example. So what we're going
to do is go over here and search for, for example, United States. And as you can see,
Tableau can understand the correct format, and everything is going to be
changed to a number. So the solution: either you
use the correct datasets or you go and configure the
properties of each file. I would say you can go
and try United States or Germany until you have
the data type number. So make sure that
in the orders, all this information
has the data type number. All right, so now
let's go and keep building our data model
in the data source. Let's go to the next dimension. We have the products. All we're going
to do is just drag and drop and then release it. Tableau is going to create another
relationship between them. Let's check that again. So click on that,
go to the metadata, and scroll up. Tableau automatically found the
key for the relationship; it is the product ID, which is correct. And now
the same thing. We're going to go
to the performance options. On the left side, the fact side, it's
going to stay as many, and on the right side,
where we
have the dimension, it's going to be one. You can check that easily. If you click on the products
and check the data here, you can see the product
ID is a unique field. There are no duplicates inside it, so we can go and use one. If you are not
sure, just leave it as a many-to-many relationship. Let's go again to
the relationship. We have it as many-to-
one, and I'm going to leave the referential integrity here as some
records match. No problem. Now let's go to
the other tables. We have here the
customer details. Here we have two options. Either we're going to use
relationships or joins. You can go over here
and just drag and drop it, put it near the customers
as a relationship. But to be honest,
in data modeling, if I have two objects
about the same entity, like here, where we have customers and other information
about the customers, I tend to merge those
two tables into one. This is different from talking about the orders and customers. They are completely
different entities. Usually in data
warehouses I prepare this step in the
database, or we can stay in Tableau and merge those
two tables into one. We can do that using joins. What I'm going to do is
remove the customer details, and then we're
going to go to the physical layer
inside the customers. Then we're going to take
the customer details and drop it over here. Tableau, by default,
is going to make it an inner join, but to be honest, the customers table is, for me, the main table
about the customers, and customer details is
like a secondary table. In order not to lose
anything from the left side, I'm going to change the
type of join to a left join. Let's do that. I'm
going to click on the icon and then
select Left Join. Then we can check the results. The main thing is
that we don't get duplicates and we don't
lose any customers. As you can see in the output, we have our five customers. There are no duplicates and
we didn't lose anything. Let's go back to
the logical layer; I'm just going to close this. As you can see, we
have fewer tables and we have one entity
called customers. We don't have a lot of tables, and I usually do that if we have a lot of tables
about the same topic. Now let's go to the next table. We have the orders archive. And here we have
the same situation. We have two tables describing the same entity, the orders. Of course, we
can connect it as a relationship to the orders. But again, I like to minimize the number
of tables that I'm dealing with, so I'm going to go and merge those
two tables together. Here we have again two
options, unions or joins. If the tables have exactly
the same number of columns and the same data
types, we can use a union. In order to decide that, we
have to do data profiling. Either you open the CSV files
and compare them, or we can go over here. There is a small
icon, like a table. If you click on it, Tableau is going to show you a
sample of the data in order to do data
profiling and to understand the content
of this table. Let's just make it bigger. We have the order date,
shipping date, customer ID, product ID, as well as the
unit price, and so on. And you can compare it
to the orders over here. Let's just make it bigger. We can find exactly the
same number of fields, the same content,
the same data types. That means we can go and
do a union between them. In order to do that,
I'm just going to close this and go to the physical
layer inside the orders. I'll drag the orders archive and drop it
just beneath it over here. Now you can see we have a union. Let's check that on the right
side, in the table names. So we have orders and we
have orders archive. With that, we combined both of the tables in one logical table. Let's close this.
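The profiling step before a union (same columns, same data types) can be sketched in Python like this. It is only an illustration of the check, not what Tableau does internally. The rows are made up, and `column_types`, `union_tables`, and the extra `table_name` column are hypothetical names chosen to mirror the table names shown in the lesson.

```python
def column_types(rows):
    """Map each column name to the Python type found in the first row."""
    return {name: type(value) for name, value in rows[0].items()}


def union_tables(tables):
    """Append the rows of all tables after checking they share columns and types."""
    names = list(tables)
    expected = column_types(tables[names[0]])
    combined = []
    for name in names:
        if column_types(tables[name]) != expected:
            raise ValueError(f"{name} does not match the columns of {names[0]}")
        for row in tables[name]:
            # Keep track of which source table each row came from.
            combined.append({**row, "table_name": name})
    return combined


# Hypothetical sample rows (assumes every table has at least one row).
orders = [{"order_id": 3, "customer_id": 1, "sales": 20.0}]
orders_archive = [
    {"order_id": 1, "customer_id": 2, "sales": 15.5},
    {"order_id": 2, "customer_id": 1, "sales": 9.0},
]

combined = union_tables({"orders": orders, "orders_archive": orders_archive})
print(len(combined))  # 3 rows: the union stacks rows, it does not add columns
```

If one of the files had an extra column or a different type in some column, the check would fail, which is the situation where a union is not the right choice.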
As you can see, we have the icon showing that there
is a union inside it. And with that we have
only three tables instead of five. It is just easier at
the visualizations to deal with three tables
instead of five tables, and the data model is much easier to understand
and to explain. With that, we have connected
all the files together, but we still have one file, the JSON file
product prices. Sadly, we cannot connect
it with the others in the same data source because
it is a different file type. But we can still connect
it to them if we create a second data source
and use data blending. Now that's it; we have our
fact table and the dimensions. We're going to give it a name. I'm going to call it
small data source. Now you can pause the video and go and create the
big data source. When you are done, I'm
going to go and create the big data source. I'm going to go over
here, new data source, and click on the text file. I will just go back
to the big one here. We have only the three files. We start with the orders, the fact table, and then we take the dimensions. Let's take the
customers. I already checked all those
IDs. They are unique. So I can go to the relationships
over here and change it to one on the right side, and on the fact side it's
going to stay as many. I'm going to do the same for the
products: drag and drop. All the IDs of the
products are unique. We can go to the performance
options, just to make sure we select the relationship,
and select one. I'm just going to call
it big data source. Now in order not to lose those data sources
in Tableau Public, we have to publish them to
our public account. I will go and do
that. We're going to go to the sheets over here. Let's just take something like the customers and drag and drop it on the rows. Then I
will just go over here and
save to Tableau Public. I have to sign in, I'm going to call it data
sources, then save. Now it starts publishing to our profile, and that's it. If you
want to download the file, you can go over here and
download the Tableau workbook. All right, with that we have created two data sources on top of our datasets and we can use them in the
whole tutorial. All right, with that, you have learned
everything about Tableau data modeling in data sources and how to combine tables using
the four methods. In the next section,
we will start talking about the
metadata in Tableau. There we will learn many
important Tableau concepts for data visualizations.
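The decimal separator problem from this lesson can be reproduced outside Tableau with a few lines of Python. This is a simplified sketch: `parse_decimal` is a hypothetical helper, it ignores thousands separators, and returning `None` only stands in for the nulls seen in the lesson when the conversion fails.

```python
def parse_decimal(text, decimal_separator):
    """Parse text as a decimal number using the given separator, or return None."""
    allowed = set("0123456789+-" + decimal_separator)
    if not text or any(char not in allowed for char in text):
        return None  # unexpected character for this format: treat the value as null
    try:
        return float(text.replace(decimal_separator, "."))
    except ValueError:
        return None  # e.g. two separators or a sign in the wrong place


print(parse_decimal("12.5", ","))  # None: a dot is not valid when a comma is expected
print(parse_decimal("12.5", "."))  # 12.5
print(parse_decimal("12,5", ","))  # 12.5
```

The same text is either a valid number or a null depending only on which separator the reader expects, which is why choosing a matching locale in the text file properties fixes the fields.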
89. Tableau | Section: Tableau Metadata: The metadata of Tableau. Understanding the Tableau
metadata concepts like data types, measures,
dimensions, discrete, and continuous is very
important in order to build correct data
visualizations in Tableau, and it can help
you as well to understand how Tableau works
with your data. First, I'm going
to introduce you to the metadata in Tableau to learn what
happens to your data once you connect it to Tableau. Next, we're going to dive into
all the data types in Tableau, like integer, string,
date, and so on. After that, we're
going to learn about the data type roles, like the geographic role
and the image role. And after that,
we're going to cover very important
concepts in Tableau: dimensions, measures,
discrete, and continuous. And of course, in
order to understand the differences between them, we're going to compare them side by side. So now let's start
with the first topic, where we get an overview of the basic concepts of metadata in Tableau.
So now let's go.
90. Tableau | Introduction to Metadata: All right, so now
we're going to have a quick introduction to
the Tableau metadata in the data sources in order
to understand what's going to happen to our data once we connect it to Tableau. After connecting our
data to Tableau and building the data model
in the data sources, the next step is to check the metadata of the
tables and the fields. Because once you connect
your data to Tableau, Tableau starts analyzing the content of your data to make assumptions about the type and role of each field
in the data source. Tableau assigns each field a type like integer,
string, date, and so on. Data types give us
information about the kind of data stored
inside our datasets. This piece of information
is very helpful for Tableau in order to understand how to
deal with your data: which rules, operations, and
calculations can be performed. One more thing that
Tableau is going to do is assign each
field a role. These roles help Tableau
build the visualizations. In the first set of roles we
have dimensions and measures. Dimension fields define the
level of detail of the view, and the fields with
the role measure are going to be used for
aggregations in the view. We have another set of
roles: discrete and continuous. These roles help Tableau with
plotting the visuals. Discrete fields break
the view into separate values, and the fields with the
continuous role are going to plot an unbroken chain of
connected values in the view. I call all this
information about your fields the metadata in the
Tableau data source. One more thing that I
want to tell you is that the assumptions that
Tableau makes about your fields are correct
around 90% of the time. That means there is a
possibility that those assumptions from
Tableau are wrong. That's why it's very important, after you build the
data model, to double check the
metadata and make sure that all the information
is assigned correctly. Otherwise you're going to have bad quality and bad results
at the visualizations. All right, so next we're
going to do a deep dive into these important
concepts in order to understand them and the
differences between them. All right, so that was
a quick introduction to the metadata in Tableau. Next we will dive into the basic data types in
Tableau like integer, string, date, and so on.
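To make the idea of "assumptions about the type of each field" a bit more concrete, here is a naive type-guessing sketch in Python. It is not Tableau's actual logic, which is not described in this lesson. `infer_type` is a hypothetical helper, and it only knows one date format, `YYYY-MM-DD`.

```python
from datetime import datetime


def infer_type(values):
    """Guess one data type for a column of text values (naive heuristic)."""

    def all_parse(parser):
        # True only if every value in the column can be parsed.
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "number (whole)"
    if all_parse(float):
        return "number (decimal)"
    if all_parse(lambda value: datetime.strptime(value, "%Y-%m-%d")):
        return "date"
    return "string"  # fallback when nothing more specific fits


print(infer_type(["1", "2", "3"]))    # number (whole)
print(infer_type(["10.5", "3"]))      # number (decimal)
print(infer_type(["2024-01-31"]))     # date
print(infer_type(["12,5", "7,25"]))   # string
```

The last call shows why such guesses deserve a second look: numbers written with a comma fall through to string, so the guess is reasonable for the text it sees but wrong for what the data means.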
91. Tableau | Data Types: All right, so we can find data
types not only in Tableau, but in all programming
languages. But they don't all support
exactly the same data types. That's why, if
you are learning a new programming language or
an application like Tableau, it's very important
to understand which data types it supports. Now the question is,
what is a data type? The data type gives
us information about the kind of information
stored inside our data. And this piece of information is very important for
programming languages and applications like
Tableau in order to understand how to
deal with your data: which rules, operations, and calculations can be
performed on top of your data. Now, if you look
closely at our data, you can see that each field in our data source is assigned a small
icon or symbol. Those icons indicate the
data type of each field. Now, one more thing: once we
connect our data to Tableau, Tableau analyzes
our data in order to automatically assign the correct
data type to our fields. Most of the time,
Tableau does it correctly, but sometimes things go
wrong, or you want to change the data type of a specific field. This
is really easy. You can do it either on the worksheet page or on
the data source page; you will get exactly
the same effect. Let's go to the
data source page. Let's go to the orders and click on the icon over here. You can see it's number (whole). We can change it to a string. What we're going to do is just click on string,
and that's it. We just changed the data
type of the order ID. But let's say we
want to change it back to what Tableau
assigned at the start. What we're going to
do is go to the icon over here again, and then we go to default. It's back to the
original data type that Tableau assigned at
the start here. One more thing to notice is
that the data types are really sensitive in the
joins and the relationships. For example, if we go
to this relationship over here between the
orders and the customers, the key is the customer ID. Those keys should have
exactly the same data type. Let's say we go to the orders, and let's change the customer
ID from number to string. We're going to go to
string over here and change it. Immediately, you can see in the data model that the relationship
between the orders and customers is now broken. You can see in the tooltip, it's going to say type mismatch
between the customer ID, the string, and the
customer ID, the number. As you can see,
Tableau is very sensitive to the
data type of the key. Whether you are
using relationships, joins, or data blending
doesn't matter. They should have exactly
the same data type. Now in order to correct it, as you can see, we don't
have the data preview, the data grid, anymore. So how can we change
the data type now? We're going to go to
the metadata grid. We're going to do
the same thing. We're going to go
to the customer ID, just click on the data type icon, and change it back to
default or to number. I'm just going to
click on default, and Tableau is going
to be happy now, and the tables are
related again. The third way to
change the data types: you can go to the
worksheet page. Same thing over here. You can go to the icons
and change the data type. As you can see,
it's really easy. In Tableau we have a bunch of different data types that we're going to cover in this tutorial. And I group them into
three categories. First we have the six
main basic data types: number (whole), number (decimal), string, date, date
and time, and Boolean. In the second group, we have roles:
geographic
roles and image roles. And in the last group, we have advanced data types like group, cluster group, bins, and set. This group contains
special data types that were introduced by Tableau
for data visualizations, and they are specially made in order to organize our data. In this tutorial, we're going to focus on the first two groups, the basic types and the roles. For
the advanced data types, I'm going to dedicate
another full tutorial just speaking about them. All right, now let's start
with the first group, the basic data types, where we're going to
do a deep dive into each type in order
to understand them. Let's go. All right, so now we're going to talk
about the data type number. If our data contains
only numbers and nothing else, just the digits 0-9, then we can call
it a number data type. It's very important
to understand that numbers cannot
contain any other characters. For example, let's
say that we have the following phone number in our data. This type of data we cannot call a number because it contains characters. We have the minus,
we have the plus, and the number data
type can only have the digits 0-9. Now if we remove those characters
from the phone number, then it's going to
look like this, and only now can we give it the data type number in Tableau. The data type number
has this icon. It's like a hash. For numbers, we have two data
types in Tableau: number (whole)
and number (decimal). So what is the
difference between them? You know, in math, a
positive or negative number can be split by a dot. The first part we call
the whole number, and the second part
we call the decimal. If your number does not include a decimal dot or any fractions, then we can call
it a whole number, like 3, -100, 0, and so on. But if your number contains
a dot and fractions, then we call it a
decimal number, like 2.4 or 13.99. And here you need to be careful
which one you are using, especially if you are making
calculations in Tableau. For example, if you want
to divide two numbers like 1/2 and the output field has
the data type whole number, then the result
is going to be zero. But if it has the data
type number (decimal), then the result
is going to be correct, 0.5. This is exactly the difference
between those two data types. All right, so now let's
check our fields in Tableau to find out which ones have
the data type number. I would say, let's
check the orders over here. You can see we
have the order ID, customer ID, product ID. By just checking them, you can find that all
of them are numbers. They don't have characters and
they don't have fractions. That means they should have
the data type number (whole). As you can see, all of
them are number (whole). Let's check some other
fields on the right side. Here we have sales, we
have discount, profit. As you can see, they
have fractions. Those numbers should be number (decimal).
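As a side note on the points above (digits only, and whole versus decimal results), here is a small Python sketch. It shows Python's own behavior, used only as an analogy for the whole and decimal data types. The phone number is made up, and `is_whole_number` is a hypothetical helper.

```python
def is_whole_number(text):
    """True if text is only the digits 0-9, with an optional leading sign."""
    body = text[1:] if text[:1] in ("+", "-") else text
    return body.isascii() and body.isdigit()


phone = "+49-170-1234567"  # hypothetical phone number with extra characters
digits_only = "".join(char for char in phone if char.isdigit())

print(is_whole_number(phone))        # False: the minus signs inside are not digits
print(is_whole_number(digits_only))  # True: "491701234567"

# Dividing 1 by 2 with a whole-number result versus a decimal result.
print(1 // 2)  # 0   (the fraction is dropped)
print(1 / 2)   # 0.5
```

The division lines mirror the warning in the lesson: if the result has to fit a whole number, the fractional part is lost, so the data type of the output field matters for calculations.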
Let's check that. You can see Tableau
did automatically figure out that those
numbers are number (decimal), but for the quantity it's
whole because we don't have any fractions here. That's
it, everything is fine. All right, now
we're going to talk about the data type string. The string data type is one of the most widely used data types in all programming languages. A string is a
sequence of characters, and it can include
anything, like letters, numbers, spaces, and any
other type of characters. You can think of a string
as plain text. Any field in our data
source could be a string. String is like a default
data type, and it has no rules
like the other data types. That means you can
convert any field in your data source to a string
without any problem. And Tableau as well uses the
string data type when it can't find any other suitable data type for your fields. Let's check in our
datasets where we can find fields with the
data type string. Let's check first the products. Over here, you can see we
have here two strings, the product name
and the category. In the product name,
we have characters, we have spaces, we have numbers. Those are the data type string. Let's check the customers. Over here, we have
the first name, last name, both of
them are string. But now you might notice
or ask, you know what, we have city and country, both of them
contain characters. Why don't we have the ABC
icon? Are they strings? Well, the answer is yes, because if you just
click on the icon, you can see that Tableau
did assign it to a string. But here the difference is
that they have an extra role. We have the geographic role. And you can see Tableau did
assign it to a country. Here, Tableau is going to give
it another icon just to indicate that this field
has a geographic role. But the basic, the main
datatype for that is a string and the same
is for the city. Okay, now we're going
to talk about one of the most confusing data
type. It is the date. If your field stores information
about calendar dates, then this field is going to
have the data type date. Dates have very different
formats in different countries. For example, in Germany, we have the following
date format. You see we use dots
instead of slashes, but the date in the
international format follows another rule where the date
is split by a minus. And in the world there are
many, many different formats. So those dates follow specific formats and we describe it with the
following codes. For example, for
the international format we have this code. It's going to start
with the year. And the year has four digits, that's why we have four times Y. Then we have a minus
and two digits for the month, MM, then a minus and two digits
for the day, DD. So there is a code for each part of the date: the day, month, year,
week, and so on. In this table (I'm
going to leave the link in the description) you can find all those codes and their descriptions. With that, you can customize
the date format as it suits you. Don't worry about it. Tableau understands
almost all date formats that we have in our data. We could have not only
the calendar date, but also information
about the time. Then we have in Tableau
another data type for that; we call it date and time. And in programming
languages or databases, you might have already heard
about the timestamp, but in Tableau we call it date and time. It might look like this. We have the date, then a space, and then afterwards we have
information about the hour, the minute, and seconds.
Like the dates, it could have as well
different formats. You could have the milliseconds, the time zone, and
many other stuff. So here we have again a table of all the codes for the
time informations. You can find it as
well on the same link. All right, so now let's
check our data to find out which fields have
the data type date. Usually in a star
schema data model, all the dates are placed
in the fact table, and our fact table is the
orders. Let's check that. You can see we have two fields with the data type icon dates. We have the shipping
date and the order date. It's not date and time because
we don't have information about the time in the data. So both fields are dates; we can check here and as well here. And in
the other tables, products and customers, they don't have any dates or times because they
are dimensions; they are not events and usually don't have any
information about the date. All right, so now
let's go back to our orders, to our two fields. And as you can see,
the format here is that they are split
with slashes. Let's say that you don't want this format, you
want something else. So now how can we change
the date format in Tableau? In order to do that, we have
to go to the worksheet page. So let's go to the
worksheet page over here. And now you have to
decide something. Do I want to change
the date format for the whole workbook, for all the visualizations? That means you are changing the default format of the date. Or you want to change the
format only for this view. Only for one visualization. Let me show you how
you can do both. Now let's put
something at our view. I'm going to take the order ID, drag and drop it over here. Let's work with the order date. I'm going to drag and
drop this on the view. Tableau is going to show it as a year. I want the exact date in
order to see the format. So as you can see, our date
has the following format. Now I want to change the default date format for
the whole workbook. In order to do that,
we're going to go to the left side to the
order date right click. Then we go to the
Default Properties, and here you can find
the date format. If you click on that, Automatic is what Tableau did
figure out at the start. And then we have some
predefined formats from Tableau. What is interesting
is at the end we have Custom. Our new format for the date is going to be
split with dots. And the year is going to
have only two digits. The format code is going
to be like this: DD for day, then dots,
MM for month. For the year we're going
to have only two digits. That's going to be Y twice, YY. Let's hit OK. As you can see, Tableau did change the
date format in Tableau. Now let's go and duplicate
this worksheet over here by right-clicking on it and then Duplicate. As you can see in the next
worksheet as well, we have exactly the same
format that we defined. This means that
the format that we defined is a default now
for the whole workbook. But now let's say that
I want to change it only locally at
one visualization. I don't want to change the
default format for the date. Let's duplicate that
as well once again. Now, instead of going
to the left side, we're going to stay at the view and we're going to
go to our field, right-click on it, and then we go to this one here, Format. Once you do this,
on the left side, the Data pane is going to
switch to the Format pane. Over here on the left
side you can see dates. If you click on
that, we're going to get exactly the
same stuff over here. Those are the predefined formats
from Tableau. We have the automatic
at the top, and at the bottom
we have the custom. Now let's choose one
of those predefined. I'm going to take the
week and the year. Let's click on that.
As you can see, Tableau did change the
date format in this view. Now it's interesting to
check the other sheets, whether the date
format did change. Let's go back to
the previous sheets and see: they stayed at the
default format of the date. With this, you learned how
to customize the format of the date for a specific view
or for the whole workbook. But now I want to change
the date format back to what it was before. In order to do that,
I'm going to go over here, close this format. Then go to the Order Date again, right click Default
Properties Date format, and then we just click
on the Automatic and hit Ok. As you can see, we have again the
same old format. That's it, this is how we can work with the
data type date. All right, now
we're going to talk about the last data type in the basic category,
the Boolean data type. The Boolean data type
represents a field that has only two values,
true or false. It's like the
language of computers, where we have only 1 and 0.
This data type is often used as the output
of a condition or logic. For example, if I ask you, do you like this video so far, the answer is going
to be yes or no. If you like this video,
please give it a like. The answer to this question can have the data type
Boolean: either yes or no, true or false, and
no other values. And don't forget to subscribe. The Boolean data type
has many use cases. For example, controlling the
workflow of something. If the output is true,
then do something. If false, then do
something else. All right, so now let's
check whether we can find any Boolean data type
in our orders. We can check over here, we don't have any
Boolean data type, and in the customers as well, nothing. And in the products, well, we don't have any field
with the Boolean data type. Well, usually the data type Boolean is going to be added once we use conditions in Tableau and once we create new calculated fields. Now to create the
calculated field, we're going to go to
the worksheet page. We're going to go to
sheet number one. Now make sure to select
the small data source. Then we go to this
small icon over here. And now we select Create
Calculated Field. So let's click on that. We will get a new window to write our expression
or our condition. I'm going to give it
the name Logic 400. And now what are we
going to check, or what is our condition? If the sales is
smaller than 400, then it should be true,
otherwise it's going to be false. The logic is very simple. So here we're going to find
the sales, smaller than 400, and that's it. If the sales is smaller than 400, it's
going to be true. Otherwise it's
going to be false. Let's click OK. And
once you do that, you can find on the left side we have a new field
called Logic 400. It has the data type Boolean. The output has only two
values, true and false. Let's validate that.
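The condition behind Logic 400 can be sketched outside Tableau as well. This is a minimal Python sketch and not Tableau's calculation syntax; the function name `logic_400` and the sample rows are made up for illustration. It also shows how a Boolean field can be used to keep only the rows where the condition is true.

```python
# Hypothetical sample rows; the order IDs and sales values are made up.
orders = [
    {"order_id": 1, "sales": 250.0},
    {"order_id": 2, "sales": 780.5},
    {"order_id": 3, "sales": 399.99},
    {"order_id": 4, "sales": 400.0},
]


def logic_400(sales):
    """Return True when sales is smaller than 400, otherwise False."""
    return sales < 400


# Add the Boolean result to every row, similar to a calculated field.
for row in orders:
    row["logic_400"] = logic_400(row["sales"])

# Keep only the rows where the condition is True, similar to a filter.
filtered = [row for row in orders if row["logic_400"]]

for row in orders:
    print(row["order_id"], row["sales"], row["logic_400"])
print([row["order_id"] for row in filtered])  # [1, 3]
```

Note that a sale of exactly 400 gives False, because the condition is strictly "smaller than 400".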
I'm just going to drag and drop this on
the view over here. As you can see, we have
only false and true. Let's see whether the
logic is working. So we're going to
take the order ID and just put it before it. Now we need the sales. So we're going to
take the sales, drag and drop it
here on the ABC. Here you can see, for
example, the first order, it is smaller than 400, that means the logic
is true, correct. And then the next
one, it is above 400, it's false. And so on. We can see if the field
has only two values, true and false, then the
data type is going to be Boolean. And we usually use it as
an output of a condition. And the Boolean data type
has a lot of use cases. For example, if you want
to filter our data, anything above 400, we don't want to see it in
our visualizations. So what we can do, we can
use the logic in the filter. Just drag and drop
that on the filters. And we're going to
select only the true. So I'm going to unmark the
false and then hit, okay. As you can see, the
result can show only the orders with the
sales less than 400. And with that we just filter
our data very easily. All right, so with
that, we have covered the basic six data
types in Tableau. Now let's do a quick recap. The number
whole is for fields that store only numbers
without characters, and those numbers are without
fractions or decimal dots. The number decimal is as well for fields that have only numbers
without characters, but those numbers could have
fractions or decimal dots. String is a sequence
of any characters. It could be numbers, letters, special characters, or
spaces. Then we have date. Date is for fields that store information about
the calendar dates. Next we have the
date and time, which is as well for fields that store information about the calendar and as well about the time. And it has as well
specific formats. And as the last one we
have the Boolean. It can store only two values, false or true, and we usually
use it for conditions. All right, so so
far we have learned the basic data types in Tableau. And next we will learn
the two data type roles, geographic and image roles.
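As a recap of the date format codes from this lesson, the same idea can be sketched in Python. This is not Tableau syntax: Python's `strftime` uses its own codes (`%Y`, `%m`, `%d`, `%y`), which are shown here only as a comparison to the four Y, two M, and two D codes mentioned earlier. The sample date and time are made up.

```python
from datetime import date, datetime

sample = date(2023, 3, 7)  # hypothetical order date

# International style: four-digit year, month, day, split by a minus.
international = sample.strftime("%Y-%m-%d")

# Custom style from the lesson: day, month, two-digit year, split by dots.
custom = sample.strftime("%d.%m.%y")

# Date and time: the date, a space, then hour, minute, and seconds.
stamp = datetime(2023, 3, 7, 14, 30, 5).strftime("%Y-%m-%d %H:%M:%S")

print(international)  # 2023-03-07
print(custom)         # 07.03.23
print(stamp)          # 2023-03-07 14:30:05
```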
92. Tableau | Data Type Roles: Okay guys, so the first
role that we're going to talk about is the
geographic role. If you have in your data
a field that contains location information
or geographical areas, then you can assign it
to a geographical role in Tableau based on the
type of the location, such as city, country, postal code, and so on. Assigning this
extra role can help Tableau to plot your
data correctly. If you are using map
visualizations in Tableau, there are over 12
geographic roles, but I think the
most important ones are city and zip code. Now let's check our data, but first, some coffee. Let's go, All right,
back to our data source. Let's go to the
customer's table. There we have some information about the location
of the customers. Here we have three fields. We have Country, City,
and Postal Code. Now in order to check
the geographic role, just click on the icon over
here on the data type. Again, here it's very
important to understand. Each field must have
a basic data type. For example, the postal
code is a number whole. Then we assign an
extra role for it. Having the geographic role will not remove the
number data type. Now let's check the
geographic role over here. And you can see that Tableau
did not assign it to anything. It says here: None. This is a zip code or postcode, so we're
going to correct that. We're going to just
click on this over here to assign a
geographic role. And you can see the
icon did change. With that, we have the
data type number and we assigned a geographic role for it. Let's check the others. This should be a,
let's click over here. The basic data type is a string because we
have characters. And let's check the
geographic role. Tableau did it correctly, We have it as a city.
That is correct. Let's go to the
country over here. We have it as a string and then the geographic
role is country. With that, we have all
location information assigned correctly to
the geographic role. We can start building map
visualizations in Tableau. Let me show you an example. Let's go to the sheet
number one over here. What we can do, we can go
to the customers over here. And let's take the
location information. Let's take the country, the city. Let's have one metric. I'm going to take the sales, drag and drop it over
here on the ABC. As you can see,
it's only a table. We want to switch it to a map. In order to do that,
go to the Show Me over here and then
click on the map. You can see Tableau did
correctly plot our data. Let me just close it. Tableau assigned to each
country the metric. This is done because we assigned our data to a geographic role. All right, so now let's
talk about the other one. We have the image role. This is brand new; Tableau
just introduced it in 2022. In principle, if
your field stores URLs pointing to images, then you can assign this
field to the image role with the URL to show the images
in the visualizations. And Tableau has here
some requirements. So the first one,
Tableau supports only those three
image extensions, and the URL should begin with HTTP or HTTPS. Next requirement: the maximum number of images
in each field is 500, and then we have the image size. It should be less
than 128 kilobytes. But those things might
change over time, since it's a completely
new feature in Tableau. And I think the most
common use case for this is to show the product images
in your visualizations. All right, so now let's
see an example in Tableau about the image role
in our datasets. I have prepared some URLs
inside the table products, but only in the small
datasets. So let's check that. If you go to the
products over here we have a field called
product images, and here we have URLs pointing
to images in my website. Now let's check the data type. Over here, it is a
data type string. This is the basic one, because a URL is a
sequence of characters. And now we can add on top of this basic data
type an image role. And it's really easy,
we just go over here to the image role and we click
on the URL. So let's do that. And with that we
have a new icon that indicates that this field
has the role of image. Let's check the
data. We're going to go to the sheet number one. Then we go to the products, make sure we are selecting
the small data source. Then we go to the
products image. Just drag and drop over here. And as you can see now we have some images about the products, but two of them are broken. And I think it's
still bagging at the disto version
of Tableau Public. Because if we publish now to
Tableau Public in the Whip, we're going to have all
the icons correctly. So now we can go and
grab another field. Let's take the sales, drag and drop it over here. And with that, we have
nice images to the matrix. Let's go and publish
that in Tableau Public. I'm going to call it View Image. Let's save. As you can see, now in Tableau
Public we have all icons, nothing is broken. I think if you are building dashboards about the products, it's really nice
to show the image of the product
instead of the names. It's just more catchy to have images inside the
visualizations. All right, so that's
all for the data types. Next we will learn very
important concepts, the dimension and measure
roles in Tableau.
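The image role requirements mentioned in this lesson can be written down as a small checklist. This is a minimal Python sketch and not part of Tableau; the function names are hypothetical. The lesson does not name the three supported extensions, so the set below is an assumption chosen only for illustration, and the limits (500 images, less than 128 kilobytes) are the ones stated in the lesson, which may change over time.

```python
from urllib.parse import urlparse

# Assumption: the lesson does not name the three supported extensions;
# this set is a placeholder for illustration.
ALLOWED_EXTENSIONS = (".png", ".jpg", ".jpeg")
MAX_IMAGES_PER_FIELD = 500          # limit mentioned in the lesson
MAX_IMAGE_SIZE_BYTES = 128 * 1024   # "less than 128 kilobytes", assuming 1 KB = 1024 bytes


def url_looks_valid(url):
    """Check that the URL starts with http or https and ends with an allowed extension."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    return parsed.path.lower().endswith(ALLOWED_EXTENSIONS)


def field_looks_valid(urls, sizes_in_bytes):
    """Check all requirements for one field of image URLs."""
    if len(set(urls)) > MAX_IMAGES_PER_FIELD:
        return False
    if any(size >= MAX_IMAGE_SIZE_BYTES for size in sizes_in_bytes):
        return False
    return all(url_looks_valid(url) for url in urls)


# Hypothetical URLs on a placeholder domain.
print(url_looks_valid("https://example.com/images/product1.png"))  # True
print(url_looks_valid("ftp://example.com/images/product1.png"))    # False
print(field_looks_valid(["https://example.com/images/product1.png"], [20_000]))  # True
```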
93. Tableau | Dimensions vs Measures: Dimensions and
measures in Tableau. So once we connect
our data to Tableau, Tableau is going to analyze our data
in order to assign each of our fields to either a dimension or a measure. This
kind of metadata is going to help Tableau to
plot our visualizations. All right, so now
the question is, what are dimensions and measures? Well, Tableau didn't invent the concept of
dimensions and measures. It is an old concept of BI. And now we're going to
have a quick origin story. If you have learned the concepts of data warehousing and
business intelligence, you might already know
that the core concept is multidimensional OLAP,
online analytical processing. The concept says, if
you want to answer the business questions or
do data analysis first we have to build a
data model that has the shape of a cube
with multidimensions. It's something like this cube. And each cube has
two kinds of information. First we have the
dimensions of the cube, and as the second kind of information
we have those cells. Those cells can store
information like data, numbers, and we
call them measures. So each cube has two kinds of information: the dimensions and the
cells, the measures. Now let's have an example. We have the cube of sales
and it has three dimensions. The first dimension
is the location. And inside the location, we have three members: USA, France, and Germany. Those three values are the members of the
dimension location. And we have another
dimension called time. And it has three members
in the dimension, January, February, and March. And the third dimension,
we have the categories. Now, inside the
cells of the cube, we have the measure sales. Now our cube is ready
with the dimensions and measure and we can start answering the
business questions. For example, find
the total sales in USA. What happens? We can select the
dimension location and filter the dimension to
have only the member USA. This operation in the cube, we call it slicing the cube. And then we can
aggregate the measure, and we will get the
total sales of 120. And if you have a cube, we can do multiple operations
like slicing, dicing, roll up, drill
down, and pivot. So if you have such a cube, we can do data analysis and find fast answers to
the business questions. Now to summarize, dimensions
contain qualitative values. They usually describe something
like the product name, the product category,
customer location. And we use dimensions
to categorize, filter, and show the
level of details. And on the other hand,
we have the measures. They contain numeric
quantitative values that can be measured
like the name says. And the measures,
unlike the dimensions, they can be aggregated. All right, so this might
be still confusing. And if you say, you know what? If I look to my data, how do I decide whether it's
a dimension or a measure? So here is my decision
making process. First I check the data
type of the field, whether it is a number. If the answer is no, then
this field is a dimension. But if the answer is yes, then we can ask
the next question. Does it make sense to aggregate
the values of the field, like doing the sum
calculation on the values or finding
the average value? If the answer is yes, then it is a measure. But if the answer is no, then it is a dimension. So what this means: all
non-numeric fields are dimensions, but not all numeric fields are measures. That really depends
on the question of whether it makes sense
to aggregate the values. If yes, then it is a measure. If no, then it's dimension. Okay, so now let's practice. In order to understand
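The decision process above can be sketched as a small function. This is a minimal Python sketch and not something Tableau provides; the name `classify_field` is hypothetical, and whether aggregation makes sense is a human judgment that is passed in as a flag.

```python
# Data type names follow the ones used in this course.
NUMERIC_TYPES = {"number whole", "number decimal"}


def classify_field(data_type, makes_sense_to_aggregate):
    """Apply the two questions: is it a number, and does aggregating it make sense?"""
    if data_type not in NUMERIC_TYPES:
        # Question 1: not a number, so it is a dimension.
        return "dimension"
    # Question 2: a number is a measure only if aggregation makes sense.
    return "measure" if makes_sense_to_aggregate else "dimension"


print(classify_field("string", False))         # dimension (for example, product name)
print(classify_field("number decimal", True))  # measure (for example, sales)
print(classify_field("number whole", False))   # dimension (for example, an identifier)
```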
the concept of dimensions and measures
and how they work. We will check our datasets
and we're going to assign each field to either
dimension or measure. We're going to do the
table customers together. And then you can go
and pause the video in order to do the
products and the orders. And then at the end, we're going to check
the result together. So let's go, we're
going to start with the first field,
the customer ID. The customer ID is a number, so we cannot say it is automatically a
dimension. So we jump to the next question: now, does it make sense
to aggregate it? Well, we have here
to understand that the customer ID is a unique
identifier for the customers. For example, Maria has the customer ID number
one, Martin has four. And now if we sum
all those values, we're going to get
the value of 15. Or if we do the average, we're going to get
the value of three. Those values don't make
any sense because we use the customer ID only to
identify the customers. And I don't think
that we will be in a situation where
we have to find the average of the
unique identifiers since it makes no sense. This field is a
dimension and with that, we can assign the customer
ID to a dimension. Now let's go to the next one. It is much easier
because we have here the first name and
it is not numeric, so it is automatically
dimension. The same goes for the last name. It is as well string. It is not a number. All right, so now let's move
to the next one. We have the postcode or the
zip code. It is a number. So we can ask the question, does it make sense to
do aggregation here? Well, I don't think
there will be a situation where
we have to find the sum of the postcode or
to find the average of it. So that means it is here again, it's a number, but
it is a dimension, so let's assign the
value for that. And then the next
one, it is easy, so we have the city
and the country. Both of those values are string, so it is automatically
a dimension. So let's assign it again. Let's move to the last field. We have the score here. It's again a number we
can ask the question, does it make sense here
to do aggregations? Well, the answer is yes. It really makes sense to
find the average of the score. That's why we're going
to map it to a measure. On the table customers, we have six dimensions
and only one measure. Now you can go and pause
the video in order to practice with the table orders and as well with the products. All right, now let's
check the results. As you can see in
the table orders, we have a lot of measures
because it is a fact table. And the fact table in the star schema is the central
place for the measures. This is very normal.
Let's check the fields. We have the order ID,
customer ID, product ID. It is like the customer ID before. Those are identifiers and it doesn't make sense
to aggregate them. That's why we have
them as dimensions. Then the order date and
shipping date: that information is not numeric, and that
means they are dimensions. And then we have all
this information: the sales, quantity,
discount, profit, unit price. All those
fields are numbers. Here it makes sense to do aggregations like the
sum or the average. We're going to use the orders, the fact table if we
need any measure. Let's go to the next one,
to the products here. This one is easy: the product ID is, again, an identifier. It doesn't make sense
to do an aggregation. We can have it as a dimension. Then product name and category: both of those
are strings, they are non-numeric, and that's why they
are dimensions. I hope with this you have
understood how I usually do it. By just looking at the data, we could decide whether it's
a dimension or measure. All right, so now
back to Tableau and the first question is, where do I find in
Tableau whether my fields are measures
or dimensions? Well, there are no icons for
dimensions and measures, and as well, we cannot check that at the data source page. In order to check the
dimensions and measures, we have to go to
the worksheet page. Let's go to sheet number one. And then we're going
to go to the Data pane on the left side over here. Let's open any table, for example, the orders. Now if you look closely
at the table orders, you will find a fine
gray horizontal line which splits the fields of
the orders into two groups. The fields above the line, they are the dimensions. And the fields below the
line, they are the measures. For example, we have
the customer ID, the order date, order ID,
product ID, and so on. Those fields are
dimensions in Tableau, and the fields below the
line, the discount, the quantity, sales, and so on, those fields are measures. You can find this splitter, this horizontal
line in each table. If you go to the
customers over here, you will see again the same
line that splits dimensions from measures and the same
if you go to the products. Scroll down, we have
again the same line. And one more thing that
you might have already noticed (let me just close those tables): outside the tables there
is as well a horizontal line. Sometimes in Tableau we create fields that
don't belong to any table, and Tableau puts them just outside
of the tables. They are like global fields, and for those we need
as well a splitter to split the fields into
dimensions and measures. Okay, so now let's go
back to the orders. And now you might
say, you know what? We don't need this
horizontal line to identify whether the field
is a dimension or a measure. If the
field has the color blue, then it's a dimension. And if the field has the color green, then it is a measure. Well, this is exactly
where most Tableau developers get confused. Things get mixed up between dimensions and measures
and discrete and continuous. To be honest, I was thinking the
same at the start until I found out that the color of the field indicates whether the field is discrete
or continuous. We're going to talk
about this concept in the next tutorial.
Don't worry about that. The color does not indicate whether the field is
dimension or measure, but the position of the field, whether it's above the
line or below the line. Let me show you
quickly something. Let's take any field over
here, the product ID. Let's just drag it a little bit. Now Tableau is going to mark the
horizontal line with orange. And it's going to show you, okay, anything above is a dimension and anything below is a measure. So Tableau shows that as well. All right, so now to
the next question. How do I change a field from dimension to measure
and vice versa? And here you have two options. Either you're going
to do it globally for the whole workbook, for all the views,
or you might do the change locally in
one individual view. So let's see how we can do that. Let's start with the first one where we're going to
do the change for the whole workbook for
all views globally. We're going to go, for example, let's take the
order ID over here. Just right click on it. And then we go
over here, Convert to Measure. Let's click on that. And as you can see, the
field order ID just jumped from above the line to below the line as a measure. Now if you want to change
it back to dimension, just right-click on it and
then Convert to Dimension. That's it, it's really easy. Now let's see how we can
do the change locally at one view without affecting
the whole workbook. Let's take again the order ID, drag and drop it over here, and here we're going to
right-click on it in the view. And then we're going
to go to the measures. We're going to convert
it to a measure. Currently it is a dimension. Let's go to the
measures and we have to select one of
those calculations. Let's take, for
example, the sum. Now as you can see, the order ID only for this view is a measure. But the order ID on the left
side for the whole workbook, it stays as a dimension. That's it; this is how
easily you can convert between measures
and dimensions. Let's have an example
in Tableau in order to understand the main purpose
of measures and dimensions. Let's go to the orders on the left side over here
and the small data source. And let's take one
measure, the sales. We're just going to drag and drop it on the text over here. As you can see,
Tableau is going to start immediately doing
aggregations on the measures. Now if you check the data, we have only one number. This is the total sales that
we have in our dataset. And now we are at the
top level of details where everything is aggregated
in only one number. And now we have to
add more information in order to understand
this number. In order to do that, we're
going to use dimensions. For example, let's go to
the products over here, and let's take the category. So I'm just going to drag and drop that category over here. And as you can see
now that dimension is splitting our measure
into two rows. So that means we
have now one level lower of details than
the top aggregation. And now let's take
another dimension. We're going to take
the product name. So let's just drag and drop it over here
near the category. And as you can see,
using this dimension can give us different level of details about the sales than the first
dimension, the category. What happened? We
just moved with the details one more
level beneath that. Now let's take third dimension. We're going to take now the
order ID from the order. Just drag and drop it
near the product name. Now as you can see,
this dimension can bring us to the
lowest level of details where the aggregation of the measure is exactly
the same original value. As you can see, the dimensions define the level of
details in our views. And each dimension can take us to different
levels of details. Always, if you want to go to
the top level of details, you have to remove
all dimensions and only have the measure. As you can see, as we are removing
those dimensions, we are going to the top level of details. Another nice way
to show that is if we go to the tree map visualization. Let me just go back over
here to have one dimension. Let's go to Show Me and
then click on the tree map. Now you can see our data is
split into only two parts. Now as we add dimensions, let's take again the
product name over here, drag and drop it on the label. You can see the view
is split into more details. And if we go to the lowest level, if we take the order ID again over here to the label, we
can see the view is split even further. Now I'm going
to tell you a small secret. If you follow it, you can
generate hundreds of reports, even if you have small datasets. If you combine any measure
with any dimension, you will be creating a new
view or new reports with the title following this
pattern, measure by dimension. For example, sales by product, profit by category,
quantity by country. So if you follow this pattern, you can generate endless amounts of reports and views in Tableau. All right, so now if you
count the dimensions and measures in our
small dataset, we have around 16 dimensions
and ten measures. So that means if you
follow this rule, you can generate around
160 views and reports. So even if we have a small dataset, we can generate a huge amount
of views and reports. So as you can see in
the visualization, if we combine both of them, we're going to have
sales by order date, sales by shipping date, sales by
country, and so on. All right, so now let me
just show you how we build usually reports in Tableau
using dimensions and measures. We're going to work now
with only one measure, the sales, and we're going
to make dashboards about it. So let's stay at the
small data source and we're going to take
the sales from the orders. Let's just drag and drop
it somewhere at the rows. And now the dimension is
going to be the product name. Let's take the product
name from the products. Let's drag and
drop it over here. So that's it. Now we have to
call it sales by product. Let's just rename the
sheet over here: right-click and rename it to
Sales by product. All right, so now we're going
to create another one using the same measure,
different dimension. What we're going
to do, we're just going to go and Duplicate it. Right click on it and duplicate. We're going to have now
the Sales by Category. I'm just going to
rename it again. Let's call it Sales by Category. Now we're going to remove
the product name from here. Just drag and drop it
somewhere at the white space. And then we go again to the products and drop the
category on the columns. Now we're going to use
a different visualization. I'm going to go to the
Show Me over here. And let's use the pie
chart. Click on that. All right, now we
have a pie chart, but I would like to
show the values. We go to the label over
here, click on it, and click on this Show
Mark Labels in order to show some values. That's it,
this is our second one. All right, so now
we're going to create the third one with
another dimension. We're going to take
the order date, but we're going to
show only the months. We're going to go over here
and duplicate it again. Just rename it, I'm going
to call it sales by month. We will go now and remove the category. Just drop it here. And then let's take
the order date, drag and drop it on the columns. We're going to switch the
visualization to bars. I'm going to click
on this over here, on the bars. As
you can see here, Tableau is going to show the
years of the order date. We want to have it as a month.
We have to switch that. Just right click
on the Dimension and then over here,
just select the month. Let's do that. Let
me just close the Show Me over here and then
let's add some labels. All right, so that's
it for this view. Let's make the last one, we're going to make Sales by Country. Let's duplicate this again, and we're going to call
it Sales by Country. Then we're going to remove
the dimension order date. And then we're going to
take the Dimension Country. Just drag and drop
it on the rows. Now since we have the country, we can change it to a map. Let's do that. We go
to the Show Me over here and then select
the map. Click on that. All right. So now we have a map showing the sales by country. All right, so now we have
those four reports or sheets we can build
now a dashboard. In order to create
a new dashboard, we're going to go to
this icon over here. Click on it. Before we start, I'm just going to
give it a name. Let's call it Sales Dashboard. All right? Okay.
Now we're going to go and drag and drop
all the sheets. We're going to start
first with the country. Let's just drop it
here in the middle. And then we're going to take the category just beneath it. Then the product beside it. Let's three size, a
little bit to the left. And then we're going
to take the last one, the Ns, and put it over here. As you can see, with just four dimensions
and one measure, we were able to make
dashboards about the sales. And just following this small
rule, sales by country, sales by category,
sales by product, and sales by month, always
measure by dimension. Now it's really easy to train, just go and pick
another measure with different dimensions and
build different dashboards. All right, so now let's have a quick summary where we're going to compare both dimensions and measures side by side in order to understand the
differences between them. Let's start with the definition. Dimensions are fields that
contain descriptive values, and measures are fields that contain quantitative
numeric values. For example, we have dimensions
like product, category, country and customer ID. And on the other hand,
we have measures like sales, profit and quantity. The next point is about
aggregation. Dimensions cannot be aggregated, as each member
of the dimension is unique. Measures, however, can be
aggregated using functions like sum, average,
min, max, and so on. For example, you can calculate the total sales for
a specific product category. Moving on to the data types. All different data types can be used as dimensions, like string, date, boolean, and even numbers, as we have learned with
the customer ID. But only the fields with the data type number can
be used as a measure. The next point is about
the role in analysis. Dimensions are typically
used for grouping, filtering, and
organizing your data. And measures, on the other hand, are used for calculations
and numeric analysis. The final point is
about the granularity. Dimensions define the level
of details of the data, and the granularity of
measures, on the other hand, determines the quantity
being measured. These are the main differences between dimensions and measures. All right, so that's all about the dimensions and measures. Next we will learn
another important concept for data visualizations, the discrete and continuous
roles in Tableau.
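The "measure by dimension" rule and the aggregation point from this summary can be sketched in a few lines of Python. This is only a small illustration under stated assumptions, not how Tableau works internally: the field names are the examples mentioned above, while the sample rows and the helper names `report_titles` and `total_by` are made up for this sketch.

```python
from itertools import product

# Example field names taken from the lesson; a real data source has more.
measures = ["Sales", "Profit", "Quantity"]
dimensions = ["Product", "Category", "Country"]


def report_titles(measures, dimensions):
    """Return one 'Measure by Dimension' title per combination."""
    return [f"{m} by {d}" for m, d in product(measures, dimensions)]


titles = report_titles(measures, dimensions)
print(len(titles))   # 3 measures x 3 dimensions = 9 titles
print(titles[:3])

# The same rule with 16 dimensions and 10 measures gives 16 * 10 combinations.
print(16 * 10)       # 160

# Hypothetical rows, invented for illustration only: (category, sales).
rows = [("Furniture", 100.0), ("Office", 40.0), ("Furniture", 60.0)]


def total_by(rows):
    """Sum the measure for each member of the dimension."""
    totals = {}
    for member, value in rows:
        totals[member] = totals.get(member, 0.0) + value
    return totals


print(total_by(rows))
```

The dimension supplies the groups and the measure is what gets summed inside each group, which is the same split of roles as in the comparison above.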
94. Tableau | Discrete vs Continuous: All right guys, so now
we're going to talk about discrete and continuous. Here again, once we connect
our data to Tableau, Tableau analyzes our data in order to make assumptions and map each field to either
discrete or continuous. Discrete and continuous are
pieces of metadata that affect what types of visualizations
you can create, as well as how they
will look. Now in order to understand
the concept behind them, we're going to compare both
discrete and continuous. First, we're going to
start with the definition. This concept comes from math. And they say discrete values
are always separated, disconnected, distinct values. Continuous values are
exactly the opposite. They are like connected values, a series or unbroken chain of data without
any interruptions. Let's have an example. Think of discrete as you are
counting from 0 to 10: 0, 1, 2, 3 and so on. So that means from 0 to 10 we have
exactly 11 distinct values. But with the continuous values
we have like real numbers, which means from 0 to 10 we have
an infinite number of real numbers. For example, we have
1.2, 1.3, 1.4 and so on. So with discrete we
have distinct values. And with continuous
we have a range of infinite values
between start and end. I once read about
discrete and continuous, and the following
analogy stuck in my head. Think about the discrete
values as Lego pieces. You can take them apart
and you can work with each piece differently
and independently. You can move them around and analyze them in
different orders. And now think of continuous
as a roll of yarn. And now when you
unroll the yarn, you will not get
different pieces. You will just see
more of the yarn, so you will just
get a longer piece of the same string. All right. So discrete values
are separated, distinct values and
continuous values are an unbroken chain of data
without any interruptions. All right, so now let's
move to the next point. We have the colors in Tableau. The discrete fields
are the blue pills and the continuous fields
are the green pills. So let's see in Tableau
what this means. All right, so now as usual,
the first question is, how do I know whether my fields are discrete
or continuous? Well, it's like the
dimensions and measures. We cannot check that at
the data source page, we have to switch to
the worksheet page. Let's do that. We're
going to go over here. And now it's really easy. Now as you hover your
mouse over those fields, you will see we have
only two colors, the blue and the green. And you can see those
colors as well on the data type icons: we have green icons
and blue icons. The fields with the blue
color, like for example, the customer ID, first name,
order date, and so on. Those fields are
discrete fields and the fields with the green
color, like discount, sales, unit price
score and so on, those fields are the
continuous fields. This is exactly where
the confusion comes in: a lot of Tableau
developers think that blue indicates dimensions and green
indicates measures. Well, that's wrong. Those
colors indicate whether a field is discrete or
continuous. Now you know that. Let's start with the first option, where we're going to change the role of a field globally
for the whole workbook. In order to do that,
we're going to go to the Data pane on the left
side, as you can see here. For example, the sales in
the orders is a green pill. That means it's a
continuous field, and it is a measure as well. Let's say that we now want to switch
it to a discrete field. In order to do that,
right click on the field, and here we have
convert to discrete. It's really easy, so
let's click on that. Now if you check
again the sales, we have it now as a blue pill. That means now it is
a discrete field. If you check the others, all of them are continuous measures, but only the sales is
a discrete measure. This change is done globally. If you go to another sheet, the sales is still going to
be a discrete field. Now if you want to switch
back from discrete to continuous, all you're going to
do is right click on it. And here we have again
the same option. We're going to convert
it to continuous. Once we click that, it's going to go back
to the green pill. That's it, it's really easy. Next, we're going to learn how to
switch between discrete and continuous locally
for only one view. All right, let's build the view. We can drag and drop the
sales on the columns. Let's take a dimension. For example, the category
drag and drop it on the rows. Now we want to switch the
sales from continuous to discrete only for this
view. What we're going to do is go to
the sales over here and right-click on it. As you can see, the current role is continuous, as Tableau marked it for us here. Or you can see it
from the green pill. All you have to do
is select Discrete. Let's go and do that.
Now the field sales is discrete for this view. As you can see, it's a blue pill, but if you go to the Data
pane on the left side, the sales stays continuous, with the green color.
That's how you can switch it locally for only one view. So for example,
if you go back to another worksheet
and take the sales, the sales is going to be a
continuous measure. That's it. This is how you
can switch between discrete and continuous fields
locally for only one view. All right, now let's
move to the next point. We have filters in Tableau. The discrete field
is going to create a filter with distinct values, but the continuous
field is going to create a filter
with range values. All right, now let's
have an example in order to understand what I
mean with those filters. And now we're going to work
with a big data source, because we need more data in
order to understand this. Now let's switch to
the big data source. Just click on it.
And then let's take the Sales drag and
drop it over here. And then we're
going to take from the products the subcategory, drag and drop it on the rows. So now we have the sales
by the subcategory. Now if we want to go and
filter those values, we can go and put the
subcategory in the filters. And don't forget that the subcategory is
a discrete field, let's just drag and drop it on the filters and see
what can happen. Now in the new window, as
you can see over here, Tableau listed all
distinct values inside the subcategory. Now here with those
discrete values, we can make decisions
individually. We can include some stuff or remove others.
Let's just do that. I'm just doing this
randomly and click OK. That's it. This is how the
filter in Tableau reacts if we have a
discrete field inside it. So we have a list of
all distinct values, we can show this filter
on the right side. If we just right click
on the subcategory over here and then
select Show Filter. Now we have it on the
right side and we can now include or
exclude values. Now let's see what
can happen if we put on the filters a
continuous field. Let's take the sales again
since it's continuous field, but instead of taking it from the left side here
from the Data pane, you can take it
from the shelves by holding out and then drag
and drop on the filters. Since it's a continuous field and a measure, Tableau
asks us first whether we want to filter on all values or after we
do the calculations. Let's go with the sum over here, since we have it as a sum. So I'm just going to click
on the sum and go next. This is exactly what's
going to happen if you have continuous
field as a filter, you will get a range. It has a start and end. You don't have distinct
values of all the sales. You will get a range of values and you have to define
the start and the end. Here we have different
options about the range, but we're going to stay
with the first one. Let's hit OK.
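The two filter styles shown so far can be sketched in plain Python. This is only an illustration of the idea, not Tableau's implementation: the subcategory names, the sales values, and the helper names `discrete_filter` and `range_filter` are all assumptions made for this example.

```python
# Hypothetical rows, invented for illustration: (subcategory, sum of sales).
rows = [("Chairs", 320.0), ("Phones", 510.0), ("Paper", 75.0), ("Tables", 205.0)]


def discrete_filter(rows, selected):
    """Keep rows whose member is in an explicit set of selected values."""
    return [r for r in rows if r[0] in selected]


def range_filter(rows, start, end):
    """Keep rows whose value lies between start and end (inclusive)."""
    return [r for r in rows if start <= r[1] <= end]


# A discrete filter starts from the list of all distinct members.
members = sorted({name for name, _ in rows})
print(members)

# Each member is included or excluded individually.
print(discrete_filter(rows, {"Chairs", "Paper"}))

# A continuous filter only needs the start and the end of the range.
print(range_filter(rows, 100.0, 400.0))
```

The discrete filter needs one decision per member, while the range filter needs only two numbers no matter how many values fall between them.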
Now I want to show the filter on the right
side. Let's go over here. Right-click and select Show Filter. Now on the right
side, you can see exactly the difference between discrete and continuous
fields in filters. Let me just extend it over here. You see the sales continuous
and we have a range. So we can filter like this by changing the start and
the end of the range. But with the discrete filter, we have all members
of the field and we can decide on each
value individually. We can just select and
deselect those values. All right, now let's
move to the next point. We're going to talk about
the changes in the view. Discrete fields create the
headers of the visualizations, whereas the continuous fields create the axis
of visualizations. Okay, now let's see what
this means in our view. As you can see,
the subcategory is a discrete field and the sales
is a continuous field. In the view over here, we have three things. First, we have the marks, those bars. Second, on the left side, we
have the subcategory, and we call those
values headers. And third, we have the axis of the view. What is the difference
between headers and axis? The discrete fields like subcategory always create
the header of the view. In the header over here,
you have a list of all distinct values inside our
dataset, exactly as it is. But the continuous field, like the sales, creates the
axis of the visualization. It's like the values
inside the filter. It's a range that
has starts and ends. Unlike the headers,
you cannot see in the axis all the possible
values individually, you have a range
with start and ends. And in between we have pens, so discrete fields create the headers and continuous
fields create the axis. All right, so the next
point we're going to talk about is sorting data.
With discrete fields, we have many options in
order to sort the data, but with the continuous fields in Tableau, it is very limited. So let's see an example. So we're going to stay
with the same example, and we can start with the
discrete field subcategory. In order to sort the data
in the discrete field, just right click on the subcategory over
here on the shelf, or you can go to the header. It's exactly the same, so right
click on the subcategory. And then we can
select Sort over here. Select that. And now we have an extra
window to set up the sort. So as you can see here, we have many different
options like alphabetic, field,
manual and so on. So let's go with the manual
over here and here again, since subcategory
is discrete fields, we're going to get a list
of all distinct values. Then we can change the order. For example, by just clicking
on the appliances, we can just bring it down, and we can take the storage
and bring it up, the binders down and so on. So we can do it manually
without any rule. As you can see, as I'm
changing the values, the order in the visualization
is as well changing. If you want to sort the data, we're going to use the discrete fields in order to do that, since we have many options. Now let's check the
continuous field. I'm going to close this. Now if you go to the
continuous field, the sales, and right-click on it, we don't have an option here to sort the data like in
the discrete fields, but instead we have
only one option. If you hover on the sales, we have this very small
icon and we can use it in order to sort the data,
ascending or descending. Just click on that.
And as you can see, now the data is sorted
by descending values. If you click on that, again, you will get the
data as ascending. Sorting the data using continuous
field is very limited. But instead of that, we can use the discrete fields
in order to sort the data since we
have many options. Okay, now let's move
to the next one. And this is really important
in order to understand what the purpose of having continuous
and discrete in Tableau really is. The main use case of
the discrete values is to do a deep-dive analysis
in a specific scenario. On the other hand,
we're going to use the continuous values to see the big picture and do trend analysis. Let's have an example. Now we're going to
create a new view using the big data source, since we have more data. And we're going to go
to the table orders. Let's take the order date. Just drag and drop
it on the columns. And then we're going
to take one measure, let's say the quantity. Drag
and drop it on the rows. Now as you can see,
the order date is a discrete field and we
have five years of data. But now what we're
going to do, we're going to go to the order date. Right click on it and we
want to see more details. Just go to the exact
date over here. Now as you can see,
Tableau did convert it automatically from discrete
to continuous value, and we have it as a green pill, and that's because we have
a lot of order dates. And Tableau tried to bring
it all in one picture. You can see now the order
date created an axis, a range of dates. With a
continuous field, you have all the data
in one big picture. And that's going to help you to find any trend in your data. Now let's go and convert the order date to
a discrete field. In order to do that, we're
going to go to the order date, right click on it and
click on Discrete. As you can see now, we just
broke the chain and we broke the visualizations
into individual dates. Now because of that, we
have the header and we have all the distinct values
inside our data. We have all the days, all the
months of the five years in one visual. With the
order date as discrete, we cannot really do
any trend analysis over here because it's a really huge visualization.
After we converted the order date
from continuous to discrete, we lost the big picture. And now it's really hard
to do any trend analysis. But now instead of
doing trend analysis, we can now do a deep-dive, detailed analysis for
each individual date in order to analyze a
specific problem or scenario. Or to answer the question, why do we have a
trend in the first place? You can check the value of
each date individually. We usually use the bar
visualizations for the discrete and the line visualizations for the continuous.
Let's change that. I will go over here
on the marks and instead of automatic,
I will move it to bar. We have it now here as a bar. And I'm going to just duplicate the sheets and bring
the order date as a continuous and then change the visualizations to automatic. Now I just moved both of the
views into one dashboard in order to see the differences between continuous and discrete. As you can see with
the continuous, if you want to make
like trend analysis, seeing the big picture or
you're going to make like a report for the management without showing a
lot of details, then go and use the
continuous field. Now if you look at
the visualizations with the discrete fields, you can use that if the task
or the requirement is to do a deep-dive analysis on the data and evaluate each
value individually. The main purpose of
having discrete is to do detailed analysis,
whereas the purpose of continuous values is
to do trend analysis. All right, now let's have a summary where we're
going to compare both of the discrete and
continuous side by side in order to understand
the differences between them. Let's start with
the definitions. Discrete values
are disconnected, separated values, and continuous
values are a connected, unbroken chain of values. For example, in discrete, from 0 to 10 we have
exactly 11 values. In continuous, from 1 to 2 we have
an infinite number of values. The next one is about the colors. Discrete fields
are the blue pills and continuous fields
are the green pills. Moving on to filters, discrete
fields generate filters with a distinct list of all values available
in the dataset. On the other hand, the
continuous fields generate a range filter that has
start and end values. Next point is about the views. Discrete fields can generate the header of the view
showing all possible values, and the continuous fields
generate the axis of the view. Again, it's like a
range of values. Then we have sorting. You can use discrete fields to sort your data using
different options, but if you sort your data
using continuous fields, you're going to have
very limited options. We have only ascending
or descending. Finally, we're going to
talk about the purposes. The main purpose of discrete is to analyze a
specific scenario, like you are doing a
deep dive analysis in a specific issue. But the main purpose of
the continuous is to understand the big picture
from the data in order to do, for example, trend
analysis of your data. These are the main differences between discrete and
continuous fields. All right, that's all for
the discrete and continuous. Next we'll wrap things up with a summary and get
a better understanding of the big picture and the differences between
all of these concepts.
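Two points from this comparison, the definition and the sorting options, can be sketched in Python. This is a small illustration and not Tableau's own logic: the rows, the manual order, and the helper names `midpoint`, `sort_manual` and `sort_by_value` are assumptions made for the example.

```python
# Discrete: counting whole numbers from 0 to 10 gives a fixed list of values.
discrete_values = list(range(0, 11))
print(len(discrete_values))  # 11


# Continuous: between any two numbers there is always another one,
# so the values inside a range can never be listed completely.
def midpoint(a, b):
    return (a + b) / 2


low, high = 1.0, 2.0
for _ in range(3):
    high = midpoint(low, high)  # 1.5, then 1.25, then 1.125
print(high)

# Hypothetical rows, invented for illustration: (subcategory, sales).
rows = [("Chairs", 320.0), ("Phones", 510.0), ("Paper", 75.0)]


def sort_manual(rows, order):
    """Discrete-style sort: any hand-picked order of the members."""
    return sorted(rows, key=lambda r: order.index(r[0]))


def sort_by_value(rows, descending=False):
    """Continuous-style sort: only ascending or descending by value."""
    return sorted(rows, key=lambda r: r[1], reverse=descending)


print(sort_manual(rows, ["Paper", "Phones", "Chairs"]))
print(sort_by_value(rows, descending=True))
```

A manual order is only possible because the members form a finite list, which is why this option belongs to discrete fields.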
95. Tableau | Data Types vs Dimension & Measure vs Discrete & Continuous: All right guys. So now what
I'm going to show you is how those different metadata
concepts like data types, dimensions and measures,
discrete and continuous, are related to each other. All right, so now we have
a field in our data and in Tableau we can assign it
to different data types. So it could be a string, or a boolean with true
and false, or a date. And we have as well date
and time, or a number, whether it's whole or decimal. Next, Tableau can assign it to another
piece of metadata, either dimension or measure. Any data type that
is not a number is going to be a dimension: string, boolean, and date. All of them are going to be
dimensions automatically. You cannot convert
them to a measure. If the data type is number, we could have it as a measure, if it makes
sense to do aggregation, or as a dimension. Next, Tableau can
assign this field to the third metadata concept,
discrete or continuous. If we have a dimension field
with a data type string, it can only be discrete. We cannot convert it to a
continuous field. For example, in our dataset we have the category, the
first name, the country. All those fields are string,
dimension and discrete. You cannot change them
to anything else. The same goes for the data type boolean. It can only be a dimension
and only discrete. But now if we have a
dimension field with the data type date or date time, as you saw
in our examples, it could be continuous
or discrete. We can have both. Now
to the last one. If we have a field with
the data type number, it doesn't matter whether
it's dimension or measure, we can have this field as continuous and as
well as discrete. All right, with this
you have the big picture for all those confusing concepts
in metadata in Tableau. All right everyone, we now have a better understanding of the data types and roles in Tableau and these
important concepts. In the next section,
we will learn about renaming and
aliases in Tableau.
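The rules from this lesson can be written down as a small lookup table in Python. This is only a sketch of the rules as stated above, not a Tableau API: the table `ALLOWED` and the helper `is_allowed` are hypothetical names introduced for this example.

```python
# Rules as described in this lesson (an illustration, not a Tableau API).
ALLOWED = {
    "string":   {"roles": {"dimension"}, "kinds": {"discrete"}},
    "boolean":  {"roles": {"dimension"}, "kinds": {"discrete"}},
    "date":     {"roles": {"dimension"}, "kinds": {"discrete", "continuous"}},
    "datetime": {"roles": {"dimension"}, "kinds": {"discrete", "continuous"}},
    "number":   {"roles": {"dimension", "measure"},
                 "kinds": {"discrete", "continuous"}},
}


def is_allowed(data_type, role, kind):
    """Check whether a data type may take the given role and kind."""
    rule = ALLOWED[data_type]
    return role in rule["roles"] and kind in rule["kinds"]


print(is_allowed("string", "dimension", "continuous"))  # False
print(is_allowed("date", "dimension", "continuous"))    # True
print(is_allowed("number", "measure", "discrete"))      # True
```

Reading the table row by row gives the same picture as the lesson: only numbers have the full choice, dates can switch between discrete and continuous, and strings and booleans are fixed.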
96. Tableau | Section: Tableau Renaming: How to rename things in Tableau. As we are preparing
our data sources, what we usually do is go and rename
stuff, like renaming tables and columns, and even giving
aliases to our data. First I'm going to
introduce you to the different naming conventions that each developer should know. And after that you're going to learn the different techniques on how to rename fields
and tables in Tableau. At the end, you're going to learn
the different methods on how to add aliases to
your data in Tableau. So let's start first by learning the different naming
conventions and what are the differences between
them. So now let's go.
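As a preview of those naming conventions, here is a minimal Python sketch that writes the same two words in each style covered in the next lesson. The function names and the example name "customer name" are assumptions made for illustration, and the sketch is deliberately simple: it only splits on spaces, underscores, dashes and dots, and it does not preserve acronyms.

```python
import re


def split_words(name):
    """Split a name into lower-case words on spaces, underscores, dashes or dots."""
    return [w.lower() for w in re.split(r"[ _\-.]+", name) if w]


def snake_case(name):
    """Lower-case words joined with underscores."""
    return "_".join(split_words(name))


def kebab_case(name):
    """Lower-case words joined with dashes."""
    return "-".join(split_words(name))


def camel_case(name):
    """First word lower case, following words capitalized, no separator."""
    words = split_words(name)
    if not words:
        return ""
    return words[0] + "".join(w.capitalize() for w in words[1:])


def pascal_case(name):
    """Every word capitalized, no separator."""
    return "".join(w.capitalize() for w in split_words(name))


def title_case(name):
    """Every word capitalized, separated by a space."""
    return " ".join(w.capitalize() for w in split_words(name))


for convert in (snake_case, camel_case, pascal_case, kebab_case, title_case):
    print(convert.__name__, "->", convert("customer name"))
```

Every convention comes down to the same two decisions, the letter case of each word and the separator between words, so each function differs only in those two choices.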
97. Tableau | Naming Conventions: Sometimes in real life projects, the source of your data might contain technical or
unfriendly names. And when you are
creating visualizations for the users or
your colleagues, you have to make sure
that you are using friendly names that are easy
to understand and to read. And that's why after you connect your data to Tableau
data sources, Tableau will start
cleaning up and renaming the fields and the tables
to a more friendly format. And the format follows a specific naming
convention that was decided by the Tableau
team, which is really great. So let's understand first
what a naming convention is. Naming conventions are sets
of rules and guidelines that can be used in order to give names to things like tables, fields, functions, and variables in a consistent and
understandable way. Let's say for example, we have
the two words, hello world. In order to create a
naming convention, we have to decide on two things. First, the word itself,
how we can write it. Here we have three ways: we
can use the lower case, or we can decide to go
with the upper case, or we could
capitalize the words. And the second thing to decide is the separator between words, between hello and world. We have here a white space. Here we have different options. You could use dots, underscores, white space, or even nothing. Now for example, let's say
we're going to go with the lower case and the
separator underscore. Then we're going to have
the following name: hello_world. With that, we have a naming convention that we're going to follow through all the projects and it's really easy to follow. And at the same time, it's very important to decide on the naming convention
for your data model, especially at the
start of your project. And if you don't do
that, I promise you the look and feel of
your visualizations and dashboards are going to
look really bad and the whole project is going to look unprofessional and inconsistent. And one more thing,
each project team decides on different naming conventions, so there is really no
right and wrong here. All right everyone. So now
I'm going to walk you through the most common
naming conventions used in programming languages. The first naming convention
is the snake case. We use lower case in all the words, and we're going to separate them
using the underscore. The name at the end is
going to look like a snake. All right, our example is
going to be the customer name. And we're going to work
with this table to fill in all the different
naming conventions: an example of the output, the rules for the
letter case and the separators, and in
which applications and programming languages
we can find this rule. We're going
to start with the snake case. The letter case is
going to be lower case here, and the separator is
going to be the underscore. If we follow those
rules with the example, we're going to have customer_name in lower case. We can find this
format in Python, PHP, and Ruby. The snake format
is really easy and popular and you can find
it like almost everywhere. And now we're going
to talk about the next naming convention. We have the camel case. And here we have another
naming convention that looks like an animal. In the camel case,
only the first word is going to be lower case, but then all the following
words are going to be capitalized. And between the words
there is nothing, no separators, no dots, underscores, dashes or anything. So at the end, we're going
to have the shape of a camel. All right, so that means we have the second naming convention. We have the camel case. The rule for the letter case is going to be the following. The first word is
going to be lower case and the rest of the words are
going to be capitalized. For the second rule, we have the separation. There
is no separation. There is nothing
between the words. Here, we're going to
write no separation. Now if we apply those two
rules to our example, the customer name, we're going to have the
following output. The first word is going
to be all lower case, customer, and
there is no separation. That means we're going to start immediately with
the second word, but the second word is going to be capitalized, like this: customerName. We can see the camel
case is widely used in programming
languages like Java, JavaScript, and TypeScript. Then we have the
third naming convention, we have the Pascal case. It's very similar
to the camel case. The rule says all the words
going to be capitalized. So here we have capitalized. And the separations,
there is no separation. Like the camel case,
there is nothing. If you follow those two
rules on the customer name, we're going to have
the following output. The first word is going to
be a capitalized Customer, no separation, then
a capitalized Name: CustomerName. We can find this
naming convention, the Pascal case, used in programming languages
like Java and C#. I like this naming convention. I used it in many projects. All right, the next
naming convention is going to be the kebab case. I think by now the one who named those naming conventions
must have had an appetite. As you can see, we
have all the words in lower case, on a skewer
and separated with dashes. The name is going to look like
a delicious hot kebab. The fourth one, we
have the kebab case. And the rule is going to say, okay, the letter case is going to be lower case like the snake case, and the separation
here is going to be the dash. If we follow those two rules on the
customer name in our example, we have the following output. It's really easy: it's going
to be customer in lower case, then a dash, then name: customer-name. If you are a
web developer or designer, I think you know about this
naming convention because it is widely used
in HTML and CSS. I think it's like
the snake case. It's really easy to follow. Now we have another
naming convention. This one is very important
and we call it the title case. It has nothing to do
with animals or foods, sadly. Here we have the title case. The rule is going to say, okay, the words are going to
be capitalized, and we're going to separate
the words with a white space. So here we're going
to have space. So now if you follow those
two rules in our example, we're going to have
capitalized customer, then space, then
capitalized name, like this: Customer Name. So why is it important?
Because this one is the naming convention that the Tableau team
decided to go with. So you can see this naming
convention in Tableau. Tableau currently enforces this naming convention
in all your data. So once you connect your
data to Tableau, Tableau is going to clean up and rename everything following this rule. Well, if you look at it, it's really friendly
and easy to read. But sometimes in projects we are forced, or we are following
some requirements, to follow a specific
naming convention. If it doesn't match
the title case, then the situation
is really bad: you have to go and
rename everything again. Of course, you don't
have to follow one of those naming conventions. You can make your own
rules and guidelines. For example, let's say this is my naming convention
and the letter case, let's say it's capitalized and I would like to separate the
words with the underscore. I'm just mixing stuff around. If I apply those rules
to the customer name, we're going to have
something like this: a capitalized Customer,
an underscore, and a capitalized Name, so Customer_Name. And with that we have defined
our naming convention. All right, so now let's
check the naming conventions in our datasets and
as well in Tableau. Now if you go
through the datasets that I've prepared
for this course, the small and the big one, you can see that I'm always following the same
naming convention. The words are going
to be capitalized and going to be separated
with an underscore. So for example in the orders we have
Product_ID. Or if you go to the customers, you can see
First_Name and so on. So I'm always following the
same naming convention. All right, so now let's
check how Tableau did name our fields and
tables from the datasets. You can check this
information either from the worksheet or in
the data source page, but in the data source page you can find more information. So now we are at the
data source page. Let's go to the metadata grid. And here it's
really interesting. We're going to find
two field names. We have here the field name
and the remote field name. What are the differences
between them? Well, the information in the remote field names comes
from the original datasets. And as you saw, the original
dataset is following the naming convention of having underscore
between two words, and we have all the
words capitalized. We have, for example,
the order underscore ID, customer underscore
ID, and so on. All information we find under the remote field names comes
from the original dataset, from the original source system, but now the field name on
the left side over here, those informations
comes from Tableau after renaming and
cleaning up our fields. If you take a closer
look to those names, you can see they are
following the title case, where we have capitalized words and separated
by a white space. You can see over here we
have the product space ID, where the original name was
Product underscore ID. Here, Tableau did rename
our fields. It's really cool: we have
in the metadata grid a mapping between
the old values, the remote field names,
and the new ones after Tableau renamed them. We always have a data lineage between Tableau
and our datasets. As I said, there is no
right and wrong here, but it's very important
to define those rules at the start of the projects before you start building
any visualizations. I remember one project where we started immediately
with building the dashboard and visualizations without deciding first on
the naming conventions. We built around 30 dashboards in Tableau, and after a while, we found out that the developers were using different
naming conventions, which is really normal:
if you don't define the guidelines and the rules at the start of the project, then everyone is going to
make their own style. We ended up having a lot of dashboards with different rules, and the users were not
happy about it at all. Then we decided on the
naming conventions, and of course, we were
too late for that. Then we spent a lot of
time renaming the datasets, checking the reports, and so on. If you don't decide on the naming convention at
the start of the project, especially if you have
a big project, then you can have a really
painful and costly process of renaming everything
from scratch. Make sure at the start to
take enough time to talk to your users and the project team to decide on the
naming conventions. And it's very important, in
the review process of any new dashboard in
Tableau, to check that the naming
conventions are followed in each workbook, to be
consistent in the whole project. All right, okay, so that was an overview of the different
naming conventions. Next we will learn how to rename fields and
tables in Tableau.
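To make the conventions from this lesson concrete, here is a minimal Python sketch that converts a field name between title case, Pascal case, and the capitalized-with-underscore style used in the course datasets. The helper names are made up for illustration, and this only mimics the naming rules described above. It is not how Tableau renames fields internally.

```python
import re


def split_words(name):
    """Split a field name on underscores, spaces, and lower-to-upper case changes."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return [word for word in re.split(r"[_\s]+", spaced) if word]


def capitalize_first(word):
    """Upper-case only the first letter, so an abbreviation like 'ID' stays intact."""
    return word[:1].upper() + word[1:]


def to_title_case(name):
    """Capitalized words separated by a white space, e.g. 'Order ID'."""
    return " ".join(capitalize_first(w) for w in split_words(name))


def to_pascal_case(name):
    """Capitalized words with no separator, e.g. 'OrderID'."""
    return "".join(capitalize_first(w) for w in split_words(name))


def to_capitalized_underscore(name):
    """Capitalized words separated by an underscore, e.g. 'Order_ID'."""
    return "_".join(capitalize_first(w) for w in split_words(name))


if __name__ == "__main__":
    for remote_name in ["Order_ID", "Customer_ID", "First_Name"]:
        print(remote_name, "|", to_title_case(remote_name), "|", to_pascal_case(remote_name))
```

The same three functions can be run over a list of remote field names to check whether a workbook follows the convention the project agreed on.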
98. Tableau | Renaming: All right, so now
let's say that you decided together
with your users and the project team on a specific naming convention which is different from the
one that Tableau uses. Now the question is
how to rename in Tableau? In Tableau, we can do the
following changes on the table. We can rename the table itself, or we can rename the
fields inside the table. And the last one, we even can change the values
inside these fields, also known as aliases. We're going to talk about
it in the next tutorial. In this tutorial, we're
going to focus on renaming the fields and
renaming the tables. First, let's learn how to
rename the fields in Tableau. All right, so now
we're going to learn how to rename fields in Tableau. Let's have the following task. The task says,
rename our fields in Tableau following the naming
convention Pascal case. So that means all the words are capitalized and no
separation between words. All right, so now the
first question is on which page we can
rename our fields? We can rename our
fields either in the worksheet page or in
the data source page. We're going to get
the same effects. But I usually go to
the data source page since there we can
find more metadata, information about the
fields and tables. Now the second question is, can we rename our
fields globally for the whole workbook,
for all worksheets? And as well, can we do it
locally for only one view? Well, you can do both. But renaming locally
for only one view, it's a little bit tricky. So now let's learn how to
rename our fields globally, for the whole workbook, for all views in the worksheet page. Okay, so now let's go to the
worksheet page over here. Then we're going to go to the
Data pane on the left side. We will rename the
shipping date. And here we have three methods. The first one is the drop-down. So what you're going to
do is right-click on it and then simply
go to Rename. So we're going to click on that and we're going to rename
it to Pascal case. So I'm just going to remove the space between
the words, then Enter. And that's it. It's really easy. We just renamed the
shipping date. The second method is
to use a shortcut. For example, let's
go to the order date over here and hit F2. And with that we
can edit the name. So I'm just going to
remove as well the space between order and
date and hit Enter. As you might have already noticed, the position of the order date just changed in the Data pane. That's because the fields in the Data pane are sorted
in alphabetical order. That was the second method, using the F2
shortcut. And the third method to
rename the fields in the worksheet page is
to click and hold. For example, let's go to
the Unit Price over here, left-click and
hold, then release. As you can see, we can
now edit the name. This is the third one.
I'm just going to remove the space between
the words and hit Enter. That's it. Those are
the three methods of renaming the fields
in the worksheet: the drop-down, the F2
shortcut, and click and hold. One more thing about renaming: unlike the aliases, which
we will get to later, we can rename any type of field.
So whether it's a dimension or a
measure, continuous or discrete, any type, we can
rename it. There is no restriction
on renaming in Tableau. All right, so now let's
go to the next one. We're going to rename the
fields in the data source page. Let's go to the data
source page over here. And here we have two places
where we can rename stuff, either at the metadata
grids or at the data grid. And here we have only two
methods to rename stuff. So the first one is going
to be the drop-down, like on the worksheet page. Let's go to the
name, for example,
the order date, right-click
on it and then Rename. So we're going to remove
the space between the words. And that's it. The second
method to rename fields in the data source
page is by double-clicking. For example, let's go over
here in the metadata grid to the customer ID and just
double-click on it. Now as well we're going to
remove the space. This is how we can rename. In the data source page, we have only two methods: the drop-down and the double-click. Here we don't have, sadly,
any shortcuts. All right, so now we have
the following scenario where we have renamed
the fields like several times and we forgot the original
names of the fields. In this case we can reset everything back to
the original names. And we can do that either at the data source page or
at the worksheet page. Let's see how we can do it
on the data source page. If you just go to the
field, for example, the customer ID,
right click on it. Then here we have the
option reset name. Let's click on that. As you can see,
now we are back to the original name of the field. I found it really strange,
because I would like as well to have the option of resetting to the
Tableau naming convention. Now let's see how
we can do that on the worksheet
page. I'm going to switch back and then
go to the Data pane. Let's pick the order date. And now we're going to go
and edit the field again. So right click on
it and then rename. Then you can see over here a very small icon to
reset the original name. By clicking on it, we reset the field to the
original field name. All right, so now let's
say that you have a lot of fields and you want to
reset all of them now. Instead of resetting
them one by one, we can do multi selection
and then do reset. And we can do that at
the data source page. So let's switch there. And it doesn't matter
whether you're going to work with the meta data grid or at the data grid. So now what we're
going to do, we're going to go to the order ID, click on it and
then hold control. Select the next one, and then we're going to select
the unit price as well. Then right click
and reset names. Once you do that,
you're going to reset all of them,
which is really nice. So we have the unit price
reset, the shipping date, and the order date. All right, so now we have the
following scenario where you are in the project
and you have already built a view. But afterward you
decided to do renaming. What can happen to our
view if we do renaming? For example, here in the view we have the order underscore ID, and we want to rename it
back to the Tableau name. So we're going to go
to the order ID, hit F2, then instead of the underscore,
I'm just going to leave
a white space. As you can see in the view, Tableau did change the name automatically to the new name. Well, you might say, okay,
so what? This is expected:
if I change the name
in the data source, it's going to change as
well in the visualizations. Well, this is only in Tableau. If you are using any
other tools like Power BI and you
rename a dataset, the whole visualization
is going to break. So here if you have
the task of renaming, this is going to happen
fast in Tableau, but in Power BI projects it's
going to be really painful. All right, so far
we have learned how to rename the fields globally
for the whole workbook. Now the question is how to rename locally for
only one view. And here it depends
on the field roles, discrete and continuous. So let's start now
with the continuous. As we learned before,
the continuous can generate the
axis of the view. So here in this example,
as you can see, the quantity and sales
are the green pills. That means they
are continuous and they generated the
axis of the view. Now to rename the quantity over here and the sales,
it's really easy. What we're going to do, we
will go over here on the axis, right click on it, and
then go to Edit Axis. Let's go there. Then here
we have a new window. And if you go over here, you can see the axis titles. The current title is Quantity. Let's go to the field
over here and change it from quantity to quantities. Then let's close this.
As you can see now the field name is called
Quantities on the axis. And if we check the
Data pane over here, the field stays as Quantity. We did this change only locally, in this view. This is really
easy for the continuous. But the tricky part is if
we have a discrete field.
For example, the order ID
over here is discrete: we have the blue pill. This
one is going to be tricky. Now, we're going
to change the name from Order ID to Orders. What we're going to do is go to
the blue pill over here on the rows and
double-click on it. Type double forward slashes, write
the word Orders, then add a new line. And that's it. Go outside, just click here in
the white space. And as you can see now we
have renamed it to Orders, but only here in the view. We didn't change
the global name; it stays as Order ID
here in the Data pane. This is how we rename the
discrete fields locally, in one view. It was
not really clear, it's tricky, but let me show
you how I usually do it. Let's take another field, the category over here. We're going to change it
from Category to Categories. What I usually do:
I go over here and double-click on it and
just copy the name. Then I go to a text editor
and paste the name. Then first we're going
to have a new line, then double slashes, and we're going to have the new name, Categories. And that's it. Then
I'm going to copy it from here and go
back to Tableau. Then again, on the category over here, double-click on it. Then I remove this part and just paste the new
stuff. Then Enter. So that's it. This is how I usually do it for
the discrete fields. I go to the text
editor and prepare it there, since it's clearer
for me what I'm writing. All right, so now
we have learned all different methods of renaming fields in Tableau
at the data source page, the worksheet page,
globally and locally. All right, so now we're going
to move to the next point where we can rename
the tables in Tableau. And here again, we
can do the changes either at the data source page or at the worksheet page using the same methods as
renaming fields. The next point is about
locally and globally: you can change the
table names only globally. So anything you do
can affect all the views, which is not as critical
as with the field names. Now let's see how we can do
it on the worksheet page. So we're going to
stay with the small data source over here and let's minimize everything
so we see the table names. You might have already noticed that
the names contain dots. And that's because our
datasets come from CSV files, which is not really
useful information to see in the data source. So we can go and
clean up the name and rename it to only, for
example, customers. We can go to the name over here, right click on it and
then click rename. So I'm going to rename
it to only customers. For the next one, we're going to use the second method, using
the shortcut, F2. Let's hit F2,
remove the CSV part, so we have only
orders. And we're going to use the third method
for the products. Just click and hold, then remove the CSV
part. That's it. Those are the three methods
for renaming tables on the worksheet page. Now let's do the changes for the big data source at
the data source page. Let's switch there. We're going to go to the data source page. Here you have two places
to change the table names,
either in the data model
or in the metadata grid. We cannot go to the data
grid to rename tables. First, let's switch to
the big data source. I'm going to go over here,
the big data source. Let's change the orders
in the data model. Here we have only one method: right-click on it and rename. So we're going to
remove the CSV parts, and then we go to the
customers over here. Then let's go to
the metadata grid. And as you can see,
just click over here and you can remove
the CSV parts. So that's it. And now
for the last one, we have to rename the products. So we can go over here
and select the products, and then we can rename it
in the datasource page. So that's it, this is how
you rename the tables. At the datasource page, we have the data model
and the meta data grids. So with that, you have learned all the possible methods on how to rename tables in Tableau. All right guys. So with that, we have learned how to
rename things in Tableau. Next we will learn how to
add aliases in Tableau.
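The text-editor trick for renaming a discrete field locally, described earlier in this lesson, comes down to preparing two lines: a comment line with the new label, then the original field. The sketch below only builds that text in Python so it can be copied into the pill. The function name is hypothetical, and it assumes the field is written in square brackets, as it appears when the pill is edited.

```python
def local_label_text(field_name, new_label):
    """Build the two-line text for a discrete pill: a comment line, then the field."""
    comment_line = "//" + new_label      # this line becomes the label shown in the view
    field_line = "[" + field_name + "]"  # the field itself is left unchanged
    return comment_line + "\n" + field_line


if __name__ == "__main__":
    # Reproduces the Category -> Categories example from the lesson.
    print(local_label_text("Category", "Categories"))
```

Preparing the text outside Tableau this way keeps the global field name in the Data pane untouched, which matches what the lesson shows for Order ID and Category.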
99. Tableau | Aliases: Let's first understand why and when we need aliases in Tableau. Sometimes in Tableau projects we face the following
situations. The first one is when we have poor data quality
in our datasets: wrong data, typos, or
inconsistent values. We have somehow to clean up our data before we start
building our visualizations. For example, we have
the following scenario: in the table customers, we have bad data quality inside the field. So here
we have a typo. Sometimes it's Germany,
sometimes it's Deutschland, sometimes they call it USA, and then America.
The data quality is really bad in this table. So here we have to
do something about it and clean up the data. And here we have two options. Either we go back to the original datasets and
change the values there, or, the second option,
we do the changes directly in Tableau using aliases. How are we going
to clean this up? We're going to remove the
E from here, the typo. And then instead of Deutschland, we're going to have Germany. And instead of America,
we're going to have USA. And we might have
another situation where the data quality is good
but the names are too long. And if you're building views, you will understand
that everything is tight and you don't have enough space to show the whole
values of the dimensions. That's why we end up, most of the time, changing the values of the dimensions to shorter
names, to abbreviations. For example, instead of
having the value Germany, we're going to have
DE, and instead of USA, US. Here we have FR, DE, and US. Again, we have the
same situation. Either we're going to go back to the original dataset
and change the values, or we stay at Tableau and do it directly there using aliases. In real projects, you
cannot go each time back to the source system or to the original datasets and
change the values there. Either you don't have the time for that or you cannot do that. That's why we end up always changing those values
directly in Tableau. So aliases in Tableau are alternate names
for the members of a discrete dimension field, so that their labels appear
differently in the view. As you might have noticed, I said discrete dimension field, and
that's because Tableau does not allow you to
create aliases for measures or for
continuous dimensions. So in Tableau you can
create aliases only for the fields with the
role discrete dimension. And now as usual we
have the question: on which page can we
create aliases? Well, only on the worksheet page can we create
aliases in Tableau. We cannot create them in
the data source page. And the second
question, can we create aliases globally for
the whole workbook, all the views and as well
locally for only one view. The answer for that,
we can create aliases only globally That's going to
affect the whole workbook. All visualizations. We cannot create aliases
locally for only one view. Okay, we're going to go
to the worksheet page. We cannot do it at
the datasource page. We're going to stay at
the small data source. Let's take the country, drag and drop it over
here on the rows. And then let's take any measure, let's take the scores, drag
and drop it on the columns. The task here, instead
of having those values, France, Germany, USA, we
want to have short names. Here we have two methods to
create aliases in Tableau. The first one is to go to the
Data pane on the left side. So let's go to the field
country over here. Right click on it, and then here we have the option aliases. So let's go there. And
here we're going to get a new window to
edit the aliases. So let's check what
we can see over here in the middle, we
have three columns: the members, Has Alias,
and the value of the alias. In the first one we're going to see all the members of the
dimension country. Those values come directly
from the datasets. So those are the original
values from the source. Then the next one
we have is Has Alias. It is like an indicator to
show us whether the values in the view are going to come from the original values
or from the alias. Now it's all empty because
we didn't add any aliases. And in the third column, we
have the aliases here. We can go and edit the aliases of each
member individually. And as you can see
now, the aliases are exactly identical to
the original values. That's why we don't
have any aliases. Now let's go and change that. Instead of France,
we're going to have FR. And then instead of Germany,
we're going to have DE. As you can see, as I'm adding values in the aliases that differ
from the original values, Tableau is going to mark them with a star. Now let's go for the last one and we're going
to have it as US. Now just check what's going
to happen once I click OK. You see here we have the old
values, and if I click OK, it switches to the aliases. This is how you can add
aliases in the Data pane. But now let's say
that you change your mind later and
you don't want to use the aliases and
instead of that you want to go back to
the original values. How can we do that? Maybe
you already saw it. So let's go back to
the country over here in the Data
pane, right-click,
go again to the aliases, and
while editing the aliases,
there is an option here
called Clear Aliases. What you can do is go
over here and just click on it, and everything is going to
reset to the original values. And as you can see, those
indicators did vanish. That means there is no alias. Now if you go and hit OK, the values are going to go back to the original values
from the datasets. Here what I usually do once
I need aliases in Tableau I don't go directly to one
field and change the values. But instead of that, I tend always to create a
new duplicates of the field and only change the values of the new
fields that I have created. So let me show you what I mean. We go to the country,
right-click, and then we go to the option
over here, Duplicate. Let's do that. And as
you can see now we have another field called
Country with (copy). And of course now from the
name I can understand this is the copy and the other
one is the original. But in Tableau, if you look very closely at the data type icon, you can see that
on the duplicate we have an equal sign. This sign indicates that this
field is not an original one, but is created from
another original field. If you see it, that means this is a customized field
that we have created. What I usually do:
I go and rename it; we're going to call
it country short. Now I create the aliases on this new field.
Let's go and do that: right-click, Aliases, and
then instead of France, Germany, and USA we have FR, DE, and US. So with that I have the
two options: the long one, the original one, and as well the short
version of the country. And I can decide in the
visualizations whether I'm going to use the short
version or the long version. All right, that's all for
the first method, where we created aliases from the
left side, from the Data pane. Now we're going to go to
the second method where you can create aliases
directly from the view. Let's see how we can do that. Just move over the value France over here and right-click on it. And then here we
have the option Edit Alias. Let's select that. Now here I have a
very simple window. I just have to edit
the alias for France only, so I'm giving the alias
only for one value. Let's do that: FR, and then hit OK. And as you
can see in the view now, we just changed the
value France to FR quickly from the visualization, and we can do the
same for Germany. So right-click on the
value, then Edit Alias. Again, the same window;
we go with DE and OK. As well, the value changes
directly in the view. This is a really quick method to edit the aliases
directly in the view. Now if we go and check the dimension country
in the Data pane, let's check the aliases. As you can see,
the members France and Germany have an alias, FR and DE, and we've done
that directly from the view. Now the question is
which method to use. I would say if you want to
change multiple values, go to the Data pane
and do the changes. It's just easier to work with the window and add
all those values. But if you want to
change a single value from the dimension, then you can do it quickly by going to the view
and edit the alias. And that's all for the aliases. This is a really great
way to clean up and change the
values directly in Tableau without having
to go back to the original datasets
and do the changes there. All right, so now we have the following Tableau
task for you. The task says, abbreviate the values inside the
field category in the table products
from the big datasets showing only the first
character from each value. You can pause the video
right now to do the task, then resume it
once you are done. All right, now let's
do that quickly. As I showed you before, first we start with
duplicating the field. So I'm going to go and do that. Then I'm going to rename
it to category short. Then I'm going to present
both of the fields, category and category short. So far both of the dimensions have exactly the same values. We didn't change anything. Now we're going to
go to category short, right-click on it. And then we're going
to go to the aliases. The task says the
first character, the first letter
from each value. So each value is going
to be just its first letter. The second one could be O or OS, so I'm going to leave it as O. Then click OK. And that's it. Now we
have a new dimension
that has only the first
character of each value. And we have done
that using the aliases. This is really easy.
All right guys. So with that, we have
completed this section, which is really important
step in order to prepare our datasets before we start
building our visualizations. In the next section,
we will learn how to organize and structure
our data in Tableau.
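One way to picture what aliases do is as a lookup table from the original member to its display label, where any member without an entry keeps its original value. The Python sketch below is only an illustration of that idea under this assumption. The function and variable names are made up, and it says nothing about how Tableau stores aliases internally.

```python
# Aliases for the duplicated "country short" field from the lesson.
country_short_aliases = {"France": "FR", "Germany": "DE", "USA": "US"}

# Aliases used for clean-up: inconsistent source values mapped to one label.
country_cleanup_aliases = {"Deutschland": "Germany", "America": "USA"}


def display_label(member, aliases):
    """Return the alias if one exists, otherwise the original member value."""
    return aliases.get(member, member)


def has_alias(member, aliases):
    """Mimic the 'has alias' indicator: true only if the label differs from the member."""
    return display_label(member, aliases) != member


def clear_aliases(aliases):
    """Clearing aliases means every member shows its original value again."""
    aliases.clear()


if __name__ == "__main__":
    for member in ["France", "Germany", "USA"]:
        print(member, "->", display_label(member, country_short_aliases))
    for member in ["Germany", "Deutschland", "USA", "America"]:
        print(member, "->", display_label(member, country_cleanup_aliases))
```

Keeping the short labels in a separate mapping mirrors the habit from the lesson of duplicating the field first, so the original values stay available next to the abbreviated ones.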
100. Tableau | Section: Organizing Your Data: How to organize your
data in Tableau. In Tableau, we have different
techniques and methods on how to group up and
organize your data, which is very important
for your users to understand your data. First, you can learn
how to organize the dimensions in hierarchies, and after that, you can
learn how to group up the members of
dimensions using groups. Moving on, we can learn
how to cluster your data into different groups
using the cluster group. And after that, you
can learn how to split your data into two
subsets using sets. Then we have another
method called bins,
in order to group
up the values of the measures and
build histograms. Let's start with the first
method of organizing our data using
hierarchies. Now let's go.
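The hierarchy lesson works through a location example with three levels (country, city, postal code) and the sales totals 140, 90 and 50, and 40, 50, 20, and 30. As a rough sketch of the drill down and drill up arithmetic used there, the Python below aggregates leaf rows to any level of the hierarchy. The postal code labels are placeholders, and the split of Portland's 40 into 15 and 25 is an assumption, since the lesson does not state those two values.

```python
from collections import defaultdict

# Leaf rows: (country, city, postal code, sales).
# Postal code labels are placeholders; the Portland split (15 + 25) is assumed.
leaf_rows = [
    ("USA", "Portland", "postal code A", 15),
    ("USA", "Portland", "postal code B", 25),
    ("USA", "Seattle", "postal code C", 50),
    ("Germany", "Stuttgart", "postal code D", 20),
    ("Germany", "Berlin", "postal code E", 10),
    ("Germany", "Berlin", "postal code F", 20),
]


def total_sales_at_level(rows, depth):
    """Sum sales per node at the given depth.

    depth 0 is the root node, 1 is country, 2 is city, 3 is postal code.
    """
    totals = defaultdict(int)
    for *path, sales in rows:
        totals[tuple(path[:depth])] += sales
    return dict(totals)


if __name__ == "__main__":
    # Drill down: from the root node to the leaves, one level at a time.
    for depth in range(0, 4):
        print(depth, total_sales_at_level(leaf_rows, depth))
```

Reading the printed levels from top to bottom is a drill down, and reading them from bottom to top is a drill up: the same leaf values are simply summed over fewer or more levels.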
101. Tableau | Hierarchy: All right guys, the best way to understand the hierarchy
is to have an example. If you take a look at our data, for example, the customers, you can find that some
dimensions are related to each other since they
hold similar information. For example, the
dimension country, we have values like
Germany, USA, and France. And we have another
dimension city, where you can find the cities
inside those countries. For Germany, we have
Berlin, Stuttgart. And then we have a third
dimension, Postal Code, where you can find the
codes inside those cities. As you can see, these three
dimensions are describing common information. They give us information about
the user location, and we can relate
those dimensions together using the hierarchy. In hierarchies, we
have different levels. And we start with the top node, and we call it the root node. This node represents
the highest level of aggregations
in our hierarchy. And now we're going to go to the next level of the hierarchy, where we have the country. In this level we're going to see more details about our data. Where we have, for
example, the two values, USA and Germany, and the
links between the nodes, we call it branches. And now we're going
to go to the next level in our hierarchy. We have the level two
here in the city. We will see more
details about our data. So in USA we have
Portland and Seattle. And in Germany we have
Stuttgart and Berlin. And again, we have
the link between the parent node and the child
node using the branches. And now we're going
to go to the last level in the hierarchy, we have the postal code. And here we're going
to split the structure further, with more details. So we have the following
postal codes for each city. Now, since the postal
code is the last level in our hierarchy and those values
don't have any children, we call those nodes
the leaf nodes. The leaf nodes, or the leaves, represent the
most detailed level of our data in this hierarchy. So now with that, we have the complete
structure of our hierarchy. As you can see, it looks
like a tree structure. The top node, we call
it the root node; it represents the highest
level of aggregation. Then we have the
intermediate levels, and they are connected
using branches. And the last level, we
call it leaf nodes, where it represents the
lowest level of details. We have the root node, it represents the highest
level of the aggregations. Then we have intermediate levels connected with the branches. And then we have the
leaves, the leaf nodes. They represent the lowest
level of details in our data. As we learned before, we can do many OLAP operations on the cube. So if we have a hierarchy in our data, we can do two very
important operations, the drill down and the drill up. The drill down and drill up are operations that help
us navigate through the hierarchy in order to gain a deeper or higher-level
understanding of the data. So let's understand first
how the drill down works. Let's say that we are working
with the measure Sales. We start on the top node
on the highest level. At the highest level,
we're going to have the total sales in
the whole datasets. For example, it's
going to be 140. So now we are at the highest
level, at the root node. And if you use drill down, you're going to jump to the next lower level
in the hierarchy. So that means at this
level we're going to see more details
about the sales. So for USA we have 90, and for Germany we have 50. And now if you want to see
more details about your data, we can apply again, drill down in order to jump to the next lower level in the structure. So
what's going to happen? We're going to go to the level
two, and here the sales are going to split. For the USA, between
Portland and Seattle,
we have 40 and 50, and for Germany, we're going to have 20 for
Stuttgart and 30 for Berlin. So that means we are seeing
more details about our sales. And now if you want to go to the lowest level, to the leaves, we're going to drill down
from the city to the postal code. So it's going to look like this. Portland is going to split
between those two postal codes. Seattle is going to stay the same because we have
only one child. The same for Stuttgart,
it's going to stay 20, and for Berlin, we have
two postal codes, so it's gonna split again. So as you can see we are using drill down to navigate through the hierarchy by taking us from higher level to lower
level of details. It's like we are
expanding the tree to see more details to
understand our data. All right, so now we're
going to talk about the
second OLAP
operation, the drill up. It's exactly the
opposite of drill down. Drill up is going to take us
from bottom to top, from the lower to the higher level of the hierarchy.
How does it work? Let's say we're going
to start at the leaves and we're going to have
the sales of those leaves. And now we can use a drill up to move from the postal
code to the city. For example, we're
going to have the total sales in Berlin, 30, because it's the
sum of ten plus 20. And then Stuttgart is going
to stay the same, 20,
Seattle 50, and
Portland as well is going to sum up the
values from the leaves. So we're going to
have the value of 40. As you can see, as we
are moving higher, the values are going to
get more aggregated. Let's say that we want
to jump to the country, so we can use again, a drill up to move from
the city to the countries. Germany, we can have
the total sales of 50. For USA, we can have
the total sales of 90. Now you can use, again,
drill up to go to the root node where you can have the highest level
of aggregations. So we can have the value of 140, the total sales
inside our dataset. As you can see, if we have
a hierarchy structure, we can use a drill up and drill down to navigate through
the hierarchy structure. Hierarchies organize and
structure the members of
the dimensions into a
logical tree structure by grouping similar
dimensions together. Hierarchies are really important and give dynamics to your views, where you can have
the big picture and understand the data
at the highest level. And you can drill down to specific details to gain
deeper knowledge of the data. All right, so now we
are back to Tableau. Let's understand how we can create hierarchies in Tableau. We can create hierarchies
only on the worksheet page. We cannot create it at
the data source page. In the worksheet
page, we can create a hierarchy in the Data pane. If you take a look at
the customers table, you can find that we
already have a hierarchy. And here we have a small icon that indicates we have a hierarchy, the hierarchy name
is Country, City, and on the left side over
here we have a small arrow. If we click on it, the
hierarchy expands and we can see the dimensions
inside this hierarchy. Speaking about dimensions, hierarchies can be used
only for dimensions. You cannot create a
hierarchy from measures. And this hierarchy that
we have over here, it is created automatically
from Tableau. Since Tableau analyzed the
content of the country and the city and automatically understood that there is
a hierarchy between them. But since we want to learn
how to create a hierarchy, we're going to go and remove it and create a new one
from the scratch. Now in order to
remove a hierarchy, you go to the hierarchy name over here, right click on it. And then here we have the
option remove hierarchy. Here you have to understand
that the dimensions inside the hierarchies
will not be deleted, only the hierarchy
itself will be deleted. So you will not lose any
fields; only the logical tree, the logical hierarchy,
will be removed. All right, so now
let's see how we can create hierarchy in Tableau. And we're going to create
the location hierarchy. We're going to go to the
left side, to the data pane, and we're going to select
one of the dimensions. It doesn't matter which one
you're going to select, but I prefer to start with the highest level
of the hierarchy. Here in our example,
it's going to be the country. Select
the country and right-click on it. And then here we have something called Hierarchy. And we're going to
select Create Hierarchy. Let's go there. We have to give it a
name, so we're going to call it Location Hierarchy. Then, as you can see, now on the left side we have the
icon of the hierarchy. Inside it, we have only one
dimension, the country. Now in our hierarchy, we have as well the city
and the postal code. So how we can add it
to this hierarchy? As we learned, the hierarchy
has different levels, and the order of those
levels is really important. We have country, city,
and postal code. Now, in order to add the city, we're just going to
drag and drop the city beneath the country over
here and release it. With that, we have now the
city inside our hierarchy. Let's grab as well
the postal code. So we have to drag and drop it beneath the city. Let's release. With that, we have created the location hierarchy
with the three dimensions, country, city, and postal code. Here Again, if you want to hide the details about
this hierarchy, we can collapse it over here. Or if you want to
see the details, we can expand the hierarchy. All right, so this is
one way on how to create a hierarchy in Tableau,
by using the drop-down menu. The second way on how
to create hierarchy, we can quickly drag and
drop dimensions together. So for example, if we go
to the product table, we have as well a hierarchy
here between the category, product name, and subcategory. Our hierarchy starts
with the category, then the subcategory,
and the last one, the leaves, going to
be the product name. Now let's see how we can
create the hierarchy using quickly drag and drop. We're going to take one
of those dimensions, let's say we're going to
start with the category, drag and drop it inside
the subcategory. So I'm now hovering and selecting the subcategory.
Let's release. Once we do that,
Tableau understands that we want to connect
those dimensions. So Tableau is going to
create a new hierarchy. We're going to call it
the Product Hierarchy. And let's hit OK.
And now let's see. On the left side we
have a new hierarchy called product hierarchy
with the icon. And we have inside it
two dimensions, category and subcategory. We are missing the
third dimension. Let's take the product name
and drop it in the hierarchy. Now we have a problem with that. The order of the dimensions inside our hierarchy is wrong, because the dimension
category should be level one and the subcategory
should be level two. How can we fix that? Just select the category and drag and drop it on top of
the subcategory. Let's release that.
That's it; this is how you change the order
of the dimensions. And with that, we have
the product hierarchy. All right, now let's say that we want not to remove
the whole hierarchy, we just want to
remove one member, one dimension from
the hierarchy. In order to do that, let's say we want to remove
the product name. Select it and just drag and drop it somewhere here
in the empty space. And with that, the
product name is no longer a member of the hierarchy. So this is how we can remove
dimensions from a hierarchy. But I want to put it back in our hierarchy because
we need it later. So I will put the subcategory
beneath the category, and we take the
product name and put it beneath the subcategory,
and that's it. So these are the two methods of creating hierarchies in Tableau, either by the drop-down menu
or by quickly dragging and dropping the dimensions together in order to create a hierarchy. It's really easy. All right, so now we have this
hierarchy, the structure. How are we going to use it inside our view? It's really easy. We're going to go and
select the whole hierarchy, then drag and drop
it to the View. So here the hierarchy
going to start from the level one
for the countries, and we're going to see the
values of the country. Now let's have one
of those measures. We're going to
take the sales and drag and drop it on the columns. So now if you look closely
at the country, at the blue pill over here, you can see that we have a new sign,
the plus sign. This sign indicates that we can drill down in
this dimension. So now let's go and
click on the plus sign. As you can see, now we
are drilling down in our hierarchy to a lower level. Now we are seeing more
details about the sales. And we are now at the next level, the level of the city. Now as you can see, we
have the dimension city on our rows. We didn't drag and drop it from
the data pane and put it on the rows; it
expanded from the hierarchy. Again, here the city
has the plus sign that indicates we can drill
down inside the city. Let's drill down again. As you can see now we
are at the postal code and we can see more
details about the sales. Now if you check
the postal code, there is no plus sign, like
the city and the country. Because we are at the leaves, we are at the lowest level
of details in our data. With that, we have
navigated through our hierarchy from the
top node to the leaves. As you can see, it's really
easy and very dynamic. Now let's say that we are at the leaves and we
want to drill up back to the highest level of the aggregations
to the top node. It's really easy. If you
check again the city and the country, we
don't have the plus sign anymore; we
have the minus sign. The minus sign indicates that we can drill up
in the hierarchy. So let's see what happens if you click on the minus sign. As you can see, we drill
up now from the leaves, from the postal code
back to the city. And the values of the sales
are now more aggregated. And now the same
thing, if you want to drill up from the city
back to the country, we're going to click on the
minus sign. So let's do that. And with that we have
moved to level one, to the highest aggregation
in our hierarchy. All right, so far
what we have done is drill up and drill down in our hierarchy using
the Rows shelf, and you know that the
rows and the columns are what we use as developers
to build our view. Now the question is
how our users and the audience can drill up and drill down
through the hierarchy. Because the hierarchy should
be as well used quickly from the users to drill
down to the details. Now let's see how
we can do that. If we go to the view over here
and hover on the country, we can see again a plus sign. Let's go and click on that. And as you can see,
we drill down in our hierarchy from the
country to the city. Now let's go more in details and drill down to the postal code. We can hover on the city,
and as you can see, we have again the plus
sign. Click on that. And with that, we drill
down to the postal code. This is exactly how the users
can drill down in the view. Now if we want to
drill up back to the higher level,
we can do the same. We can see the minus
sign over here. Click on it and you
go back to the city. And then we go to
the country as well. We have the minus,
we click on that. And with that, we drill
up back to the country. As you can see with those icons, we can navigate
through our hierarchy. Now you might say, or your
users might say, you know what, this is a really small icon
and my users don't like it. Is there any other way to drill up and drill
down in the view? Well, yes, if you go to any of those values over here
and right-click on it, you can see in this drop-down we have a Drill Down. If you click on that, we drill
down to the city. The same again: if you select any value,
doesn't matter which one, let's go over here and
then drill down again. And with that we are
at the postal code. If you want to drill up, you can do the same: go to any
value and right-click on it. And here we have the
Drill Up, so click it. And to drill up back
to the country, go to any of the values, right-click
on it and drill up. So those are the
two ways on how to drill down and drill
up in the view. All right guys, so
far we have created our own hierarchies by putting those dimensions together
in different levels. But in Tableau we have as well indirect,
embedded hierarchies in the data type
date. In Tableau, any field with the
data type date has the following hierarchy. It starts at the highest
level with the year, then we have the
quarter, the month, and then at the lowest
level, the leaves, we have the days.
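To make the four date levels concrete, here is a minimal Python sketch that splits a date into year, quarter, month and day. It only illustrates the idea of the levels; the function name `date_levels` is an assumption for this example and is not how Tableau implements date hierarchies.

```python
from datetime import date

def date_levels(d):
    """Split a date into the four levels: year, quarter, month, day."""
    return {
        "year": d.year,
        # Months 1-3 fall in quarter 1, 4-6 in quarter 2, and so on.
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "day": d.day,
    }

# Hypothetical order date, used only for illustration.
print(date_levels(date(2023, 8, 15)))
```

Each key corresponds to one level you reach when drilling down from the year to the day.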
Those four levels are the default levels inside each field with the data
type date in our dataset. Now we have another data
type that holds as well, an embedded indirect hierarchy. We have the fields with
the date and time. Here we have information
about the time, and we have seven levels. It starts exactly like the date, so the highest level is
going to be the year, then the quarter, month,
and then the day. But now we can drill down to more details since we have
the time information. The next level is
going to be the hours. Then we have minutes
and seconds. Seconds are the lowest
level of detail. They are our leaves. Here we have seven levels
of the hierarchy. Both date and date and time have a hierarchy
embedded inside them. Now let's uncover those
hierarchies in Tableau. All right, so now
we're going to go to the table orders. And
here we have two dates. Doesn't matter which
one, both of them are going to have exactly
the same hierarchy. Let's take the order date, drag and drop it
here on the rows. Now, as you can see, we
have now the plus sign. It indicates there
is a hierarchy. And it starts at the highest
level with the years. Now let's take a measure
to see some data. We're going to take
the order counts and put it in the columns. And I want to show
as well the labels. Let's show some labels. All right, now let's go and discover the hierarchy
inside the date. As you can see on the left side, we don't see any information
about the hierarchy, so that means it's really
embedded inside this data type. So let's go on the years and click on the plus
sign to drill down. As you can see, the
next information we have is the quarter
information. So now we see the total number
of orders by the quarter. So now we can see more details
about the total counts, and then we can drill
down to the day. And now we are at the
lowest level at the day. We cannot drill down
further, for example, hours, minutes and seconds, because the order date has
the data type date. As you can see, the dimension
order date has four levels, years, quarter, month and day. It's really nice to
have it like this in Tableau because it's
really standard. I worked with other BI tools, and there we have to
build it on our own, and it is really time consuming to build all those hierarchies, especially if you
have a big dataset. Here in Tableau,
our life is easier. Tableau did decide to have a
hierarchy inside each date. All right guys, one more
thing about hierarchies. They really organize
and structure your views and make them more
dynamic for the users. For example, if you have requirements
to show sales by country, sales by city, and sales
by postal code, and you don't use hierarchies, you will end up making three views like here
on the left side. It takes a lot of space. And as well, it's
not really dynamic. But better than
that, we can create hierarchy between
those dimensions. And we can put
everything in one view. And then you give
the options for the end users to drill
down and drill up, depending on what they need. If they want the
sales by country, we have it already
at the top node. But if they want
the sales by city, all what they have to do is to drill down to the next level, and we have it already,
sales by city. If someone's need to go more in detail to go to the postal code, they can drill down as well
to the sales by postal code. As you can see, it really makes your view more dynamic,
and it's going to be more attractive for the end users if you compare it to
the left side. Now we have a view that is more dynamic and more interactive
for the end users. And as well, you are creating fewer views in your dashboards. So this is really great. If you want to drill up
back to the country, we can just click
the minus sign. Hierarchies give more
dynamics; they structure and organize your
data in the views. All right, now let's summarize. Hierarchies organize and
structure the members of the dimensions into
a logical tree structure. Hierarchies are a special
feature only for dimensions. You cannot create
hierarchies between measures. We can drill down and drill up to navigate through our
hierarchy to gain a deeper or higher-level
understanding of your data. Overall, hierarchies are really important to organize and
structure your data in views. And they provide for the
users a powerful tool to quickly and easily navigate
and explore your data, uncover insights, and
make better decisions. All right, so that's all
for hierarchies in Tableau. Next we will learn how
to group the members of dimensions into
higher-level categories using groups.
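As a recap of drill up, the roll-up from city to country to the total can be sketched in a few lines of Python. This is only an illustration of the aggregation behind the hierarchy, using the city-level sales from the example; the function name `drill_up` and the tuple layout are assumptions made for this sketch, not anything Tableau exposes.

```python
from collections import defaultdict

# City-level sales from the example: (country, city, sales).
ROWS = [
    ("Germany", "Berlin", 30),
    ("Germany", "Stuttgart", 20),
    ("USA", "Seattle", 50),
    ("USA", "Portland", 40),
]

def drill_up(rows, level):
    """Sum sales at one hierarchy level: 'city', 'country' or 'total'."""
    totals = defaultdict(int)
    for country, city, sales in rows:
        if level == "city":
            key = (country, city)
        elif level == "country":
            key = (country,)
        elif level == "total":
            key = ()  # root node: one bucket for the whole dataset
        else:
            raise ValueError(f"unknown level: {level}")
        totals[key] += sales
    return dict(totals)

print(drill_up(ROWS, "country"))  # {('Germany',): 50, ('USA',): 90}
print(drill_up(ROWS, "total"))    # {(): 140}
```

The higher the level, the fewer keys remain and the more aggregated each value becomes, which is exactly what the plus and minus signs switch between in the view.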
102. Tableau | Groups: All right, guys, so far we
have learned how to group up the dimensions
together in hierarchies, but now we will learn how
to group up the values, the members of the dimension
into groups in Tableau. We have three methods
in order to do that. So we have the groups, cluster groups, and sets. And now we will start
with the first one, how to group up the members of the dimensions using groups. But now, as usual, let's
understand first the concept behind it and then
we're going to learn how to build it in
Tableau. So let's go. All right, so now if you
take a look at our data, sometimes you're going to find dimensions that could be used to categorize or to group up
the data inside the table. For example, if you take a
look at our products data, you can find that the category can be used to
group up the data. For example, you can
see two products are assigned to the category Monitor and three products are assigned to the accessories. So this field could be
used to group up the data. Now if you check the
customer's data, you can find some dimensions that could be used to
group up the data. For example, the country, the city, the postal code. Those information can be used
to group up the customers. All those dimensions could be
used to group up our data. Those groups or those
dimensions come directly from the datasets, and we didn't
create anything so far. Sometimes we might be in a situation where
we want to group up the data differently than the original groups
in the datasets. Here we have two options. Either we go back to
the original datasets and do the changes there and create a group, or
we can create a group directly in Tableau without going back to the
original datasets. For example, we want to
create a new group in the products and it's going
to be the product class. Here we have another group
and we're going to call, let's say for example, the first three class A and the last two class B. We can create this
extra group directly in Tableau. The same thing
goes for the customers. We want to add a new group. We want to add the
continent information. We can add this group. For Germany, it's
going to be Europe. For USA it's going to
be North America. And the same goes for the rest of the rows: France and Germany are
going to be Europe as well, and USA is North America. All that you are doing now is adding new groups to our data. Groups in Tableau combine
similar related values into higher level
categories which can create a new dimension
for your data analysis. Now let's see how we can
create groups in Tableau. And there are two methods
in order to do that: either by creating the groups in the data pane or
directly in the view. We're going to start
with the first one, where we're going to create the continent group in the data pane. In order to do that, we're going to go to
the table customers and based on the values
from the country, we're going to create
the new group here. It's important to
understand that we can create groups only on
top of dimensions. We cannot create groups
on the measures. There is another feature that
we can use to group up the measures,
and we call it bins. But for now, the groups we can create only on
top of the dimensions. And the new field going to
be as well a dimension. Let's see how we can do that. Select the Country,
right, click on it. And then let's go to the Create. And here we have the Option
group. Let's select that. So now we're going
to get a new window in order to create the group. We're going to start first
by renaming the field name, we're going to call
this continent. Then in the middle of over here, Tableau going to list for you the distinct values
inside the country, all possible values
from the dataset. What we're going to
do, we're going to group up France, Germany, and Italy to Europe, and USA to North America.
How are we going to do that? We're going to
multi-select those values by holding Control: France, Germany and Italy.
They are one group. In order to group them together, we're going to select
over here the Group button. Once we select it, Tableau is going to put all those values
underneath a new group. We're going to give it
the name of Europe. Let's click Okay. And with that, we have created now a new
group for those three values. As you can see,
we can expand and collapse those values
to see the details. But still we have one
more value inside the country that is not
mapped yet to a group here. What we're going to do, we're going to select
it and then click on the Group and we're going
to call it North America. With that, now inside the continent, we have two values, Europe and North America, and
they are related to those members from
the country dimension. Now let's say that you
want to move one of those members from one
group to another group. How can we do that? It's really easy, by just drag and drop. Let's take, for example, Germany, drag and drop it here
in North America. You will see this member
now belongs to the group of North
America, which is wrong. So I'm going to put it back. That's it; this is how you
switch between groups. Here we have in Tableau
another option, which is to remove the member
from all groups. In order to do
that, let's select Germany and click
over here on Ungroup. Once we do that, you will see that the Germany
value is not assigned to any of those groups. If I collapse those groups, you will see that Germany
is a standalone value. We usually use the group
Other for all values that we couldn't assign to
any of our groups. Here Tableau gives us a quick way to create this group. All we have to do
is to click the value of Germany and then click
over here, Include Other. Let's do that. As
you can see, now the value of Germany is
inside the group Other, and with that we have in
the continent three groups. Europe, North
America, and other. Now if you want to
rename the groups, you can click on the group and then click over here, Rename. So we're going to have
it like Other Continent or something. Or right-click on the group and then Rename. That's really easy. So now what we want to do is to move Germany back to Europe. Now as you can see, the
group Other did disappear because it doesn't have any
members. So that's it for now. We have created our groups. Let's click OK. Now as you
can see on the left side, we have a new field
called continent. It is a discrete dimension, and it has a special icon for the data type that indicates that this field is a
group in Tableau. If you are creating
a group based on another field with
a geographic role, Tableau is going to show both of the icons, group and
geographic role. Usually the group has the following icon,
but for this situation it's going to show
both of the icons, geographic role and the group. All right, so now let's build the view based on
this new dimension. We're going to
take the continent, drag and drop it on the rows. As you can see, it
has two values. We're going to take
the sales as well and put it on the columns. Now to see
more details in the view, we're going to take
another dimension, or we're going to take the whole hierarchy
of the location. Let's drag and drop
it here on the rows. Now as you can
see, the continent is now grouping our data. Europe for those three values, North America for USA. As we learned in
the hierarchies, we can drill down to the next
values. And you know what? This new dimension,
the continent, has similar informations
to the country and city, and it belongs to the hierarchy. Now it makes sense to add it to the structure of our
location hierarchy. So what we're going to
do, we're going to drag the continent and drop it
on top of the country. With that, the
continent is going to be level one and country
is going to be level two. We can use this new group as the highest level of
aggregation in our structure. We can drill up back
to the continent. As you can see, we can create new groups directly in Tableau without going back to the original datasets and
doing modifications there. All right, so that was
the first method on how to create groups in Tableau,
from the data pane. The second method is to create groups
directly in the view. Let's see how we can do
that. We're going to create a new worksheet and we're
going to take two measures. We're going to take the profits, let's put it here on the rows. And we're going to take
as well the sales. And now we want to show all
the customers as data points. In order to do that, we're going to go to the customer ID, drag and drop it, put it here on the marks, on the Detail. Now we have each customer in our dataset as a data point. Now our task is to group up the customers by
performance. If you decide to go to the
data pane in order to create those groups
and right-click on it, then go to the groups, you will see a long
list of all customers. And now creating groups based on those values can be really painful because the
customer ID has high cardinality
compared to the country. Instead of doing that, here we will do it
directly in the view. In order to do that,
we will go and select, for example, those customers,
those data points. And we will get a new window. As you can see, Tableau tells us
there are eight items selected, and we have
the icon of the group. If we click on that, Tableau is going to create new stuff. If you look at the data pane
over here on the left side, you can see that
Tableau did already create a group with
the selected items. And it did as well the coloring. So you can see the
group as well. Here on the colors on the right side, we
have the legends. So you can see the selected item is the blue and the
others are gray. Now what we have to do is
to go and rename stuff. First of all, I'm going
to rename this group. I'm going to call
it Customer Group. As you can see, the group name is like the list of all members. It says, okay, 9113035 and more. That's because it's hard for Tableau to understand why we selected those customers and
what the group name should be. In order to rename the group, we're going to go to the
left side to the data pane, right-click on it and then
we go to Edit. Select that. Now as you can see over here, we have our group that we just selected with
the eight members. So let's go to the group
name, right-click on it, Rename, and we're going to
call it High Performers, since those customers have the highest performance compared
to all other customers. So as you can see,
Tableau did put all the other customers
under the group other. Let's click okay now. And now we have a better
name on the right side. And it makes sense to have
a gray color for other. All right, so now we're
going to go and create another group of customers
with a low performance. All right, in order to do that we're going
to do the same, we're going to go
in the view and select those customers
with a bad performance. And once we do that,
we're going to get this new window saying, okay, nine items, and we're
going to select the group. But instead of that, if
you move your mouse away, you will see the
window disappears. In this case, we're
going to go to one of those data points and
right click on it. And then here we have
the option of group, select that. Now
what happens? Tableau will not create a
new field in the data pane; it's going to include it as a new group inside the
already existing group field. You can see here on
the right side we have a new group with the
color orange. And with that, we have added
a new group to the Customer Group. In order to rename it,
we're going to go to the data pane and edit the group. Let's go there. Now instead of having the
list of the members, we're going to click on it, Rename, and we're going to
call it Low Performers. Click OK. And now with that we have nice naming for the groups. We can as well change
the colors of the group. For example, for the low
performance, we can have red. For the high performance,
we can have green. In order to do that, we're
going to go to the Marks over here to the
colors. Click on that. Then we're going to
select Edit Colors. As we said, for the
high performers we want green. So let's select this value
and assign it to green. And we want for the low
performers to have red, and the color of
Other is going to be gray, since it's not our
focus. Let's click OK. And as you can see, now the
data points have new colors. Another use
case for the groups is that we can use them as
well as a filter. So we give the users the possibility to
interact with our views, to focus on a specific group. Now in order to do that, we're going to go to our data pane, to the group, right-click
on it and Show Filter. Now we have the
group as a filter. And the users can click
between the groups to change their focus on which
cluster they want to analyze. For example, if they
are not interested in all that gray stuff
and they want to compare the high performers
with the low performers to understand the difference
in behavior between them, they can just remove
it like this. All right, so this is
how you can create groups in Tableau
using the two methods, either from the data pane, especially if you
have a dimension with low cardinality
like the country. But if you have a dimension
with high cardinality, like the customer ID or order ID, then you can create groups
directly from the view, which is a really fast way to assign the values to specific groups. As you can see, this
feature in Tableau, the groups, is a really
awesome way to group up data directly in Tableau without going back to the original datasets and
creating the group there. All right, so now you have
the following task. Go to the small datasets
and create a new group called classes based on the
dimension product name. The first three
products belong to class A and the last two products belong
to class B. You can pause the video
right now to do the task, then resume it
once you are done. All right, so now let's
quickly create this group. We're going to check first the cardinality of the product name. I'm just going to drag and
drop it here in the rows. And as you can see, we
have only five values. That means it has
low cardinality. And we can do it directly
in the data pane. Right-click on the product name. And then we're going to
go to Create, Group. And now we're going to go
and call it classes. The first three members
are class A and the last two members are
class B. That's it; let's click OK. Now we can go
and check the values. Let's drag and drop it over
here before the product name. And as you can see,
the three products are class A and the two
products here are class B. This is really easy. All right, so now let's summarize.
Groups in Tableau combine related, similar values into higher-level categories. And groups can be created
based only on dimensions. We cannot create groups
for measures, and the group itself is going to
be a discrete dimension. Groups in Tableau
are very useful to simplify your view
and make it easier to understand your data by grouping the data points into clear
and relevant categories. All right guys, so that's all
for the groups in Tableau. Next we will learn a
very similar feature called the cluster groups. We can use it in order to cluster your data into
different groups.
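Conceptually, a group is just a lookup from each member of a dimension to a higher-level category, with everything unassigned falling into Other. The sketch below shows that idea in Python using the continent example from this lesson; the names `CONTINENT_GROUPS` and `assign_group` are assumptions for illustration, and the country "Brazil" is a hypothetical value that is not in the course dataset.

```python
# Group definition: each group name maps to the members assigned to it.
CONTINENT_GROUPS = {
    "Europe": {"France", "Germany", "Italy"},
    "North America": {"USA"},
}

def assign_group(value, groups, other="Other"):
    """Return the group a member belongs to, or `other` if it is unassigned."""
    for name, members in groups.items():
        if value in members:
            return name
    return other

# The four countries from the lesson, plus one hypothetical unassigned value.
for country in ["France", "Germany", "Italy", "USA", "Brazil"]:
    print(country, "->", assign_group(country, CONTINENT_GROUPS))
```

The result behaves like a new discrete dimension: every row gets exactly one group value, which is why it can be placed on rows, colors or filters like any other dimension.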
103. Tableau | Cluster Groups: All right everyone, So
now we're going to learn another method on how to
group up the members, the values of
dimensions into groups. And this time we're going to use the cluster groups in Tableau. But as usual first let's understand the
concept behind it, then we can learn how to build it in Tableau.
So let's go. All right, so cluster group is another way of
grouping your data, used for data clustering, which is a statistical technique to group up similar
data points together. In data clustering, we have different algorithms to
calculate the clusters. For example, we have
the algorithm k-means and another algorithm called hierarchical
clustering and another one called
density-based clustering. And Tableau did decide
to go with the k-means algorithm since it's really
simple and easy to implement. The k-means algorithm is widely
used in data clustering. Now let me show you how the
k-means algorithm works. Let's say that in our dataset, we have the following
data points. First, we have to define how many clusters
we want to build. In this example, we're
going to go with three clusters, and after that, the algorithm is going to
pick three points, and we call them centroids. Then it assigns
each data point to the nearest centroid.
For this data point, it's going to belong
to the green cluster. And then it's going to go
to the next data point and calculate the distance between
it and the three centroids. And then it assigns it
to the nearest centroid. For this, it's going
to be the red cluster. The algorithm is
going to do that for all data points and assign
them to the nearest centroid. After that, the centroids are recalculated from their assigned points and the assignment is repeated until it stops changing. At the end, we're going
to have three clusters, the green, red, and blue. As you can see, k-means is really simple and
easy to implement. All right, so now in order
to understand the clusters, let's have the following task. The task says to identify high value customers by clustering them based
on their sales. And in order to find out which customers generate the most
revenue and which do not. All right, now in order to
create the cluster group, we have to be at
the worksheet page. And this time we can create the clusters from
the analytics pane, and we cannot do it
at the data pane. Now let's see how we can create the clusters and we will stay
with the big data source, since we need a lot
of data points. Here we need two measures.
We need the profit. So let's drag and
drop it on the rows. And we're going to take the
sales as well to the columns. And with that, we have two axes, the sales and profit. But what we are missing now in the middle is the
customer's data. Each customer is going
to be one point. For that, we're going to
take the customer ID, we're going to drag
and drop it over here on the details
on the marks. All right, so now we
have the data points and each point represents
one customer. Now in order to
create the cluster, we're going to switch
to the analytics pane. So let's go over there, and if you go to the models, you will find the cluster.
It's really easy. We just drag and drop it
here on the Cluster drop target, and here we will have a very
simple window. Here it says the variables for the clusters
are the sales and profit. And then we have the
number of clusters. Here, as a default, it's
going to be automatic. That means Tableau is going to
figure out how many clusters it makes sense to create from
those data points. As you can see, Tableau has
already created the clusters, and it created three clusters. But if you say, you
know what, we want four clusters or five clusters, you can go over here and define how many
clusters do you need. If you have five, let me just move it over here to
see what is going on. We have now five clusters. If you want to
have two clusters, we will have only two
colors and so on. So I'm going to stay
with the three clusters. It makes sense. That's it. In this window, there is
no OK button or anything like that. So we're just going to
close it, because Tableau creates the
clusters immediately. All right, so now we
have the cluster. The question is, where do
I find the cluster group? Well, if you go to the
Data pane on the left side, you will not find any
cluster group over there, because we have this information
now only on the colors. This field here is our cluster. Now, we might want to have
this information, this cluster group,
in the Data pane, in order to use it
in different views. So what we're going
to do, we can just drag and drop it
somewhere in the Data pane. Now over here we can see
we have a new field, and the icon indicates that this
field is a cluster group. So now we're going to give it
a name, Customer clusters. All right, so now we can reuse this cluster in different
views if we need. All right, so now the next point is how we can edit our cluster. So now we have three clusters. What if we want to
change it to four? How can we do it? We will
go to the marks over here, right-click on it, and
here we have the option of edit. So let's select that. We will get, again,
the same window. So in order to change
the number of clusters, we will not do it
in the Data pane, we're going to do
it on the marks. This is how you
edit the clusters. Now if you go over
here again and right-click
on the Clusters, you can find we
have another option called describe clusters. So here we're going to
find more information about our clusters. Let's select that. So
as you can see here, we have a lot of information
about our clusters. So first we have the input for the algorithm or for the
clustering algorithm. The variables are
the measures that we use in our view, the
sum of profit and the sum of sales.
The next info is the level of detail. Usually
here we have the dimensions we are using. Now
the lowest level of detail is the customer ID, since each data point
represents a customer. Then we have more information
about our clusters. So the number of clusters
we defined is three. The number of data points is the number of customers,
we have 800 customers, and then we have the
table over here. For each cluster we have
information like the number of items, the number of data
points inside each cluster. In cluster one, we
have around 617 customers. In cluster two we have 171, and cluster three is the smallest, with 12 customers. Then we have the centroids of each cluster, the central points of the clusters. If you need more statistics
about our clusters, we can find them inside
Describe Clusters. It's really fun to work with
the clusters, and I found different people use
different designs on how to present the clusters. For example, one design that
I see almost everywhere is to go
to the shapes over here and then choose
the filled circle. Now if you have a
lot of data points, what is interesting is to see the overlap
between those points, but now it's really hard
to see it in this view. So what I'm going
to do is
focus on
those data points. Let's select those points. And then we're
going to say, okay, Keep Only. Let's click on that. We have now zoomed
in on those points. Now in order to show
the overlap in a better way, in a better visual, what we're going to do,
we're going to go to the colors and then we're
going to reduce the opacity. Let's reduce it to something like 70% I think
it should be fine. And now our visualization
will just look really professional and you can see the overlapping
between data points. All right, so there
is another design in that to assign a shape
for each cluster. So before we do that, I want to have again,
the big picture. I will remove the filter, so let's just remove the filter from here to somewhere else. And with that we are
back to original view. So what we're going to do with
that, we're going to take the cluster and put
it on the shapes. So let's drag and drop the cluster on the marks
over here on the shapes. So as you can see, for each
cluster we have a shape, we have the plus,
square, and circle. And if you want to
assign different shapes, what you're going to do
is click on the Shapes. And now we can go over here and change the shape of a cluster. Let's say instead of the plus for cluster three,
we're going to have X. And let's click OK. And now instead of
the plus, we have X. This is how I usually design
the clusters in Tableau. Alright, so now after
we create the clusters, it's really important
to interpret the outcomes of
the clusters with the business. For example,
on the one hand we have the red cluster, which covers the customers with
the high profits. And on the other hand, we have the blue cluster, which covers the customers with
the low profits. Clustering your customers
based on the sales and profit can help you to gain insights about
your customers, which can help the
business to target its marketing strategy
very effectively. All right, now we have the
following task for you. The task is to identify
the top selling product by clustering the products based on the quantity and the profits, create five clusters using
the big data source. You can pause the video
right now to do the task, then resume it
once you are done. All right, so now let's create the cluster for the products. Here we need two measures. We have the profit
and the quantity. Let's have first the profits. We can drag and drop
it here on the rows. And then we're going to take the quantities on the columns. And now we need the
dimension to define the level of details,
the data points. And here we can use either the product ID or the product name. So I will go now for
the product name. So drag and drop
it on the details. All right, so now
we have everything. We have the measures
and the dimension, and we're going to go
and create the cluster. We go to the Analytics pane. And then we take the cluster, drag and drop it over here. And Tableau did create
here only two clusters, but the task says five clusters, so we're going to go over
here and define five. All right, so that's
it. Now we have five clusters for the products. Let's close this. Clustering
the products based on the
quantity and the profit can help you to gain insights
about the product portfolio. And the business can
use it for many things. For example, to optimize the inventory
management and make strategic decisions about
product development and marketing. This
is really amazing. All right, let's summarize. The cluster group in Tableau
is a statistical technique to group up similar data
points together in clusters. The cluster algorithm
used in Tableau is k-means, which is easy to implement and as well
easy to understand. Clustering in Tableau is
one of the main features and very powerful, since
Tableau can plot a very large number
of data points, while other BI tools like
Power BI put limits on the number of data points that you can
see in the visualization, which can make it
really hard to work with clusters in Power BI. Data clustering in
visualization is a very powerful tool
for data analysis and pattern recognition, to help business organizations
to be data driven, which means to make better
decisions using the data. All right, so that was it
for the cluster groups. And next we will learn how
to split the values of dimension into two subsets
using the Tableau sets.
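To make the k-means idea from this lesson a bit more concrete, here is a minimal Python sketch of the two steps we talked about: assign each data point to the nearest centroid, then move each centroid to the middle of its points. This is only an illustration and not how Tableau does it internally. The customer numbers, the function name kmeans, and the hand-picked starting centroids are all assumptions made up for this example.

```python
import math
from collections import Counter

def kmeans(points, centroids, iterations=10):
    """Toy k-means: points and centroids are (sales, profit) tuples."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical (sales, profit) values, one tuple per customer.
customers = [
    (100, 10), (120, 15), (110, 12),     # low sales, low profit
    (500, 80), (520, 90), (510, 85),     # medium sales, medium profit
    (900, 300), (950, 320), (920, 310),  # high sales, high profit
]

# Assumption: start from three hand-picked customers so the run is repeatable.
start = [customers[0], customers[3], customers[6]]
labels, centroids = kmeans(customers, start)

print(labels)           # cluster index per customer
print(Counter(labels))  # number of customers per cluster
print(centroids)        # central point of each cluster
```

The last two prints are roughly the kind of information we saw in Describe Clusters: the number of items in each cluster and the centroids. Real k-means implementations usually pick the starting centroids more carefully, because a bad start can give poor clusters.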
104. Tableau | Sets: So far we have learned how to group up the members, the values of
dimensions, into groups. This time we're going to
use the sets in Tableau, which are very similar to
groups and clusters. As usual, we're going to start
first with the concepts, then we're going to learn how to build it in Tableau.
So let's go. All right, so now
let's say that we have the following data points
in our visualization. We can use sets to
group up those data points. Sets divide
your data, based on specific criteria or a selection,
into two groups of data. The first group, we call
it the In group. In this group, you're going to find all
the data points that are included in the
subset of data. These data points are
the members of the set. And the other group
is the out group. This group contains all
the data points that are not included in the
subsets of the data. That means the data points in this group are not the
members of the set. The sets in Tableau divide
our data into two groups, the in and out groups. When do we need sets
and why it's important? Well, we can use the
subset of data to do focus analysis on
specific scenario. And as well to
compare the subset with the remaining data. For example, we can
make a subset of the top ten customers in our
datasets based on the sales. And compare the subsets with
their remaining customers in order to understand
their behavior and what makes them on top ten. So it's really amazing feature
in Tableau to understand your data and to make focus analyses on
specific scenario. And in Tableau we have different
ways to create the sets. The first is to create a fixed set, and that's by using
a manual selection. And the other way is to create a dynamic set based
on specific criteria. Here we have two ways to
create the dynamic set, either using a condition or
using a ranking, top or bottom. Now, the last method
of creating sets in Tableau is by
combining two sets into a new combined set. Since we are combining data together, it's like joins. Here we have four options, similar to inner, left, right,
and full join. Here the output is a
new combined set. Those are the different methods to create
sets in Tableau. Let's have quickly
some simple examples in order to understand
those methods. All right, so now back
to our five customers, and now we're going to
create different sets using different methods. We're going to start
with the first set. It's going to be fixed sets
using a manual selection. Here we're going
to go and manually select which customers are inside the subsets and
which customers are outside. Here we're assigning
two values, in and out. For example, we're
going to say John is inside the set,
and Peter as well.
The rest are going
to be out: Martin, George, and Maria are going
to be outside of the set. As you can see, we just manually selected which customers
are in the sets. So let's move to the second set where we're going to create a dynamic set using condition where the sales
is bigger than 400. So here we will not
select anything manually. We will just define
the rule for Tableau. And Tableau is going to do
it automatically for us. Tableau goes through
all the customers and starts assigning
the values in and out. The first customer is Maria. She does not fulfill the condition,
so she's going to
be out of the set. Next we have the
second customer, John. He has a high score of 900,
which fulfills the condition,
so he is a member of the set. The same goes for George with 750, and Martin as well. But Peter
doesn't have any score,
so he does not fulfill the
condition. Peter is out. So using this condition, we have three customers
in and two are out. Now what makes dynamic
sets very important and efficient? Let's
say in the next days
the scores of the
customers change. What is going to happen after
you refresh the data in Tableau?
Tableau is going to recalculate
the condition and assign new values if something
changed. So the set is dynamic, and everything is going
to be done automatically. Now let's move to the third one.
sets and now we're going to use the
top two customers, which means the top two
scores is going to be inside the subsets and
there is going to be out. If you have a look at the data, you can see Joan and George has the highest scores
between the customers. Those two customers
going to be in. The rest going to be out. Again, everything here
dynamic and automatic, We just specify the rule and Tableau going to do
the rest, all right? Okay. So those are the three
methods to create a set. Next, we're going to
go more advanced, where we're going
to create a set from combining two sets. Here we're going to take
the following example, where we're going to
create a new combined set by combining set
one and set three. Here it's really
important to understand that the calculation of this new combined
sets can be based on the output from the
set one and set three. Tableau will not check
the table customers, it's going to check only
the output from the sets. And here we have to configure the combined sets and
we have four options. It's something similar
to the joints, but not exactly like the joints. So let's go through those
options one by one. The first option says all
members in both sets. That means the
customer is going to be a member of the combined set if the customer is at least a member of one
of those two sets. So let's check our customers. Maria is not a member of
set one or set three,
so she's not going to be a member of
the combined set either. The next customer, John, is a member of both sets. So that is more than enough. So he's going to be a
member of the combined set as well. And George is a member
of one of the sets, so he's going to be in as well.
Martin here again
is like Maria. He's not a member of
set one or set three, so he's going to be out as well. Then the last customer, Peter,
is a member of one
of those two sets. That's going to be enough to be a member of the combined set. As you can see with this option, it's enough for the customer to be a member of one of the two sets to
be in the combined group. All right, now let's
move to the next option. It says shared
members in both sets.
That means to be a member
of the combined set,
the customer should be
a member of both sets. It's not like the first option, where
it's enough for the customer
to be in one of the sets. Here the customer has to
be in both sets. Let's check our customers. Again, Maria is not a
member of both sets, so Maria going to be out. But next we have
the customer, John. He is a member of both sets. So that means he fulfilled
the requirements, be a member of the
combined set as well. So now, as you can see, for
the other three customers, none of them fulfill
this requirement, so that means none
of those customers going to be inside our set. Well, this option is
very restrictive. All right, so now let's
move to the next one. It's going to say set one
except shared members. So what this means, we can have all the members
from the set one, but they should not be a
member in the set three. So let's check the customers. Maria is not a member
of either of them, so she's going to be out. And now we come to John. John is a member of set one, but he is also a
member of set three.
Well, this time John
will not be a member of this group, because we are
saying except shared members.
So that means John this time
is going to be out. The next one,
George, is not a member
of set one,
so he's automatically
going to be out. The same goes for Martin. He's not a member
of set one.
But now if you check Peter, he is the only one that
fulfills the requirement. Peter is a member of set one and not
a member of set three.
And this is exactly the
requirement of this option.
So only Peter is going to be
a member of this group. All right, so now let's
move to the last one. It's exactly the opposite. So it says set three
except shared members. So the requirement for the
customers to be a member of this combined group is to be
a member of the set three, but not a member of the set one. All right, so now let's
check our customers. I really feel bad for Maria. She is not a member
of any of those sets. Like if your name is Maria,
I'm really sorry for that. It's not intended but now
it's really too late. I already recorded,
so sorry for that. Next time, I promise you I'm going to make better examples. But for now, Maria is out
of this group as well. The same goes for John. John is a member of set three, but John is also
a member of set one.
So he does not fulfill the requirement,
and John is going to be out.
Now if you look
at the customers, George is the only one in set three and not
in set one, so only George is going to be in this group, and the
other two are out. All right, so with that, we have
covered all the scenarios, all the methods that we
have in the Tableau sets. All right guys, so
now let's see how we can create sets in Tableau. We can create them on
the worksheet page,
we cannot do it on
the data source page. And we can do it either in
the Data pane or in the view. So now we're going to
create different sets using different methods. But first let's create the view. So we need the customer ID. By the way, instead
of drag and drop, you can double
click on the field, and it's going to be in the rows we need as well, the first name. Click on the first name, and we would like to have
the scores as well. So drag and drop the
scores on the Abc. So now we're going to create the fixed set using
manual selection. In order to do that,
we're going to go to the customer ID over
here in the Data pane.
Right-click on it,
and then we go to Create. Over here we have Set. As you can see, the set
has the icon of joins, but it is not a join. It just has the same symbol. Let's click on that. And
now we have a new window. Let's see, what do
we have over here? We have first the
name of the set. Let's call it Set one, fixed. Now we have over
here three tabs: General, Condition, and Top. As you can see, those are the different methods of
creating sets in Tableau. The General tab is actually
the manual selection, the Condition tab is, as you
know, a dynamic set,
and the Top tab
is a dynamic set as well. Now we're going to go
with the first one. We're going to start with the
General tab, the manual selection. In the middle, we have a list of all customers in our dataset. And we have to go and
start selecting manually which customers are in and
which customers are out. In our example, we selected
customer two and customer five to be
the members of the set. And anything that you are not selecting is going to
be in the out group. So that means
customers 1, 3, and 4 are out. Let's go now and click OK. Now let's see what happened in the Data pane. We
have a new field.
It's going to be a discrete
dimension, and since it's a set,
it has the following icon. As I said, it's like
the icon of joins. Now let's see the values
inside this field. Let's drag and
drop it over here. And now as you can see, we have only two values, In and Out. It's like the Boolean data type,
where we have true and
false. Here as well, in the sets, we have
only two values. We selected
customer two to be in the set, and
customer five as well.
The rest are going to be out. This is how we can
create sets in Tableau using manual selection, and
the set is going to be fixed. All right, so now we're
going to go and create a dynamic set using condition. Our example was the customers
with score higher than 400. Let's go again to the left side. Right click on the
customer ID, go to Create, and then to Set,
let's call it now, set two and we're going
to call it condition. Since we are making
now a condition, we're going to go to the
tab condition over here. So now we're going
to go and specify for Tableau the rule to decide which members are in
and which members are out. The rule says score
higher than 400. Let's define that. First we have to select By field. Our field is Score,
which is correct. And then the operator
over here should not be equals, it should be greater than. Then we have to specify
the value over here, 400. That means if the
score is higher than 400, the customer is going to be in. Otherwise, the customer is out. Now let's go and click OK. And as you can see, we
have another dimension in the Data pane called
Set two. Double-click on it, and let's check the values. The scores over here:
350 is out, 900 is in, 750 is in, 500 is in, and null is out. As you can see, it's really easy to define the dynamic set. We just have to provide a rule, and Tableau is going to do the rest. If tomorrow we have
different data, the set members are going to change. Now we're going
to create another dynamic set using the rank. In our example, we had the top two customers
going to be in and the rest is
going to be out. Again, we're going to
go to the data pane. Click on the customer ID, create the sets,
let's give it a name. So it's going to be
Sit three and Rank. So now we're going to go to the third tab over here to the top. Let's go there for this example. We're going to use the
score to rank the customer, so the highest two
scores can be in. In order to do that,
it's really easy. We can define it here by field. Here in ranking we have top
or bottom as you can see. So we're going to
stay with the top. Next, we have to define
what we are selecting. Top two customers, top
ten to five to 20. So here we have to go with
the two and by score, so we are using the score,
everything is correct. And that's it, this is
how we define the rule. And Tableau going
to do the rest. It's really logic if
you just read it. Top two by score. All right, that's all.
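If it helps to see the three methods side by side, here is a small Python sketch of the fixed set, the condition set, and the top 2 set from our example. This is only an analogy for what Tableau calculates, not Tableau code. The scores follow the values we saw in the view (I am assuming Martin is the one with 500, based on the order of the customers), and the helper in_out is a made-up name for this illustration.

```python
# Customers from the lesson; None stands for Peter's missing score.
scores = {
    "Maria": 350,
    "John": 900,
    "George": 750,
    "Martin": 500,   # assumption: read from the order of values in the view
    "Peter": None,
}

# Set one: fixed set, the members are picked by hand.
set_one = {"John", "Peter"}

# Set two: condition, score higher than 400 (a missing score never qualifies).
set_two = {
    name for name, score in scores.items()
    if score is not None and score > 400
}

# Set three: top 2 by score, ignoring customers without a score.
ranked = sorted(
    (name for name, score in scores.items() if score is not None),
    key=lambda name: scores[name],
    reverse=True,
)
set_three = set(ranked[:2])

def in_out(members):
    """Label every customer In or Out, like the set field in the view."""
    return {name: ("In" if name in members else "Out") for name in scores}

print(in_out(set_one))
print(in_out(set_two))
print(in_out(set_three))
```

Notice that set one never changes unless we edit it by hand, while set two and set three are recalculated from the scores every time, which is exactly the difference between a fixed and a dynamic set.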
Let's go and select OK. Again, as you can
see, we have the set over here in the Data
pane. Double-click on it. Now let's check the data. As you can see, John and George have the highest scores, that's
why they are in, and the rest are out. As you can see, sets are
really easy in Tableau. All right, so now
we're going to go and make it a little
bit complicated, where we're going to
create combined sets. We're going to go and combine
set one with set three. In order to do that,
we're going to go again to the Data pane, but this time we're going
to start from the set. Let's go to set number
one and right-click on it. And then we have here
an option called Create Combined Sets.
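Before we go through the window, the four combining options from the concept part can be sketched with plain Python set operations. This is just an analogy to make the options easier to remember, not what Tableau runs internally. The members of set one and set three are the ones from our example, and the variable names are made up for this sketch.

```python
set_one = {"John", "Peter"}     # the fixed set from the example
set_three = {"John", "George"}  # the top 2 by score from the example

# Option 1: all members in both sets (member of at least one set).
all_members = set_one | set_three

# Option 2: shared members in both sets (member of both sets).
shared_members = set_one & set_three

# Option 3: set one except shared members.
one_except_shared = set_one - set_three

# Option 4: set three except shared members.
three_except_shared = set_three - set_one

for label, members in [
    ("All members in both sets", all_members),
    ("Shared members in both sets", shared_members),
    ("Set one except shared members", one_except_shared),
    ("Set three except shared members", three_except_shared),
]:
    print(label, sorted(members))
```

Maria and Martin never show up, because they are in neither of the two input sets, and the combined set only looks at the output of the sets, not at the customers table.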
Let's click on that. As you can see, we have here a new window for
the combined sets. First, let's give it a name. So it's going to be
set four and combined. First, we have to define
the two sets we have. Here's the set one, since
we started from it. And then on the right
side, if you click on it, you will get a list of all sets available in the Data pane. So we have the set
two and set three. We're going to go
with the set three. All right, with that, we have defined which set is
going to be combined, but now we have to define for Tableau how the data
going to be combined. Here we have four options. The first one is going to be
all members in both sets. The second one only the
shared members on both sets. And the next one is going
to focus on the set one, and the last one is going
to focus on the set three. For this example,
we're going to go with the shared members in both sets. Let's go and select that. And as you can see
here between the sets, the icon did change as well. All right, so now
everything is ready. Let's click OK. So here again in the Data pane
we have a new field, new dimension. Let's
see the results. I'm going to go and
double click on it. Now let's see the results. We are combining the set one over here with the set three. If you go and search
for the shared member, it's going to be only
customer two, since he is In in set one and
In in set three as well. As you can see, we have
only one member in the combined set, and that
is the customer John,
because he is the
only shared customer
between the two sets. It's
really not that hard. You just have to pay a
little bit of attention to which combining
option you are using. All right guys, so far we
have learned how to create the sets from the databain
using different methods. Next we're going to
go and learn how to create the sets directly
from the views. All right, so now we're going to go and create a new view. And it's going to be something similar to the cluster group. So we're going to have the two measures,
profit and sales. So let's go and select them. So double click on the profits and double click on the Sales. We have now the two axes, what we are missing
now the customers. In order to add the data points, we're going to go
to the customer ID and double click on it. So now we have our view
and we're going to go and create the set directly
from the view here. It's very similar to the groups we're going to go and select. Which customer is going to
be the member of our set. So in this example,
we're going to go and select the customers with
the high performance. All you have to do is select them like this. Let's
go for those customers. And again, here we
have this new window. Last time we
created a group,
but this time we're
going to go and create a set from
those customers. So click on this icon, and then we have to select Create Set. So let's go and select it. So now we have a new
window, and as you can see, we cannot define conditions
or any dynamic set. It's going to show us a list of all customers that we have
selected in the view. And the only thing that we
can do over here is to check, did you select all the
customers correctly? And if we've done any mistakes, we can go and remove
the customer. Now let's give it a name, I'm going to call
it Set Customers high performers.
That's all for now. We're going to go and hit OK. As you can see, nothing
changed yet in our view. We have now a new field in
the Data pane, our set. So we just created a new
set directly from the view. Now quickly I want to
show you something. If you are selecting
group like this and let's say the
window here disappears. What you can do, you
can go to any of those data points,
right click on it. And then here the last
option is create set. This is another
way how to create a set directly from the view. All right, so now
we have the set. And you might ask me, okay,
what you can do with it? Well, we can do many
things with the set now. So first we can highlight
it in our view. In order to do that,
we're going to take the set from the data pane and let's just put it
on the colors quickly. See which members are in and
which members are out here. As you can see, table
always use the color of gray for the members
that are out of the set. Of course you can change
that by going to the Marks. So if you go over here, then
we go to the Edit colors. And you can define over here the color of in and
the color of out. But for me now, the
colors are okay. So let's click Okay. With that, you are highlighting subsets of your data
for the end users. All right, so the other
use of the sets inside our view is to focus
on a specific subset. Currently we are showing all
the customers, in and out. So how do we filter the data only for the customers that are
members of the set, only for the In group? In order to do that, we're
going to go to our set. Right click on here, you can find two options. As you can see by default
we have show in out of set. That means we are
showing everything. But now we have another option called show members in the set. That means we're going to filter the data and we're going to show only the members inside
our set, the group. Let's go and select that
and see what can happen. As you can see now Tableau, remove all the customers
that are outside of the sets and we can see on the view only the
members of the set. This is really quick
way on how to filter your data and to make a
focus and specific scenario. But now you might
say, you know what? Let's give this
option to the users. Let's have the audience
that the users decide in which subset
they're going to focus on. This is going to
make your view more interactive and dynamic in that we can offer
the set as a filter. So let's see how we can do that. First we have to show all
the data points in our view. So we're going to
switch that Pac, let's go to our set right click on it and
we're going to go and select Show in out of
the set, show everything. So it's select that. Next we can offer
the set as a filter. So go to our set again,
right-click on it, and here we have the option Show Filter. Let's select that. Now as you can see,
on the right side we have the options
In, Out, and All.
So now we have
different scenarios.
If the users want to
see the whole big picture, all customers, they're going to leave the filter as it is. But if we have
a different scenario, where they want to focus on the subset of the customers
with high performance,
all they have
to do is deselect Out in the filter. So let's go and do that.
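The logic behind this In/Out filter can be pictured with a few lines of Python. This is only a rough sketch of the idea, not how Tableau filters data. The customer IDs, the members of the set, and the function apply_set_filter with its show_in and show_out switches are all hypothetical names for this example.

```python
# Hypothetical customer IDs and a hand-picked set of high performers.
customer_ids = ["C1", "C2", "C3", "C4", "C5"]
high_performers = {"C2", "C4"}

def apply_set_filter(ids, members, show_in=True, show_out=True):
    """Keep a customer if its In or Out value is ticked in the filter."""
    return [cid for cid in ids if (show_in if cid in members else show_out)]

# Both values ticked: the whole big picture.
print(apply_set_filter(customer_ids, high_performers))

# Out deselected: only the members of the set.
print(apply_set_filter(customer_ids, high_performers, show_out=False))

# In deselected: only the customers outside of the set.
print(apply_set_filter(customer_ids, high_performers, show_in=False))
```

So one view with one filter covers all three scenarios, and the users decide which subset they want to look at.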
And now as you can see, we are focusing on the subset, the In group, only the
members in the set. And for some other reason, other users may want to focus on the group that is
outside of the set,
maybe to understand its
behavior and so on.
So they're going to deselect
In and select Out. So now we are focusing on the group that is
outside of the set. And again, if you want to
see the whole big picture, you're going to
select both of them. So I really prefer to give
this option to the users to decide which subset they're going to select and
they're going to focus on, because with that
you are covering many scenarios in only one view. All right guys, so now
with the sets in Tableau, we can go step further. We're going to give
the full dynamic to the users and
they're going to have the option of defining which customer is going
to be in the set. Because so far what
we have done is that by creating the views, we defined everything we defined which customer is going to be in and which customer
is going to be out. But now instead
of redefining it, we're going to give the options the full dynamic of
defining the whole set. So let's see how we can do that. In order to make the set
dynamic and interactive, we're going to add an
action to our worksheet. I will dedicate
later full tutorials on the actions and the
interactivity in Tableau. But now let's just learn how
to add a action for sets. All right, so in
order to do that, we're going to go
to the main menu in Tableau, to the worksheet. So select that, and then here, actions in Tableau.
Let's go there. Now, I will not go in details explaining all the options
that we have in the actions, because here we have way more than sets, We have
a lot of things. So now just follow
me, we're going to go to the add action over here. And then we have the option
here, Change Set Values. So that means the actions of the users are going to change
the values in our set. So let's go and select that. Now we have to give
the action a name,
so we're going to call
it action change sets. And now we can select on which worksheets this action
is applied.
So now if you go over here, you can see the list of all sheets that we have
in our whole workbook. So now I want to apply
this action only on this worksheet, so
everything is fine. And now here we are defining
the behavior of the user. So now the question is, when is the action going
to be triggered? Either by hovering with the mouse, or by selecting
the data points, or from a menu. So I will stay with the default. Let's have the user click
on those data points. All right, so now we're going
to define the target set. Which set is going to change
once we do the action? So let's see what we have here. So as you can see, we
have two data sources. In the tutorial, we created three sets in the small data
source. And in the big data source, we have created only one set. Once the action is triggered, the values of this set
should be changed. So let's select that. And now we are coming to the
interesting part. But first, a sip of coffee. Okay, so here we have two types
of actions with the mouse. So first, let's
check the left side, what can happen when we
select a data point. The first option
says assign values to set. So that means it's
going to create a completely new set from
what you selected. The second option is
add values to set. So Tableau is going to hold
the old values, and everything that
you are selecting is added to the set. The last option is
remove values from set: anything that you are selecting is going to be
deleted from the set. Here, it really depends
on how you want the users to interact
with the view. Either you want them to
create a completely new set, so you're going to go
with option one. Or you want to predefine
a set and you want them to extend it by adding
new members to the set, so you're going to
go with option two. Or you want the users to start removing members from
the pre-existing set. I would say let's go with
option two, where the user is going to add
members to a predefined set. All right, so that is
for the left side: what can happen once the
user starts selecting. And on the right side,
what can happen once the user starts moving
away from the selection? So here the first option
is to keep the set values. The second is to add all
values to the set. So that means once the user starts moving away
from the selection, all the members,
all the customers, are going to be in the In group, inside the set. And the third one is
exactly the opposite. What's going to happen? All the data points are going
to be outside of the set. So I think both of
them are extreme. We can leave it as it
is: keep set values. So now let's keep those
options and let's see what can happen in the view
once we start selecting. So let's go with OK. So, as you can see here, we have our new action.
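To make the three selection options and the three options for moving away from the selection concrete, here is a minimal Python sketch of how the set membership changes. This is not Tableau's API; the function names, the mode strings, and the customer names are made up for illustration.

```python
def on_select(current_set, selected, mode):
    """Return the set members after the user selects data points."""
    selected = set(selected)
    if mode == "assign":   # option one: the selection becomes the new set
        return selected
    if mode == "add":      # option two: keep the old members, add the selection
        return current_set | selected
    if mode == "remove":   # option three: delete the selection from the set
        return current_set - selected
    raise ValueError(f"unknown mode: {mode}")


def on_clear(current_set, all_members, mode):
    """Return the set members after the user moves away from the selection."""
    if mode == "keep":        # keep set values
        return set(current_set)
    if mode == "add_all":     # every member ends up inside the set
        return set(all_members)
    if mode == "remove_all":  # every member ends up outside the set
        return set()
    raise ValueError(f"unknown mode: {mode}")


customers = {"Anna", "Ben", "Carla", "Dan"}  # hypothetical customers
in_set = {"Anna"}                            # hypothetical predefined set

in_set = on_select(in_set, ["Ben", "Carla"], mode="add")
in_set = on_clear(in_set, customers, mode="keep")
print(sorted(in_set))  # ['Anna', 'Ben', 'Carla']
```

With "add" on selection and "keep" on clearing, which is the combination chosen in this tutorial, the set can only grow as the user clicks.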
Let's click okay. Now let's go inside the view
and start selecting stuff. But before that,
I want to change the shape of those data
points to be clearer. So let's go to Shape and
use the filled circle. All right, so now I'm
not selecting anything. Like if I move my
mouse over here, you will see nothing
is going to change, but the action
here is to select, so to click on the data
point. Let's click on that. Let's move away. So now we
can see this member is blue. That means it is in the set, and anything I'm clicking on among those data points is going to
be inside our set. Or we can go over
here, for example, and select all this
stuff at one time. Now anything that I'm selecting in the view, as you
see, is going to be included in our set. With that, we are going fully dynamic and we give
the option to the user to define which customer is in and
which customer is out. All right, with that, we have covered everything
about the sets: how to create them as fixed or
dynamic, from the data pane, from the view, how to
add actions to them, how to add them to filters. This feature in Tableau
is really great. All right, now let's summarize.
Sets in Tableau are going to divide
your data, based on specific criteria or
a selection, into two groups. So we have the In subset, which is going to contain all
the members inside the set. And the Out subset, which is going to
contain all members that are not
included in the set. Sets are a very
important feature in Tableau since they are
going to allow your users to focus on subsets of your data and to compare them
with the remaining data. And sets are a great way to add dynamics and interactivity
to your views by giving the option
to the users to define which subset
they're going to focus on. All right, okay, so that's
all for the sets in Tableau. And next we will learn how
to group the values of the measures using bins and how to build
histograms in Tableau.
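As a rough illustration of the In and Out groups from the summary, the following Python sketch splits customers by set membership and compares the two groups. The customer names, sales values, and the function name are hypothetical and only serve the example.

```python
def split_by_set(sales_by_customer, in_set):
    """Total sales for members inside the set (In) and outside it (Out)."""
    totals = {"In": 0, "Out": 0}
    for customer, sales in sales_by_customer.items():
        group = "In" if customer in in_set else "Out"
        totals[group] += sales
    return totals


# Hypothetical data: four customers and a set holding two of them.
sales_by_customer = {"Anna": 120, "Ben": 80, "Carla": 200, "Dan": 50}
top_customers = {"Anna", "Carla"}

print(split_by_set(sales_by_customer, top_customers))  # {'In': 320, 'Out': 130}
```

Every customer lands in exactly one of the two groups, which is what makes the comparison against the remaining data possible.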
105. Tableau | Bins & Histograms: All right guys, so far we have
learned different methods on how to group up the values
of dimensions into groups. But now we will learn
how to group up the values of
measures into groups. And for that, we can learn
the bins in Tableau. As usual, let's first understand the
concept behind the bins, and then we can learn how
to build them in Tableau. Let's go. All right guys, before, as we learned
dimensions and measures, we learned the secret formula
of building new views. And that is measure
by dimension, like sales by category. But what if we have to build a view
from two measures? So it's going to be
measure by measure, like profit by sales, quantity by profit, and so on. One way to do that is by converting one of those
measures to bins. So we will have profit by sales bins and quantity
by profit bins. So what are bins? Bins divide the data into groups
of equally sized containers, resulting in a systematic
distribution of the data. And we can use those bins to create charts called histograms. A histogram is going to classify your data into
different bins and then count how many data points we have inside
each of these bins. In histograms, we usually use the bar chart to
visualize the data. All right, so now let's have
an easy example in order to understand
bins and histograms. All right, so now let's
have the following data. We have ten customers
with their scores. The scores are like points
that the customers collect. And now we want to count how many customers fall
within a range of scores. For example, how many customers
do we have in the range 0-30, 30-60, and so on? So first we have to create bins. In order to create bins, we need a few pieces of information, like what is the highest
value in the scores? So it's going to be the
first customer, the 63. And what is the lowest
value in the scores? It's going to be the zero.
The next value that we have to define is the
size of the bin. For example, here we're going
to take the size of 30. And now we have all the
information that we need in order to
create the bins. Don't forget, they are equally
sized. What does that mean? The first bin that
we have is 0-30. It starts with
the lowest value of zero and the size should be 30, that's why we have the range
0-30. This is our first bin. The next one is going
to be 30-60. Again, as you can see, the size is 30. And now the last bin is going
to be 60-90. And with that we're going to stop, because
with the last bin we cover the highest value. So with that we
have created, from the measure score,
equally sized bins. And now, after we
created our bins, we're going to go and
count how many customers, how many data points, do
we have inside each bin. All right, so now
let's start counting the customers for each bin. Our first bin is
0-30, so let's see, how many customers do we
have inside this range? So the first customer is
out, we will not count it. The second one is
inside the range, so we have one customer, two customers, three customers. This customer is out of the
range, the same over here. So here we have the
fourth customer, this customer is out. We have the customer number
five, and that's it. So we have five customers
between 0 and 30. All right, so now let's
move to the next bin. How many customers
do we have whose score is 30-60? All right, so now let's start counting
and scan our table. I think all those
values are out. We have this customer that
is inside this range. Then we have the
45, and as well 55. So we have four customers whose score is 30-60, so
this is our second bin. Let's move now to the last bin. So we have the
range 60-90. And now let's count how many customers we have inside this range. So we have ten customers. We already have nine,
so I think we have only one, and that is
customer number one. And all other values
are not in this range, so we have one customer
and that's it. With that, we have created
a histogram for the scores. We just have to create
the bins and count how many data points are
inside each of those bins, and we call those
blue bars bins. And each bin has a size. Now let's say that
we want to define another value for
the size of the bin. And we take the value
ten. So what can happen? We can have more bins, so the first one is
going to be 0-10. The next is 10 to 20, 20
to 30, and so on. So it makes sense: if you define a smaller
size for the bins, you will get more chunks from the data. Instead of
having three bins, now we have seven bins, and as you know, after
creating the bins, we can count how
many customers we have inside each
of those bins. If you go and start counting, you get the
following histogram. As you can see, what is
defining the bins is the lowest and
highest values inside our data and as well
the size of the bins. As you can see, using the bins, we created different
groups from a measure. Now you might ask
me, why do we need histograms? Why
are they important? Well, if you compare the
table on the left side with the visual on the right
side, in the histogram you can quickly
identify trends and patterns in the distribution
of the customers. Like you can see
quickly that most of our customers have
a score of 0-30. This type of chart can help you quickly understand whether everything is okay or whether you have to improve
in certain areas, define new strategies, and make better decisions using the data. All right, now let's
see how we can create bins and
histograms in Tableau. And we can do that only
on the worksheet page. We cannot do it at
the data source page. And there are two ways
to do that. Either we create bins
in the data pane or we can create bins
in the visualization. Let's start with the first one. So now we're going to
create a histogram for the customer scores. And we're going to
stay with the big data source on the left side. We're going to go to the data
pane and we need the score. Right-click on it. And
then we go to Create. And here we have
the option Bins. Let's go and click that. Now we have here a new window
to create the bins. First we
have the field name. We're going to
leave it as it is. The second option here is the size of bins.
Here, as a default, Tableau is going to follow a specific
mathematical equation in order to find a
suitable size of bins. But if you don't
want this value, you can go and change it. So for example, let's go
with the value of 20. After that, we
find information about the range of values: what is the minimum value and the maximum value that we find inside the field score, and the difference
between them. For now, that's all
we need: the size of bins of
20. Let's hit OK. Now if you check the data
pane on the left side, you can find a new
field called score bin. It is a dimension because it has a finite number of values. The score is going to stay,
of course, as a measure. Let's check the values
inside our new field. So let's drop it
here on the rows. Now as you can see,
we have the bins and the size of each bin is 20. Okay. Now, so far we have
the bins from the score. The next step in order to make a histogram is to get the
count of the customers. Now let's use this measure, the customer count. Drag and
drop it here on the view. And then I have to
swap them, so it looks like a histogram. With that we have our histogram, but we are not there yet. To make it look like
a real histogram, we have to have the
bins as continuous. If you check the score
bin on the left side, you can see it is discrete,
it has a blue color. And now we're going to go and
convert it to continuous. Right-click on it and convert
to continuous. And it's still in the
view as discrete, so we have to convert it as well here in the view
to continuous. With that, we have created
a histogram in Tableau. I'm going to add the
final touch, where I'm going to add the
values for each bin. So we go to the
labels, show mark labels, and now I'm going
to change as well the coloring in our histogram. So I'm going to
take the score bin and put it on the
colors. Let's do that. We are still not there. I would
like to have the bin with the highest number of
customers to be darker. So in order to do that, we're going to go to edit colors, and then we're going to go over
here and reverse it. Click OK. Now I'm happy. This is how I usually present histograms in projects. Once we have the histogram, we have to discuss it in
order to understand the data. Usually we search for
peaks, for valleys, or any outliers that stand out. For histograms, there are different shapes with
different interpretations. The shape of our
histogram is called skewed to the right. Skewed to the right means that the histogram on the left
side has the highest peak, and then the frequency
of the data is going to descend as you
go to the right. And on the right side,
you're going to have the lowest frequency
of the data points, which is naturally
good in this example. That means we have a
lot of new customers that didn't collect
any points yet. Histograms are
really powerful to see the distribution of
your customers in one click, to quickly understand
whether there are issues in your business or
if you find any new trends. So now for this example, we have decided that the
size of the bin is 20. Let's say that you
want to change the distribution and you want
to change the size as well. So in order to do that,
let's go to our field, right-click on it, and
then we go to Edit. So let's select that. And here we can go over
here and change it to ten. Let's click OK. And
now as you can see, we have more bins and more
details about our data. So now you might
tell me, I want it to be more dynamic and I want to give the users the option of defining how many
bins we have. And for this we can use another feature
called parameters, which is going to be
in the next tutorial. All right, so far
we have learned how to create bins from the data pane. There is another way to create bins and histograms in Tableau, which is way easier
than what I showed you. We can do that directly
from the visualization. Let me show you what I mean. So let's create a
new worksheet. And let's say that
I want to create a histogram from the sales. So in order to do that,
we're going to go and take the sales and
put it on the rows. And then we're going to go
over here to Show Me. And we have a predefined visualization from
Tableau called histogram. So the requirement for
this visualization is only one measure. So once we click on that, you will see that
Tableau did everything. If you check the data
pane on the left side, we already have a
bin, a dimension called sales bin, with
the role of continuous. And of course Tableau is going to suggest the size of the bins. You can go and change that, of
course, but as you can see, it's really easy. We
just took one measure in the view and clicked
on the histogram, and the rest is
done by Tableau. And this is exactly the power of Tableau in visualization. All right, so now let's have a summary. Bins are going to
divide your data into equally sized containers,
which is going to result in a systematic
distribution of the data. And bins are the method of
creating groups from measures. So that means we can create
bins only from measures. We cannot create them from dimensions, because
dimensions are already bins. And bins themselves
are dimensions. And it's better to convert them to continuous dimensions to
be used in histograms. And one limitation in
Tableau is that you cannot create bins from
calculated fields. And the main purpose of having
bins and histograms is to quickly identify patterns and trends in the distribution
of your data. All right, okay, so that's all for bins and histograms, and with that we have
learned everything about how to organize and customize
our data in Tableau. And we are done
with this chapter. Next, we will learn
how to filter your data in Tableau using different
techniques at different layers.
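The counting from the ten-customer example can be sketched in a few lines of Python. This is only an illustration of equal-width binning, not a description of how Tableau computes bins internally. The score list is hypothetical: only the values 63, 45, 55, and 0 and the resulting counts of 5, 4, and 1 come from the walkthrough. Treating the lower edge of each bin as inclusive is also an assumption, since the tutorial does not say where a score of exactly 30 would go.

```python
from collections import Counter


def bin_start(value, size):
    """Lower edge of the equal-width bin that contains value."""
    return (value // size) * size


def histogram(values, size):
    """Count values per bin, keyed by the bin's lower edge, empty bins included."""
    counts = Counter(bin_start(v, size) for v in values)
    first = bin_start(min(values), size)
    last = bin_start(max(values), size)
    return {start: counts.get(start, 0) for start in range(first, last + size, size)}


# Hypothetical integer scores for ten customers (lowest 0, highest 63).
scores = [63, 10, 25, 5, 45, 55, 0, 35, 20, 40]

print(histogram(scores, 30))       # {0: 5, 30: 4, 60: 1}
print(len(histogram(scores, 10)))  # 7
```

With a bin size of 30 the data falls into three bins, and with a bin size of 10 it falls into seven, which matches the point that a smaller size gives more bins.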
106. Tableau | Section: Filtering & Sorting Data: Filters in Tableau. We have many different types of filters
for different purposes, like optimizing
performance or letting your users
explore your data. That's why it's
very important to understand them and the
differences between them. So first we can
start by understanding the concept behind the different types of
filters in Tableau. And then we can learn
the different methods on how to create all those
filters in Tableau. Moving on, we can learn
the many different options on how to customize the
filters in Tableau. And at the end,
I'm going to share with you many tips, tricks, and best practices for
using filters in Tableau that I usually
follow in my projects. So let's start with
the first topic where we can
understand the concept behind the different types of filters in Tableau.
Now let's go.
107. Tableau | Types of Filters: All right guys, the best way to understand the hierarchy
is to have an example. If you take a look at our data, for example the customers, you can find that some
dimensions are related to each other, since they
hold similar information. For example, in the
dimension country, we have values like
Germany, USA, and France. And we have another
dimension, city, where you can find the cities
inside those countries. For Germany, we have
Berlin and Stuttgart. And then we have a third
dimension, postal code, where you can find the
codes inside those cities. As you can see, these three
dimensions are describing common information. They give us information about
the user location, and we can relate
those dimensions together using a hierarchy. In hierarchies, we
have different levels. And we start with the top node, and we call it the root node. This node represents
the highest level of aggregation
in our hierarchy. And now we're going to go to the next level of the hierarchy, where we have the country. At this level we're going to see more details about our data. Here we have, for
example, the two values USA and Germany, and the
links between the nodes, we call them branches. And now we're going
to go to the next level in our hierarchy. We have level two
here, the city. We will see more
details about our data. So in USA we have
Portland and Seattle. And in Germany we have
Stuttgart and Berlin. And again, we have
the link between the parent node and the child
node using the branches. And now we're going
to go to the last level in the hierarchy, where we have the postal code. And here we're going
to split the structure further, with more details. So we have the following
postal codes for each city. Now, since the postal
code is the last level in our hierarchy and those values
don't have any children, we call those nodes
the leaf nodes. The leaf nodes, or the leaves, represent the
most detailed level of our data in this hierarchy. So now with that, we have the complete
structure of our hierarchy. As you can see, it looks
like a tree structure. The top node, we call
it the root node, and it represents the highest
level of aggregation. Then we have the
intermediate levels, and they are connected
using branches. And the last level, we
call it the leaf nodes, and it represents the
lowest level of details. We have the root node, which represents the highest
level of aggregation. Then we have intermediate levels connected with the branches. And then we have the
leaves, the leaf nodes. They represent the lowest
level of details in our data. As we learned before, we can do many OLAP operations on the cube. So if we have a hierarchy in our data, we can do two very
important operations, the drill down and the drill up. The drill down and drill up are OLAP operations that are going to help
us to navigate through the hierarchy in order to gain a deeper or higher-level
understanding of the data. So let's understand first
how the drill down works. Let's say that we are working
with the measure sales. We start at the top node,
at the highest level. At the highest level,
we're going to have the total sales in
the whole dataset. For example, it's
going to be 140. So now we are at the highest
level, at the root node. And if you use drill down, you're going to jump to the next lower level
in the hierarchy. So that means at this
level we're going to see more details
about the sales. So for USA we have 90, and for Germany we have 50. And now if you want to see
more details about your data, we can apply drill down again in order to jump to the next lower level in the structure. So
what's going to happen? We're going to go to level
two, and here the sales of USA are going to split between
Portland and Seattle. We have 40 and 50. And for Germany, we're going to have 20 for
Stuttgart and 30 for Berlin. So that means we are seeing
more details about our sales. And now if you want to go to the lowest level, to the leaves, we're going to drill down
from the city to the postal code. So it's going to look like this. Portland is going to split
between those two postal codes. Seattle is going to stay the same because we have
only one child. The same for Stuttgart,
it's going to stay 20, and Berlin, we have
two postal codes, so it's going to split again. So as you can see, we are using drill down to navigate through the hierarchy, taking us from a higher level to a lower
level of details. It's like we are
expanding the tree to see more details to
understand our data. All right, so now we're
going to talk about the second OLAP
operation, the drill up. It's exactly the
opposite of drill down. Drill up is going to take us
from bottom to top, from a lower to a higher level of
details. How does it work? Let's say we're going
to start at the leaves and we're going to have
the sales of those leaves. And now we can use a drill up to move from the postal
code to the city. For example, we're
going to have the total sales in Berlin, 30, because it's the
sum of ten plus 20. And then Stuttgart is going
to stay the same, 20, Seattle 50, and
Portland as well is going to sum up the
values from the leaves. So we're going to
have the value of 40. As you can see, as we
are moving higher, the values are going to
get more aggregated. Let's say that we want
to jump to the country, so we can use a drill up again to move from
the city to the countries. For Germany, we have
the total sales of 50. For USA, we have
the total sales of 90. Now you can use drill up again
to go to the root node, where you have the highest level
of aggregation. So we have the value of 140, the total sales
inside our dataset. As you can see, if we have
a hierarchy structure, we can use drill up and drill down to navigate through
the hierarchy structure. Hierarchies organize and
structure the members of the dimensions into a
logical tree structure by grouping similar
dimensions together. Hierarchies are really important and give dynamics to your views, where you can have
the big picture and understand the data
at the highest level. And you can drill down to specific details to gain
deeper knowledge of the data. All right, so now we
are back in Tableau. Let's understand how we can create hierarchies in Tableau. We can create hierarchies
only on the worksheet page. We cannot create them at
the data source page. On the worksheet
page, we can create a hierarchy in the data pane. If you take a look at
the customers table, you can find that we
already have a hierarchy. And here we have a small icon that indicates we have a hierarchy, with the hierarchy name
Country, City, and on the left side over
here we have a small arrow. If we click on it, the
hierarchy expands and we can see the dimensions
inside this hierarchy. Speaking about dimensions, hierarchies can be used
only for dimensions. You cannot create a
hierarchy from measures. And this hierarchy that
we have over here was created automatically
by Tableau, since Tableau analyzed the
content of the country and the city and automatically understood that there is
a hierarchy between them. But since we want to learn
how to create a hierarchy, we're going to go and remove it and create a new one
from scratch. Now in order to
remove a hierarchy, you go to the hierarchy name over here and right-click on it. And then here we have the
option Remove Hierarchy. Here you have to understand
that the dimensions inside the hierarchy
will not be deleted, only the hierarchy
itself will be deleted. So you will not lose any
fields; only the logical tree, the logical hierarchy,
will be removed. All right, so now
let's see how we can create a hierarchy in Tableau. And we're going to create
the location hierarchy. We're going to go to the
left side, to the data pane, and we're going to select
one of the dimensions. It doesn't matter which one
you're going to select, but I prefer to start with the highest level
of the hierarchy. Here in our example,
it's going to be the country. Select
the country, right-click on it. And then here we have something called Hierarchy. And we're going to
select Create Hierarchy. Let's go there. We have to give it a
name, so we're going to call it location hierarchy. Then here, as you can see, now on the left side we have the
icon of the hierarchy. Inside it, we have only one
dimension, the country. Now in our hierarchy, we have as well the city
and the postal code. So how can we add them
to this hierarchy? As we learned, the hierarchy
has different levels, and the order of those
levels is really important. We have country, city,
and postal code. Now, in order to add the city, we're just going to
drag and drop the city beneath the country over
here and release it. With that, we now have the
city inside our hierarchy. Let's grab the postal code
as well. So we have to drag and drop it beneath the city. Let's release. With that, we have created the location hierarchy
with the three dimensions country, city, and postal code. Here again, if you want to hide the details about
this hierarchy, we can collapse it over here. Or if you want to
see the details, we can expand the hierarchy. All right, so this is
one way to create a hierarchy in Tableau,
by using the drop-down menu. The second way on how
to create a hierarchy: we can quickly drag and
drop dimensions together. So for example, if we go
to the product table, we have as well a hierarchy
here between the category, product name, and subcategory. Our hierarchy starts
with the category, then the subcategory,
and the last one, the leaves, is going to
be the product name. Now let's see how we can
create the hierarchy using quick drag and drop. We're going to take one
of those dimensions, let's say we're going to
start with the category, and drag and drop it onto
the subcategory. So I'm now hovering over and selecting the subcategory.
Let's release. Once we do that,
Tableau understands that we want to connect
those dimensions. So Tableau is going to
create a new hierarchy. We're going to call it
the Product Hierarchy. And let's hit OK.
And now let's see. On the left side we
have a new hierarchy called product hierarchy
with the icon. And we have inside it
two dimensions, category and subcategory. We are missing the
third dimension. Let's take the product name
and drop it in the hierarchy. Now we have a problem with that. The order of the dimensions inside our hierarchy is wrong, because the dimension
category should be level one and the subcategory
should be level two. How can we fix that? Just select the category and drag and drop it on top of
the subcategory. Let's release that.
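Why the order of the levels matters can be sketched with the location numbers from the drill-down walkthrough earlier: drilling down to a given depth groups the rows by the first levels of the hierarchy, in order. This is plain Python, not anything Tableau exposes. The postal codes and the 15 and 25 split for Portland are hypothetical; the totals of 140, 90, 50, 40, 50, 20, and 30 and the 10 plus 20 split for Berlin come from the walkthrough.

```python
from collections import defaultdict

LEVELS = ["country", "city", "postal_code"]  # level one, two, three

# Leaf rows. Postal codes and the Portland split are hypothetical.
ROWS = [
    {"country": "USA", "city": "Portland", "postal_code": "P-1", "sales": 15},
    {"country": "USA", "city": "Portland", "postal_code": "P-2", "sales": 25},
    {"country": "USA", "city": "Seattle", "postal_code": "S-1", "sales": 50},
    {"country": "Germany", "city": "Stuttgart", "postal_code": "ST-1", "sales": 20},
    {"country": "Germany", "city": "Berlin", "postal_code": "B-1", "sales": 10},
    {"country": "Germany", "city": "Berlin", "postal_code": "B-2", "sales": 20},
]


def sales_at_depth(rows, levels, depth):
    """Sum sales grouped by the first `depth` levels (0 is the root node)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[level] for level in levels[:depth])
        totals[key] += row["sales"]
    return dict(totals)


print(sales_at_depth(ROWS, LEVELS, 0))  # {(): 140}
print(sales_at_depth(ROWS, LEVELS, 1))  # {('USA',): 90, ('Germany',): 50}
print(sales_at_depth(ROWS, LEVELS, 2))
```

Increasing the depth is a drill down and decreasing it is a drill up. If the levels were listed in the wrong order, the first drill down would group by the wrong dimension.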
That's it. This is how you change the order
of the dimensions. And with that, we have
the product hierarchy. All right, now let's say that we don't want to remove
the whole hierarchy, we just want to
remove one member, one dimension, from
the hierarchy. In order to do that, let's say we want to remove
the product name. Select it and just drag and drop it somewhere here
in the empty space. And with that, the
product name is no longer a member of the hierarchy. So this is how we can remove
dimensions from a hierarchy. But I want to put them back in our hierarchy because
we need it later. So I will put the subcategory
beneath the category, and we take the
product name and put it beneath the subcategory,
and that's it. So these are the two methods of creating hierarchies in Tableau, either by the drop-down menu
or by quickly dragging and dropping the dimensions together in order to create a hierarchy. It's really easy. All right, so now we have this
hierarchy, the structure. How are we going to use it inside our view? It's really easy. We're going to go and
select the whole hierarchy, then drag and drop
it to the view. So here the hierarchy
is going to start from level one,
the countries, and we're going to see the
values of the country. Now let's have one
of those measures. We're going to
take the sales and drag and drop it on the columns. So now if you look closely
at the country, at the blue pill over here, you can see that we have a new sign,
the plus sign. This sign indicates that we can drill down in
this dimension. So now let's go and
click on the plus sign. As you can see, now we
are drilling down in our hierarchy to a lower level. Now we are seeing more
details about the sales. And we are now at the next level, the level of the city. Now as you can see, we
have the dimension city on our rows. We didn't drag and drop it from
the data pane and put it on the rows; it
expanded from the hierarchy. Again, here the city
has the plus sign that indicates we can drill
down inside the city. Let's drill down again. As you can see, now we
are at the postal code and we can see more
details about the sales. Now if you check
the postal code, there is no plus sign, like
for the city and the country, because we are at the leaves, we are at the lowest level
of details in our data. With that, we have
navigated through our hierarchy from the
top node to the leaves. As you can see, it's really
easy and very dynamic. Now let's say that we are at the leaves and we
want to drill up, back to the highest level of aggregation,
to the top node. It's really easy. If you
check the city and the country again, we
don't have the plus sign anymore, we
have the minus sign. The minus sign indicates that we can drill up
in the hierarchy. So let's see what happens if you click on the minus sign. As you can see, we drilled
up now from the leaves, from the postal code,
back to the city. And the values of those cells
are now more aggregated. And now the same
thing, if you want to drill up from the city
back to the country, we're going to click on the
minus sign. So let's do that. And with that we have
moved to level one, to the highest aggregation
in our hierarchy. All right, so far
what we have done is drill up and drill down in our hierarchy using
the rows shelf, and you know that the
rows and the columns, we use them as developers
to build our view. Now the question is
how our users and the audience can drill up and drill down
through the hierarchy. Because the hierarchy should
also be used quickly by the users to drill
down to the details. Now let's see how
we can do that. If we go to the view over here
and hover on the country, we can see again a plus sign. Let's go and click on that. And as you can see,
we drill down in our hierarchy from the
country to the city. Now let's go into more detail and drill down to the postal code. We can hover on the city,
and as you can see, we have again the plus
sign. Click on that. And with that, we drill
down to the postal code. This is exactly how the users
can drill down in the view. Now if we want to
drill up back to the higher level,
we can do the same. We can see the minus
sign over here. Click on it and you
go back to the city. And then we go to
the country as well. We have the minus,
we click on that. And with that, we drill
up back to the country. As you can see, with those icons we can navigate
through our hierarchy. Now you might say, or your
users might say: you know what, this is a really small icon
and my users don't like it. Is there any other way to drill up and drill
down in the view? Well, yes. If you go to any of those values over here
and right-click on it, you can see in this drop-down that we have Drill Down. If you click on that, we drill
down to the city. The same again: if you select any value,
doesn't matter which one, let's go over here, and
then drill down again. And with that we are
at the postal code. If you want to drill up, you can do the same: any
value, right-click on it. And here we have
Drill Up, so click it. And to drill up back
to the country, go to any value in the country, right-click
on it, and drill up. So those are the
two ways to drill down and drill
up in the view. All right guys, so
far we have created our own hierarchies by putting those dimensions together
in different levels. But in Tableau we have as well indirect
embedded hierarchies in the data type
date in Tableau. Any field with the
data type date has the following hierarchy. It starts with the highest
level with the year, then we have the
quarter the month, and then the lowest
level, the leaves. We have the days.
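As a rough illustration of those four levels, here is a minimal Python sketch that splits a date into year, quarter, month, and day. The function name `date_levels` is made up for this example; it is not something Tableau exposes, and it only mimics the idea of the built-in date hierarchy.

```python
from datetime import date

def date_levels(d: date) -> dict:
    """Split a date into the four default hierarchy levels, top level first."""
    return {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,  # months 1-3 -> 1, 4-6 -> 2, and so on
        "month": d.month,
        "day": d.day,
    }

print(date_levels(date(2021, 8, 15)))
# {'year': 2021, 'quarter': 3, 'month': 8, 'day': 15}
```

Drilling down simply means grouping by one more of these keys, from left to right.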
Those four levels are the default levels inside each field with the data
type date in our dataset. Now we have another data
type that holds, as well, an embedded indirect hierarchy: the fields with
the data type date and time. Here we have information
about the time, and we have seven levels. It starts exactly like the date, so the highest level is
going to be the year, then the quarter, the month,
and then the day. But now we can drill down to more details since we have
the time information. The next level is
going to be the hours. Then we have minutes
and seconds. Seconds are the lowest
level of detail. They are our leaves. Here we have seven levels
of the hierarchy. Both date, and date and time, have a hierarchy
embedded inside them. Now let's uncover those
hierarchies in Tableau. All right, so now
we're going to go to the table orders. And
here we have two dates. Doesn't matter which
one, both of them are going to have exactly
the same hierarchy. Let's take the order date, drag and drop it
here on the Rows. Now, as you can see, we
now have the plus sign. It indicates there
is a hierarchy. And it starts at the highest
level with the years. Now let's take a measure
to see some data. We're going to take
the order counts and put it in the columns. And I want to show
the labels as well. Let's show some labels. All right, now let's go and discover the hierarchy
inside the date. As you can see on the left side, we don't see any information
about the hierarchy, so that means it's really
embedded inside this data type. So let's go on the years and click on the plus
sign to drill down. As you can see, the
next information we have is the quarter
information. So now we see the total number
of orders by quarter. So now we can see more details
about the total counts, and then we can drill
down to the day. And now we are at the
lowest level, the day. We cannot drill down
further to, for example, hours, minutes, and seconds, because the order date has
the data type date. As you can see, the dimension
order date has four levels: year, quarter, month, and day. It's really nice to
have it like this in Tableau because it's
really standard. I worked with other BI tools, and there we have to
build it on our own, and it is really time consuming to build all those hierarchies, especially if you
have a big dataset. Here in Tableau,
our life is easier. Tableau did decide to have a
hierarchy inside each date. All right guys, one more
thing about hierarchies. They really organize
and structure your views and make them more
dynamic for the users. For example, if you have requirements
to make sales by country, sales by city, and sales
by postal code, and you don't use hierarchies, you will end up making three views like here
on the left side. It takes a lot of space, and as well, it's
not really dynamic. But better than
that, we can create a hierarchy between
those dimensions. And we can put
everything in one view. And then you give
the options for the end users to drill
down and drill up, depending on what they need. If they want the
sales by country, we have it already
at the top node. But if they want
the sales by city, all what they have to do is to drill down to the next level, and we have it already,
sales by city. If someone needs to go into more detail, down to the postal code, they can drill down as well
to the sales by postal code. As you can see, it really makes your view more dynamic,
and it is going to be more attractive for the end users if you compare it to
the left side. Now we have a more dynamic, more interactive view
for the end users. And as well, you are creating fewer views in your dashboards. So this is really great. If you want to drill up
back to the country, we can just click
the minus sign. Hierarchies give more
dynamics; they structure and organize your
data in the views. All right, now let's summarize. Hierarchies organize and
structure the members of the dimensions into
a logical tree structure. Hierarchies are a special
feature only for dimensions. You cannot create
hierarchies between measures. We can drill down and drill up to navigate through our
hierarchy to gain a deeper or higher-level
understanding of our data. Overall, hierarchies are really important to organize and
structure your data in views. And they provide
the users with a powerful tool to quickly and easily navigate
and explore your data, uncover insights, and
make better decisions. All right, so that's all
for hierarchies in Tableau. Next we will learn how
to group the members of dimensions into
higher-level categories using groups.
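To make the idea of drilling down and drilling up more concrete, here is a small Python sketch. It is only an illustration with made-up sample rows, not how Tableau works internally: each hierarchy level is a grouping by one more field, and drilling up means grouping by fewer fields, so the values become more aggregated.

```python
from collections import defaultdict

# Hypothetical sample rows: (country, city, postal_code, sales)
ROWS = [
    ("Germany", "Berlin", "10115", 100),
    ("Germany", "Berlin", "10117", 50),
    ("Germany", "Stuttgart", "70173", 70),
    ("France", "Paris", "75001", 80),
]

def aggregate(rows, level):
    """Sum sales grouped by the first `level` fields of the hierarchy.

    level=1 is the country (top node), level=2 the city,
    level=3 the postal code (the leaves).
    """
    totals = defaultdict(int)
    for row in rows:
        totals[row[:level]] += row[3]
    return dict(totals)

by_country = aggregate(ROWS, 1)  # drilled all the way up
by_city = aggregate(ROWS, 2)     # drilled down one level
by_postal = aggregate(ROWS, 3)   # drilled down to the leaves

print(by_country)  # {('Germany',): 220, ('France',): 80}
print(by_city[("Germany", "Berlin")])  # 150
```

One view with a hierarchy replaces the three separate views (by country, by city, by postal code) because the same data is just regrouped on demand.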
108. Tableau | How to Create Filters: All right, so now we
have the following task where we have to hide
sensitive information. For example, let's say that the USA data in our dataset is sensitive information
and we have to hide all the customers
that come from the USA. And now we're going
to go and build a view from the customers. We're going to take the
location, the country, and then let's say
we're going to take the profit from the orders. All right, so now as you
can see in the worksheet, we can see all the
countries including USA. So now we're going
to go and hide this sensitive information. In order to do that,
we're going to go to the data source page. And then here on the
corner on the top right, we can see filters and
we can add a new filter. So let's go and click on it. Then we will get a new window called Edit Data Source Filters. It's really easy
here. We're going to go to Add and click on it. And then we're going
to get a list of all the fields that are
available in our data source. Since we have to hide
the customers from the USA, we need the field Country. So let's go and select that
over here. Then click Next. And here we get another window to set up the filter
for the country. So as you can see, we have all
the countries listed here. And now we can go and select the countries that should be
included in our dataset. Or we can go over here
and click Exclude. And we're going to
exclude the USA. That means we are filtering out all the customers
with the country equal to USA.
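The include and exclude behavior can be sketched in a few lines of Python. This is only an illustration with made-up customer rows and a hypothetical helper `filter_by_country`; in Tableau you configure this in the dialog, not in code.

```python
# Hypothetical customer rows
CUSTOMERS = [
    {"name": "Anna", "country": "Germany"},
    {"name": "John", "country": "USA"},
    {"name": "Marie", "country": "France"},
    {"name": "Luca", "country": "Italy"},
]

def filter_by_country(rows, selected, exclude=False):
    """Keep rows whose country is selected, or drop them when exclude=True."""
    if exclude:
        return [r for r in rows if r["country"] not in selected]
    return [r for r in rows if r["country"] in selected]

# Selecting USA with Exclude keeps every other country.
visible = filter_by_country(CUSTOMERS, {"USA"}, exclude=True)
print(sorted({r["country"] for r in visible}))
# ['France', 'Germany', 'Italy']
```

Because a data source filter sits in front of everything else, every worksheet built on this data source would only ever see the `visible` rows.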
Let's go and click OK. Now we can see over
here some quick information. So the filter is based
on the country, and the details say we are
keeping the values France, Germany, and Italy. So that's it. Let's click OK. Let's go now and check the data
in our worksheets. So we're going to switch
back to our view, and as you can see, we cannot find any information about the USA. And this affects, as well, all the worksheets that are connected to
this data source. So for example, if you go over here and create a
new worksheet, and we take the countries,
drag and drop them over here, you can see again here as well that we don't have the USA; we have the values France,
Germany, and Italy. And with that we have protected this sensitive
information. All right, so now we go to
another use case of the data source filter, which is to reduce the size of data inside
Tableau. This is very critical. If you have bad
performance in Tableau, then you have to start
thinking about how to reduce the size of data
inside our visualizations. And the first step to reduce
the size of our data, we have to decide which fields we're going to use in
order to filter our data. A very common
choice is the date, so that we can reduce the number of years
inside our data source. Let's go and build a view. So I'm just going to go and
create a new worksheet. Let's take the order
date to the rows, and let's take the
profit to the columns. And then let's make it a bar diagram and
show the results. As you can see,
we have five years of data inside our data source. This field is a really good
candidate for reducing the data, and you have to go and discuss it with your users. So we have to ask:
do we really need five years of data inside
the visualizations? Is it enough to have only the last two
or three years? Let's say that after discussions with the users, you agree that the relevant data for
the visualizations starts from 2020. Anything before is not relevant anymore for
the visualizations. We would like to have
everything starting from 2020. In order to do that,
we're going to go and build a data
source filter. Let's go back to our
data source page. We're going to go
again over here. So let's go to Edit. And then we're going to go and choose the field that
we're going to build the data source
filter on top of. Go to Add, then we need the order date.
We have it over here. Let's go and select
it. OK. Here, since it is a date, Tableau asks us first in which format you
want to build your filter. Since we are talking
about the years, we are interested
in the years. I'm just going to go with the
format Years and go Next. Now with that, we get a list of all years inside
our data source. Either you're going
to go and say, okay, I would like to include
everything starting from 2020 and not select
the old years. Or you're going to
say, you know what, I'm just going to exclude
the two old years, anything before 2020, so
you're going to go with Exclude, and with that we
are removing the old years. I prefer this one over here
since, let's say that we get 2023 data inside
our data source, you don't have to go each time
and click on it. With that, we are
saying all the data is relevant
starting from 2020. Let's go hit OK. And with that, you can see inside our
data source filters, we got a new filter based on the years of order dates and
you can see some details. It says it keeps
2020, 2021, and 2022. With that, we're now filtering the data source based on the
order date and the country. Let's click OK. And as you can see
here, we now have two filters in the data source. Let's go back to our
view, Sheet 7. We can see that we have
only the data starting from 2020. All old data is not presented anymore
inside our visualizations, which is a really great way
to reduce the stress and the size of data that
Tableau has to handle. With that, we are reducing
the scope of data, and as well we are going to get
great performance in Tableau. So this is how we use
data source filters in order to reduce the size of our data and as well to hide
sensitive information. But here, don't forget that
all the worksheets that are connected to this data source can be affected
by those filters. All right, so now we're
going to learn how to build a context
filter in Tableau. Let's say that we have
the following view. We're going to have
the category from the products and as
well the subcategory. And let's take for the
measure, the profits. So let's take it over
here and as well, let's change the colors. So we're going to put
it over here as well. So now in this view, we have all the categories furniture, office supplies, and technology. But the users want,
in this view, to focus only on the
office supplies. And for this specific view, all the other categories are
irrelevant information. So they want to focus only on the office supplies by profit. So that means we want to
filter the data by category. In order to do that,
we're going to go to the category over here, hold control and put
it on the filters. And then we're
going to get again, the same window for filtering. And here you can see
the three values, furniture, office
supplies, and technology. For this view, we want
only the office supplies. So what we're going
to do, we can remove the others and leave the
Office Supplies, then hit OK. So as you can see, now we
removed everything and we have only the one category,
Office Supplies. The job is done, right? We have the office
supplies by profit, and we filtered the data. The answer is yes,
the task is done. But we are not using the full
power of Tableau. Since here the focus is only on the office supplies and we are focusing on this
subset of data, we could go and reduce the whole dataset to
only this category. And with that, you can
gain a lot of performance in Tableau because you are
focusing only on a subset, and all other data is removed
from this visualization. In such a scenario, we can go and use the power
of context filters. Now the question is how to make our filter a context filter. As you can see now,
in the filters we have our category. It is a blue pill, and this filter type is called a
dimension filter. Now we want to promote
it to a context filter. As we learned before, we have a specific order
of the filters: we have context, then dimension. All we have to do
is right-click on it. And here we have the option
Add to Context. Once you do it,
you will see that our filter now has
the gray pill. The gray pill indicates that this filter is a context filter. So now you might notice that
nothing changed over here, we have exactly the same view, but we optimized
things in the background in Tableau, where we created
a temporary dataset. And it has only the
category Office Supplies, so it's a really small table compared to the
whole data source. All right, so now I
want to show you how Tableau processes the
different types of filters. As we learned, the order of the filters is
really important. So that means the context
filter is processed first, then the dimension filter. The context filter dominates the behavior of the
dimension filter. All right, so now we're
going to go and add a dimension filter in
our visualization. We're going to use the
subcategory in order to do that. Right click on it and click
over here, Show Filter. As you can see on
the right side, we have all those values that are included in the
office supplies. But in our original
data source we have way more subcategories than we are seeing now
from this view. And this is exactly
the effect of the context filter on
this dimension filter. We are seeing only the
values inside this context. All right, so now
we're going to go and change the definition of the context filter and see the effect on the
Dimension filter. Let's go again to
our Context Filter. Right click on it
and Edit Filter. Let's bring it here side by
side with our dimension filter. We have only those values. And over here on the context filter, we have only Office Supplies. If we go now and include
Technology as well, let's apply and see that on the right side the values
are going to change. Let's do that. Now,
as you can see in the dimension filter
for the subcategories on the right side, we have more values than
before because we included, in our context, in our temporary
table, the technology data. We can go and change
the values around. Let's have only
Furniture, check the right side, and apply. And you can see we have only four subcategories. With this, you can see that the
context filter is really dominating all other
filters below it. By understanding the
order of the filters, you can understand how Tableau works with those different
types of filters. So I'm going to bring the
context filter back to Office Supplies and hit OK. One more thing about
the context filter. As we learned before,
it is flexible. That means we can reduce the size of data only
for one worksheet. That means if you go to
any other worksheets you will not find here
any context filter. You can go and decide
for each worksheet whether you want to reduce
the size of data or not. This is unlike the data source filter, which can affect
the whole workbook, any worksheet that is
connected to this data source. With the context filter, we have way more flexibility. Now you might ask, can we use the context filter to hide
sensitive information? Well, the answer is no.
Let me show you why. Let's have a quick example. Let's take the customers again. And we take the country, the city, and let's take as
well the profits. As you can see over
here, we don't have the USA data because we have
the data source filter. And now let's say that the
data of Germany is now sensitive and we want to protect it using
the context filter. Let's go and do that.
We're going to take the countries hold
control and put it on the filters
and we're going to say we want to exclude Germany. So I'm going to
click over here on Exclude and then hit OK. As you can see now in the view, we don't have any information
about Germany. And we go and promote the
country to a context filter. So right-click on it
and Add to Context. And now you might say,
okay, everything is fine. We don't have any information about Germany, so we are secure. Well, not really;
there is still a way to see the
German data in the view. Let me show you how.
If you go to the city over here, let's
show it as a filter. On the right side, you will find all the cities from
France and Italy. So there are no cities
from Germany or the USA, but here we have an
option on the filter. If you go to this
small arrow over here, then we can go over here and choose to see all the values
from the database. We will explain all
those options later, don't worry about it. But let's go and
click over here. So now, as you can
see, the filter is showing data about Germany. We have Berlin, we
have Stuttgart. So that means the data
is not really protected. We are hiding the sensitive data
from the view, but we can still see all
the values from the filter. That's why you should never
use a context filter to protect your sensitive
or confidential data. Because even if we are seeing the data only in the filters, it's still exposing the data, and the data is not protected. So that means if
you want to protect your data and hide
sensitive information, use only data source filters. All right, so now
we're going to move to the next filter in our chain: the dimension filter. We have already created some dimension
filters in our view. But now let's go into detail and see all the
options that we have. All right, so now let's go to
the Filters shelf. And you can see that we
have the subcategory. It is a discrete dimension, that's why it has
the color blue. And now in order to
see all the options, right-click on it
and choose Edit Filter. You already
know this window. Let's just bring it over here to see the effect
directly on the view. So first, we have
different tabs here. The first one is
going to be about the manual selection, and the rest are going to
be dynamic filters. So here we have four tabs: General, Wildcard,
Condition, and Top. The first one is going to be the manual selection of the values. And in the rest, it is
like you are defining a rule, and the filter is going
to be dynamic. Here, as usual, since it's discrete, we're going to see the list of all possible values
that we can see. And then you can go and manually select or deselect
values from this list. And as you can see on the
right side we have Exclude. The default in
Tableau is include, so that means anything that
I'm selecting from this list is going to be
included in the view, and anything that
I'm not selecting is going to be excluded from the view. In order to have
the opposite effect, what we can do is
click on Exclude. And now
all the values that are selected
are crossed out. So that means they are
excluded from the view, and everything that is not selected is going to be included
in the view. So here it really depends. If you want to exclude only
two values from a long list, then it makes sense to
go and use Exclude. So now if you go
and select Apply, you can see in the view
that the remaining values are Appliances,
Art, and Binders. Tableau did exclude all
those values, and you're going to have the same effect if you deselect Exclude and select only
Appliances, Art, and Binders. And in order to remove
our selections, we can remove
everything from here. So select None,
and we can reapply our selection on
Appliances, Art, and Binders. And as you can see, we're
going to have the same effect. So this is how you work with the manual selection in
the first tab, General. But now let's move
to the next one. And before that I want to
include everything over here so we don't
affect the next one. So let's apply, and then
we go to Wildcard. So here we can work
with wildcards. If you have a dimension
with high cardinality, that means you have a long list of all possible values
in the dimension. And if you go and select
everything manually, it's going to be really painful. So instead of that,
we can go and define a rule, if there
is a rule to define. So here we have
an input field where we can write something,
for example, A. And here we have four options. The first one is Contains; it means that somewhere in the word
there is a character A. And then the second
option is Starts with; it means that the word is going to start with
the character A. The next one is
exactly the opposite, it's going to end with A. Then the last one we
have is Exactly matches. That means the word should
contain only the value A. Let's start with the first one. If the word contains
A somewhere, then it's going to stay
in the visualization. Now as you can see,
all the words contain A somewhere. In Appliances, we have it here at the start and in the middle. In Art as well, at the start. And here we have it in
the middle, and so on. Let's try out the second one. It's going to say if
the word starts with A, it's going to stay in the view. So let's just apply.
So as you can see, we have only two words
that start with A. All right, so now let's
go to the next option. We're going to have Ends with. But instead of A,
we're going to use another character, so any word that ends with it can stay in the view.
Let's apply that. As you can see, all those
words end with that character. Well, now you might ask,
is it case sensitive? Well, it's not. So
if you type a capital letter, as you can see,
Tableau still goes and selects those values. Now let's go to the last one, it's going to be the exact match. If you go over here
and select OK, you will not see any data. But if you type exactly
Labels and hit Apply, you will get only
one subcategory. It is Labels.
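The four wildcard options can be summarized in a short Python sketch. This is only an illustration of the matching rules as described here, with a made-up list of subcategories and a hypothetical helper `wildcard_filter`; it assumes all four modes ignore case, which is what the demo shows.

```python
# Hypothetical list of subcategories for the example
SUBCATEGORIES = ["Appliances", "Art", "Binders", "Labels", "Paper"]

def wildcard_filter(values, pattern, mode):
    """Return the values that match the pattern, ignoring case."""
    p = pattern.lower()
    checks = {
        "contains": lambda v: p in v,
        "starts_with": lambda v: v.startswith(p),
        "ends_with": lambda v: v.endswith(p),
        "exactly_matches": lambda v: v == p,
    }
    check = checks[mode]
    return [v for v in values if check(v.lower())]

print(wildcard_filter(SUBCATEGORIES, "a", "contains"))
# ['Appliances', 'Art', 'Labels', 'Paper']
print(wildcard_filter(SUBCATEGORIES, "A", "starts_with"))
# ['Appliances', 'Art']
print(wildcard_filter(SUBCATEGORIES, "S", "ends_with"))
# ['Appliances', 'Binders', 'Labels']
print(wildcard_filter(SUBCATEGORIES, "labels", "exactly_matches"))
# ['Labels']
```

The rule is evaluated against the current list of values, so new subcategories that match the pattern are picked up without anyone reselecting them by hand.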
But we don't use it. Usually we use Contains,
Starts with, or Ends with. This is how the
wildcard works. Let's clear everything
in order to have all the data back: we leave it on
Contains and hit Apply. Let's move to the
next tab. We have Condition. In the previous materials, with the parameters, we have already worked
with Condition and Top. Here, what
we're going to do is define a rule. And Tableau is going
to go and check all the values and filter out all the values that are
not meeting this condition. So for example, if you
are checking our view, we have some negative values in the profits and we
don't want to see them. We will go and define a
rule that we want to see all the profits that
are higher than zero, only the positive profits. In order to do that,
we're going to select By field over here. Tableau is going to show you immediately the measure that
is used in the view, so we are using the
profit, and Sum is correct. So we're going to go
over here and say the sum of the profit
should be higher than zero. With that, we have defined a
rule, and let's hit Apply. As you can see, we
have just removed the subcategory that does
not fulfill this condition. That's it, this is really easy. We're going to move
to the next one, but first let's
reset everything. So we go and select None. And then we're
going to hit Apply. In the Top tab, we can
define if we want to see the top ten products
or five products, or the lowest, or the
bottom five products. Again, we have to define
the rule for Tableau, and Tableau is going
to filter the data based on our rule. Here
we have two options. Either we have the
top subcategories or the bottom subcategories. Let's go to By field over here. And then here we
have two options, as I said, Top and Bottom. Then we can define whether it is a top ten, a top five,
or a parameter, as we learned before. Here, we're going to stay with
the same, since we are using the profit, and that's it. Apply. And now we can see in the
view that Tableau did filter our view
based on our rule. So now we have the top
five subcategories. All right, so that's it. These are the different options for how
to filter the dimensions. I'm going to deselect
everything over here, and then we're going to go
to the manual selection and then hit OK. Instead of predefining the rules
for the users, we're going to offer
the whole dimension as a quick filter
for the end user. And as you know, in order to
do that we're going to go to the dimension right click
rot and show filter. The user is going
to go to the quick filter on the right side and start selecting the values
that suits their needs. All right, so now let's
move to the next one. We have the measure filter; as we learned, in the order chain it is below the dimension filter. So let's see how we can create
a measure filter. All right, so in order to
create a measure filter, we're going to go to
the sum of profit. Let's hold Control, drag
and drop it to the filters. Then we're going to
get a new window in order to configure
our filter. And since it is a
continuous measure, Tableau is going to ask us: do you want to filter
the original data, all values, or do you want to do the aggregation and
then do the filter? Since it's a measure, we have
the following aggregations, like sum, average,
median, and so on. Or if you want to
filter only on the original data, then you're going to go
and select All values. But since we have the sum of profit, I would like to go with
the Sum aggregation. Let's select that and
then go with Next. Now we're going to
get a new window in order to configure
our measure filter. And here we have four options: Range of values, At least, At most, and Special. Since our measure is continuous,
it can be presented as a range. It has a start and an end. It's not like the dimensions
where we're going to get a list of all values
from the data source. We will get only aggregated data and we can configure
only start and end. In the first option,
we can configure the starting point
of the range and as well the end
point of the range. You can control both of them. In the next one, At least, we can control only one of
them, only the start. Here we can specify the minimum value that is
allowed in the visualizations. The next one is going to
be exactly the opposite, At most. We can define the end
point of the range: what is the highest value that is allowed in
the visualizations? Again, with Range
of values, we can specify the start
and the end. With At least, we can specify only
the starting point. And with At most, we can specify only the end point of our range. Then the last one, Special, is about the null values. Here we have three
options: Null values, if you want to see only the
null values from this filter; Non-null values, which means
you don't want to see any nulls inside our
data; or All values, where you are allowing both of them. So as a default we
use All values. I'm going to stick
with that. And I would like to configure both the end and the start
of our continuous measure. As you can see,
it's really easy. Let's go and hit OK. And with that you can see
we've got a new filter inside our filters, and it has,
of course, the green color. All right, so first
we're going to go to our measure filter and show
it as a quick filter. So right-click on it
and choose Show Filter. And now we can see the
range on the right side. Let's just make it a little
bit bigger to see the range. Now as you can see, we
have a start and an end, but the range does not cover
the whole bar here. Tableau wants to show you that we are not showing
all the values. We are showing only the
range of the subset. So now what happens
if we take the end to the right and the
start to the left? Nothing happens in the view. We have exactly
the same data, but here we can
see that in our range, there are different colors. The light part indicates that if you change
the values here, nothing happens in the view. As you can see, if I
just move it over here, the view will not be filtered. Now, if I start moving the
start inside the dark part, you can see that we
now have an effect on the view. The dark color in the slider marks the relevant values, and the light part marks the
irrelevant values. All right guys, so now
we're going to talk about the last type of
filters in Tableau, the table calculation filter. It is at the bottom of the chain, and you can see that each other type of filter is going to have
an effect on this type. All right, so now
let's learn how to build a table calculation filter. As the name suggests,
it is a calculation. And we're going to have
a whole section on how to create calculations
in Tableau. For now, don't worry about
the details of how to create calculations in Tableau, just follow me with
the steps. All right, so now
we're going to go to our measure in the marks, right-click on it, and then here we have the option Quick
Table Calculation. And then we're going
to have a list of all the different calculations that we can do on the table. And now we will go with
Percent of Total. So let's select that. And now we can see a small icon
next to the measure; it indicates that this
is a table calculation. So hold Control, drag it, drop it on the
filters, and release. Here, since it's a
continuous field, we have to define it as a
range, so let's click OK. And now we can see
in the filters two measures for the same field. The first one, without the
triangle icon, is a measure filter. And the second one,
with a triangle icon, is a table
calculation filter. What can we do with that? We can offer it to
the users, so we can right-click on
it and choose Show Filter. We can see it now as
a quick filter on the right side, and the user
can go and use the filter. That's all about the
table calculation filter. All right, so with that we have learned the different
types of filters in Tableau and how the filters in the chain
can affect each other, depending on their order. All right, so now let's
have a quick summary. We can start with the
extract filter at the top. We can use it only on the
extract connections and we cannot find it in the
Tableau public version, don't worry about it. It is very similar to
the data source filter. And then next we're going to
have the data source filter. In order to create it, we
go to the data source page. Here in our example, we created
two data source filters. The first one is to hide the sensitive information
of the country USA, and the second one is to reduce the overall size
of our dataset. And don't forget that
the data source filter can affect the whole workbook, all worksheets that are connected to this data
source. The next filters we all create on the worksheet page. So
let's go over there. Here you can see very nicely how the
different types of filters are sorted on
the Filters shelf. The first one we have is
the context filter, the gray pill. A context filter
creates a subset of data, or a temporary table,
only for this view. It is something local,
only for this view. But don't forget, do not use a context filter in order to hide or protect
sensitive information, since it is possible to show the values in the filters. The next three filters
we usually offer to the end users in order to slice and dice the
visualizations. So the users can
use them to specify a subset of data to
make a focused analysis. Next we have the dimension
filter, like the subcategory. After that we have
the measure filter. And the last one at the chain we have the table
calculation filter. And since those
different types of filters has a logical order, it would be nice as well to have this order on the quick
filters on the right side. So it makes sense to have the dimension
filter at the top. Then we're going to take the measure filter
next, and the last one is going to be the
table calculation filter. All right, so that's all. It could be confusing
at the start. But now, after you
understand how Tableau works and the logical
order of the filters, everything is going to make sense in the visualizations. All right, so with that we
have learned how to create different types
of filters in Tableau. And next we will
learn how to apply filters to multiple
worksheets in Tableau.
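The filter order summarized in this lesson can be sketched in a few lines of Python. This is only a toy model, not how Tableau is implemented: the sample rows, field names and the function `run_filter_chain` are made up for illustration, the extract and table calculation filters are left out, and the measure filter is assumed to work on the summed profit per subcategory.

```python
from collections import defaultdict

# Hypothetical sample rows; the field names are illustrative only.
ROWS = [
    {"country": "USA", "category": "Technology", "subcategory": "Phones", "profit": 50},
    {"country": "Germany", "category": "Technology", "subcategory": "Phones", "profit": 30},
    {"country": "Germany", "category": "Furniture", "subcategory": "Chairs", "profit": 10},
    {"country": "France", "category": "Furniture", "subcategory": "Chairs", "profit": 5},
]


def run_filter_chain(rows, data_source, context, dimension, min_profit):
    """Apply the row-level filters in chain order, then the measure filter."""
    stages = []
    row_filters = (
        ("data source filter", data_source),
        ("context filter", context),
        ("dimension filter", dimension),
    )
    for name, keep in row_filters:
        rows = [row for row in rows if keep(row)]
        stages.append((name, len(rows)))

    # The measure filter sees aggregated values, so aggregate first.
    totals = defaultdict(int)
    for row in rows:
        totals[row["subcategory"]] += row["profit"]
    result = {sub: total for sub, total in totals.items() if total >= min_profit}
    stages.append(("measure filter", len(result)))
    return result, stages


def keep_all(row):
    return True


# Hide the USA rows at the data source level, then keep sums of at least 20.
result, stages = run_filter_chain(
    ROWS,
    data_source=lambda row: row["country"] != "USA",
    context=keep_all,
    dimension=keep_all,
    min_profit=20,
)
print(result)
print(stages)
```

Because the data source filter runs first, the USA profit never reaches the later stages, so the measure filter compares a smaller sum than it would without it.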
109. Tableau | Customize Filters: All right, so now we're going
to talk about how to apply the same filters in
different worksheets. Because if you are building
like different views, you end up having exactly the
same filters in each view. And it's going to be time
consuming if you go into each worksheet and add
exactly the same filters. So instead of that, we can share the same filters to be applied
in different worksheets. And in Tableau we have four
different options in order to do that. We can find those options
in the filters, and it doesn't matter
which one you pick. Let's go with the
context filter, for example. Right-click on it, and here we have the option
Apply to Worksheets. And here you can see the
four options. As a default, Tableau is going to leave it
as Only This Worksheet. This means locally,
only for this view. Here we can see the
other options: All Using Related Data Sources, All Using This Data Source,
and Selected Worksheets. Before we try those
options, let's first understand
those four options. All right, so now we're going to have a very simple example in order to understand
how to apply filters. We have two data sources, DS one and DS two, and we have different
worksheets that are connected to
those data sources. We have the sheet one connected only to the data source one, and the sheet two
connected to both DS one and DS two
using data blending. And the sheet three is only
connected to DS two. Now let's say that we are at the sheet one and there
we created a filter. So now let's learn how to apply this filter in different
worksheets using those options. All right, the first
option we have is Only This Worksheet. This means the filter is going to be only locally available
for the sheet one. We will not find it in the sheet two or in the sheet three. This option is as well
the default in Tableau. Each time you are creating
a new filter in Tableau, it's going to be using
the option Only This Worksheet, so the filter is going to be available only in the worksheet
where we have created it. The next option we
have in Tableau is All Using This Data Source. For example, the sheet
one is using the DS one. That means the filter
can be applied in all worksheets that are connected
to the data source one. In this example, we have
the sheet one because it's connected to DS one
and as well the sheet two, which is connected as well
to the data source one. But the sheet three is not connected to the
data source one, it's only connected to DS two. That means this filter will not be found in the sheet three. So we are sharing
now the filter in all worksheets that are
using the same data source. Let's move to the next one. We have all using
related data sources. If you are going to
use this option, you're going to find your filter almost in all worksheets
in your workbook. So we're going to find this
filter in the sheet one, we're going to find
it in the sheet two and as well in
the sheet three. That means if you are
using this option, we are automatically spreading our filter in almost
all worksheets. Let's go to the last one, and it's interesting one,
selected worksheets. This means we can
go and manually selecting which worksheets
can include my filter. For example, I could say, I want to see my filter
in the sheet one and as well in the set
three without any rule. As you can see, we have here more control where our
filter can be applied. The last two, all using the data source or all
using related data source. There is like a rule, and Tableau can go and automatically spreads our filters in the
worksheets in my projects. I tend to use
selected worksheets more often than the other ones, because I would like
to have control where my filters should be appear,
in which worksheets. That's all about the concept
of those four options. Now let's go back
to Tableau and try those options pack our filters. We're going to go
to the category, we're going to stay with
the context filter, radically connects and go to the applied to
the worksheets. And you can see the
selected option here is only these worksheets. This one is a default with that, it means this context filter is going to be found only
in these reports. If we go to the other
reports, we will not find it. In order to change
that, we're going to go again to the context
filter radically con, let's try now all using
this data source. Let's click on it now. If you take a look
at our filter, we can find a small icon that
indicates this filter is used in different worksheets that are using the
same data source. In this view, we are using
the big data source. As you can see, we have it
as primary data source. Any worksheet, any view is
using this data source. This filter can
be applied on it. Let's go to the different
views over here. So we're going to
switch to this one. You can see we have
the context filter and as well the first one, since both of them are using the big data source
and the filter is going to be applied
automatically on it. But now let's create
a new view where we are using different
data source. Let's switch to the
small data source. Let's take anything. Let's
take the first name. As you can see, the filter
is going to stay empty because the big data source
is not used in this view. But now let's go and use the big data source and see
what Tableau is going to do. Let's remove the first name, switch back to the big data
source and take, let's say, the last name. As I'm dropping this data
in the view, you can see Tableau is automatically
going to bring me the context filter, because it must be used in
all worksheets that are using the
big data source. This is really
useful if we have different worksheets using the same filter, for example the same context filter. Instead of creating the same
filter over and over again, we can create it in one
worksheet and then spread it to all sheets that are
using the same data source. Okay, that's all
for this option. Let's go back to our context filter and
try something else. Let's switch to apply to all
using related data sources. Let's try this one.
Click on that. Now you can see that we got
a new icon from Tableau. It indicates that this
filter is going to be applied to all worksheets
with related data sources. Now let's go and check
what happens to the other sheets
using this option. We're going to find
this filter now almost everywhere. In
the first sheet, you can see we are
using the same data source, so it's going to be
like this: we have the context filter
applied to the view. In the second sheet,
we're going to see again the same context filter, because we are using the same data source. Let's go now and create
a new sheet where we're going to use the
small data source. We are using different
data source. Click on that and let's take, for example, the first
name to the view. Now as we can see
in the filters, we have our context filter, even though we are
using a different data source. We are not using the
big data source, but Tableau brings this filter here because we are
using this option. But as you can see, it's red. What is going on over
here on the filter? If you mouse over it, it says: data sources that contain
logical tables cannot be used as a secondary data
source for data blending. Since this filter comes
from another data source, from the big data source, Tableau has to do data blending between them
in order to connect it. It will not work if you have
a logical data model in the secondary data source, as we
do in our big data source. If you switch to this page over here, we have a data model. We have a logical model where we connected the customers
with the orders and so on. Tableau doesn't allow a secondary data source
to have a logical data model; it will not work. But if you have only one table, or if you have multiple joins
at the physical layer, this is going to work. If you go back again, it's
going to stay red as long as the secondary data source
has a logical data model. But if you have one table,
everything is going to be fine. You will not get this error. All right, with this option, as you can see,
whether you are using the same data source or a
different data source, our filter is going to appear. Now let's go and check
the last option. Let's go back to
our view over here. Go to the Context
filter, right-click on it, then Apply to Worksheets. And now we're going to go to Selected Worksheets.
Let's click on that. All right, now we have a very
simple table where we have a list of all worksheets
and as well descriptions, the data sources,
and some details. Now we can go and
manually select which worksheets can
include our filter. As you can see,
everything is selected, because we used the option
of related data sources. I don't want that. I'm going to deselect everything and
start from scratch. I would like my filter to be in the first one and the second one. And this one is grayed out because we are currently
in this worksheet, so it's selected anyway.
The other ones I'm going to leave
deselected. That's all. Let's go and select OK. Now, if you check the filter again, we can find a new icon that
indicates this filter is now used in different worksheets that we manually selected. Let's visit the first report. We can find our context filter. The second one is the same, and the third one has it anyway,
because here we have created this
context filter. But now if you go to the
different worksheets, you will not find
this context filter. As I said earlier, I use this option a lot
in my projects to have control in which worksheets I want
to see my filters. Generally speaking,
those options are a really great way to
share your filters across different worksheets and
solve the problem of creating the same filters
over and over again. All right guys, so now
we're going to talk about how to customize
our quick filters. But first, let's
understand quick filters. Any filter that you are
presenting in the view, in the visualizations, for the end user to
interact with the view is considered to be
a quick filter. For example, all
those filters on the right side of the
view are quick filters. We have the subcategory and the sum of the profits. Those are quick filters. The users can go and start
selecting the values inside those quick filters to interact with
the visualizations. Now in order to customize
those quick filters, we're going to go over here to this small arrow
and click on it. Here we get a long
list of options on how to customize our quick filter, and
they are as well split into groups. The
first group is about how to customize
the quick filter. The next set of options is about the filter modes.
Then we have here many options about which
values can be presented in the quick filter: Only Relevant Values, All Values in Context, All Values in Database. Now we're going to go and focus on this group of options, but first we have to understand
the concepts behind them. All right, as we learned before, we have a data source
and worksheet. Inside the worksheet, we
have a context filter and the visualization. The
data is going to be sent from the data source
to the context filter. Then the visualization
is going to query the context data, and the result is going to be sent
back to the visualization. Now inside the view, we can create a filter. Now the question is,
which data is going to be presented
inside this filter? Here we have many options. The first one is that we get the values
from the database: All Values in Database. With that, the
values are going to be queried directly from
the data source. With that, we are skipping
anything inside the worksheet. We are skipping the data in the context filter and as
well in the visualization. It doesn't matter what we are
doing in the worksheet: the values come directly
from the data source. All right, this is
the first option. When we say database, it means the data
source information. As the next option, we have
All Values in Context. This time, the values
in the filter are going to come directly from
the context filter. As we learned before, the
context filter can generate a temporary view or temporary
data inside the worksheet. Here the values are going
to come directly from the context filter,
and anything that is done inside the view will not be considered in the
values in the filter. With that, we are skipping
the visualization level. We are getting the
data directly from the context filter and
not from the data. All right, so that's
all for this option. The next one going to be
only relevant values. The values for the
filter now can come directly from the view,
from the visualizations. That means any interaction
that we are doing in the view, any filtering can affect directly the values that are
presented in our filter. As you can see, those
options are really helpful. And Tableau gives us
now the control in which data can be presented
in our quick filters. Because as you can
see in Tableau, we have different layers
and different stages, and the subsets and the size of the data can be different
from one to another. Normally the size of the data in the data source is way bigger
than in the context filter. With that, you are
defining and controlling which data is going to be presented
in the filter. All right, now back to our view. Now, in order to
practice those options, what we're going to
do is bring new quick
filters to our view. Let's take the country,
right-click on it, Show Filter, and we're going
to get as well the city. Let's go over there. We can
change the order over here. So we're going to bring
first the country, then the city and
the subcategory. I'm going to remove those
measures from the filters. So let's just remove them. And with that, we
have those filters. Now we're going to go and
check which options do we have inside the quick filter
city. Go to the arrow. As you can see, the current
value is all values in the hierarchy and that's because the city is part of the
location hierarchy. But now we're going to
go and change it to only relevant values.
Let's go and do that. Now, if you take a look at
the values inside the cities, we can find almost all the
values from the data source. So nothing changed yet. But as we start now
interacting with our views, the values in the city start
reacting to our selections. For example, let's go
to the country over here and start removing
some countries. We're going to deselect
France, Germany, USA. As you can see,
the values inside the city are reacting to
our selections. It's like those
two quick filters are connected to each other. This is exactly
what the option of only relevant values does
to our quick filter. This is exactly the
purpose of this option. Only relevant values, anything that we are
doing in the view, the values inside this
quick filter can be refreshed and updated with
the current selection. Now of course, if we go and deselect Italy, what's
going to happen? The city filter is going to be completely empty, like our view. It is reacting to
our interaction. Now we're going to go and
change it to another option. Let's go over here on the arrow. And now we're going to change it to exactly the opposite: All Values in Database. Let's click that. Now
what's going to happen? Tableau is going to go to
the data source and bring all the information about the city and put
it in the filter, regardless of what we
have selected in the view or whether we have
a context filter and so on. So now we have a list
of all values in the city that is available
in our data source. It will not be
refreshed or updated if we are clicking around or
interacting with our view. For example, if I'm adding any other cities or I'm
changing any other filters. For example, I'm removing
all the subcategories. You can see it's static. Nothing is going to change
in the city filter, because Tableau goes to the data source and gets
all the data from there. This is really nice in order to optimize
the performance of Tableau and reduce the resources that are used in
those quick filters. Now let's go and
check something else. We're going to go and select All Values in Context.
Let's click on that. That means the values inside the city filter are responding
only to the context filter. Since our context filter
is based on the category, we have to bring it to the view in order to change the values. Let's go to the
category, right-click on it, and Show Filter. Now we have our context
filter on the right side. All other filters are
dimension filters. Now the values of the city can interact only with the category, not with the country and the subcategory.
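The three options we have tried so far differ only in where the list of values comes from. A minimal Python sketch of that idea follows; the sample rows, the mode names and the function `city_values` are made up for illustration and are not a Tableau API.

```python
# Hypothetical sample rows; the values are illustrative only.
ROWS = [
    {"category": "Technology", "country": "Italy", "city": "Rome"},
    {"category": "Technology", "country": "France", "city": "Paris"},
    {"category": "Office Supplies", "country": "Germany", "city": "Berlin"},
    {"category": "Furniture", "country": "USA", "city": "Boston"},
]


def city_values(mode, context_filter, other_filters):
    """Return the cities listed in the quick filter for a given mode."""
    if mode == "all_values_in_database":
        # Straight from the data source, nothing in the worksheet matters.
        source = ROWS
    elif mode == "all_values_in_context":
        # Only the context filter is taken into account.
        source = [row for row in ROWS if context_filter(row)]
    elif mode == "only_relevant_values":
        # Context filter plus every other filter in the view.
        source = [row for row in ROWS if context_filter(row) and other_filters(row)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted({row["city"] for row in source})


def context(row):
    # Context filter on the category: Furniture is deselected.
    return row["category"] != "Furniture"


def others(row):
    # Country quick filter: only Italy is selected.
    return row["country"] == "Italy"


for mode in ("all_values_in_database", "all_values_in_context", "only_relevant_values"):
    print(mode, city_values(mode, context, others))
```

The list gets shorter from the first mode to the last, because each mode takes one more layer of the worksheet into account.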
Now let's try that. For example, if I
go to the country, I remove all the values. You can see the
values in the view did disappear because we
are not selecting any data, but the values in the
city still are there. Let's go and select everything. The same for the subcategory. If I remove everything
from the subcategory, you see the city
is not reacting. It's still static because it comes from the
context filter. Now let's bring everything back. But now if I go to the category, to our context filter, and let's remove
office supplies. Once I remove it,
you can see now the city is reacting
to our view. So we don't have any
values because we are not selecting anything
from the category. Here you can see there is a connection only to
the context filter, but not to the other filters. This is exactly what
happens if you make the city depend on
the context filter. All right, with that,
we have learned the three main
options in order to control which values
are going to be presented in our quick filters. But as we started with the city, we saw that there
is another option called All Values
in Hierarchy. It was the default one,
so let's go and select that. Once we do it,
we are connecting dimensions that
are in the same hierarchy. If you check our data pane, we have a hierarchy that
we created previously. It is the location hierarchy, and inside it we have
four dimensions: the continent, country, city, and postal code. Now, if we use those four dimensions as quick filters, they're going to be
connected to each other. Let's check the
example. Now we have the city and the country
in the same hierarchy, and they are connected to
each other. In the category, which is our context filter, nothing is selected, but still the
city is showing values. That means the city
is now disconnected from the context filter
and from any other filter that is not in the same hierarchy. If I go and select any
values in the category, you see nothing is
changing in the city, even if I remove everything. But the city reacts once I start deselecting or selecting values from the same hierarchy. If I remove France, Germany, USA, you can see now we have
only the cities from Italy. They are like connected
to each other. But here we have something
special about the hierarchies, since as we learned, we
have dimension levels. The country is a higher
level than the city. The lower level dimensions will not affect the
higher level dimensions. Only a higher level dimension
can affect the lower one. What do I mean by that?
Let's go to the country and select all the values. As you can see, now we have all the values here in the cities. But if I start selecting
any values from here, you can see the country is not reacting to it, because
it's the higher level dimension. Even if I go and
deselect everything, I still have the four countries. That means since the city is a lower level than the country, it will not affect the country. But if we now bring a higher level than the country,
which is the continent, let's see what's
going to happen. We're going to go
to the continent, right-click on it,
and Show Filter. I'm just going to
bring it over here. Now as I start deselecting
stuff in the continent, as you can see, the values in the country are affected
by my selection. In the hierarchy, the continent is a higher
level than the country. With that, as you can see, this is what happens if we have All Values
in Hierarchy. You have to pay attention to the levels of the dimensions, and those dimensions are going to be connected to each other. With that, we have covered all the options that we can use in order to control the values
inside our quick filters. Okay, so now we're going to talk about a different group of options we could use in order to customize
our quick filters: the filter modes. We have single value list, single value dropdown, slider, custom list, and so on. In order to learn that,
we're going to have the following example.
What we're going to do is go and
clean up our filters. I'm going to remove the country, the city and the continent. And we're going to keep the
subcategory and category. And we're going to bring as well the product name as a filter. Right-click on it and
let's go to Show Filter. Now we have the quick filters on the right side. We
have the product name. I'm just going to
bring it over here so it looks like our hierarchy. It starts with the category, then subcategory, and product name. Let's show all the
values over here. And for the product name, I'm going to change the mode
to drop down or a list. All right, so now
let's start with the first quick filter the
category and try those modes. We're going to go to the
arrow, and as you can see, as a default it is
Multiple Values (list). As you can see, we have
the list here again as Single Value (list). We
have the same option twice, one as single value and
the other as multiple values. The same goes for the dropdown. We have
Single Value (dropdown) and
Multiple Values (dropdown). Let's try those out. We're going to go to
Single Value (list). And as you can see now,
the visual of the filter changed to radio buttons. Now, as I'm selecting those values inside the category,
as you can see, we can select only one value. As the name says, it's
a single value list. So that means we are making
some kind of restriction: only one value is allowed. But if you want to have
multiple values as a list, we're going to go and change it back to Multiple Values (list). Here, of course, you can
choose different values and different categories
without any restrictions. That's it for the list modes, single value and multiple values. Okay, now let's go and
try another mode. We're going to take this
time single value, dropdown. Let's switch to this one. And as you can see
with the dropdown, you will not find all the
values immediately in the view. You have to click on the
dropdown menu over here. And then you can select
the values, single value. Again, here we can
select only one value. We cannot select
multiple values. I can select one
category at a time. And as you can see,
it is working. Let's switch now to
Multiple Values (dropdown). We're going to have, again,
here, the same thing. We have a dropdown menu,
but inside the menu we can
select multiple values. That's it for the dropdown. All right, so now let's move
to another filter mode. We have the Single Value (slider). Let's select that. And with
that you get a slider. We can move it left and right to get
different values, but it is not really interesting for a dimension
with string values. We can use it for
numeric values or dates. Because it is not really nice to have a
slider for string values, it's better to use the dropdown or a list for
them. So that is it for the sliders. I rarely use them in projects. So now let's move
on to another one. We have the custom list, but I will not use
it in the category. Let's go for the
product name and use a custom list.
Click on that. As you can see
now, the product name doesn't have any values.
We cannot see anything. We have only a search box. So now we can
search for a value. Like for example,
let's search for Apple. And then hit Enter. You can see now a list of all products that
contains the name Apple. So it's like searching
inside this field. So you can go
over here and start selecting the values that you
want to be in the filter. As I'm clicking over
here on those boxes, I'm going to see a list of all
values that I'm selecting. With that, we have created our
list using the search box, but here we are not
seeing any data because of the categories. So I'm just going to
switch it back from the slider to
multiple values list. I'm going to select everything. And now we can see that we are selecting only the
subcategory phones, because we selected
over here, the Apple. With this type of list,
the customers can go and select their own list. So we can go and add more
stuff like Samsung over here. So let's search.
I'm going to add those products as
well to the list. And with that, we are appending, or adding, more products
to the list. If you want to clear everything, we can go over here
and clear the list. This is a really nice way to
search for specific values, especially if you have a lot of values inside
the product name. Now let's go and try the last option that we
have in the filter modes: we have the wildcard match.
Let's go and select that. Now we can see
that we have again a search box where we
can enter a value. But now we are searching for a specific pattern in our data. In order to show
you how this works, we're going to get the product
name as well in our view. Now we're going to go and search for a specific pattern. For example, I want to search for all products that start with
the character A. In order to do that,
we're going to go over here and type A. After the A, it doesn't matter
which characters come after that. That's why we're going to
use the star character. Let's go with that.
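The pattern described here, some fixed characters plus a star for "anything", behaves like a glob pattern. A minimal Python sketch of the idea, using the standard library `fnmatch` module, is shown below. The product names and the helper `wildcard_filter` are made up, and matching without regard to upper or lower case is an assumption of this sketch, not a statement about how Tableau matches.

```python
from fnmatch import fnmatchcase

# Hypothetical product names, for illustration only.
PRODUCTS = [
    "Apple Smart Phone",
    "Acme Stapler",
    "Bush Bookcases",
    "Samsung Headset",
]


def wildcard_filter(names, pattern):
    """Keep the names matching a star pattern, ignoring case (an assumption)."""
    lowered = pattern.lower()
    return [name for name in names if fnmatchcase(name.lower(), lowered)]


print(wildcard_filter(PRODUCTS, "A*"))  # names starting with A
print(wildcard_filter(PRODUCTS, "*s"))  # names ending with s
```

Note that `fnmatch` also gives `?` and `[...]` a special meaning, which goes beyond the star that is used in this lesson.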
And then hit Enter. We can see at the product name that Tableau filtered the data depending on our pattern,
our search pattern. We can see over here
all the products that start with
the character A. Let's go and have
another example. Let's say we want
to start with PP, and then it doesn't matter which characters follow, so we're going to have
the star. Let's hit Enter. We have here only four products
that follow this pattern, and it is the word of or. We can search for
the last characters. Let's say that it
should end with, instead of having the
star at the end, we're going to have
the star at the start. We have star then
then let's hit Enter. All those products end
with the character. If I just like
move it over here, some of them are
really long names, you can see for example
here book cases. It ends with all those products
ends with the character. This is how this mode works. We can
use the wildcards in order to search for a specific
pattern in our data. Again, this is really helpful if we have a dimension
with a lot of values: we can use the search to find the specific
data that we need. With that, we have covered all the different modes that we have in this category in order to
customize our quick filters. All right, now let's
move to another set of options to customize
our quick filters. In each quick filter we
have a lot of information. For example, we have
this extra button called All, or we have a title, or we can search
for a specific value, or we can reset stuff, and so on. We can customize all that
information in Tableau. Let's go over here again,
and then we can go
to Customize. And now we can see all those
options. First, Show "All" Value. This is exactly the first
value that we can select. If it is deactivated, we have only the values
from the dimension in the filter. But
sometimes it's really nice to have. For example, here
in the subcategory, if you want to
deselect a lot of values, you can just go and deselect All. With that, you are removing
all the selections, and then you select
specific stuff. With that, we can select
the values really fast. Let's move to the next one. We have this small search icon. As you go over here,
you can search, for example, for Art, and hit Enter. Then you're going to
get the value inside this dimension. If you want to hide it and not show
it to the users for some reason, you
can go over here to Customize and
then deactivate it. Once you deactivate it, you can see the small icon disappears, but I think it doesn't harm to have it in each quick filter. Let's activate it again. As you can see, with
those options, we are customizing
our quick filter. Let's check another option. Let's go to customize. And here it's really
interesting to have Show Apply Button.
Let's select that. And once you do it,
you're going to get two new buttons,
Cancel and Apply. As I'm selecting now in my filter, as you can see, nothing
is changing in the view.
That means it will
not send any query to the data source or the context
filter to get the data. Nothing is changing as long as I'm not clicking
here on Apply. Once I click on Apply, the filter is going to send a query to Tableau, and Tableau
is going to answer with data. This is really nice
if you are going to select a lot of values. Since each time you are
selecting a value Tableau is going to do the calculations,
maybe it makes sense to say: first, let me select
everything, and then do the calculations. If you don't activate this option,
like in the category, each time we are selecting
and deselecting from the filter, Tableau has to react to
our interaction. With that, we are generating a
lot of calculations in Tableau as we are
clicking around. But over here, as we are
selecting the values, nothing is changed until we decide to say,
okay, I'm done, now go and do the calculations. This is, again,
a really nice way to reduce the unnecessary
calculations in Tableau. All right, so what else
we can customize in our quick filters is the title.
We can decide whether we
want to show a title or not, or we can edit the
title name itself. If you go over here, you can say,
okay, instead of subcategory, I'm going to have
a minus between the words and make everything
lowercase for some reason. Let's click OK. As you can
see now, the title changed, but the field
name in the data didn't change. So if you go to the subcategory, the name stays as it is. We just renamed the filter title. All right, so with that
we have now covered almost everything on how to customize our quick
filters in Tableau. All right, so with that we have
learned how to apply filters to multiple
worksheets in Tableau. And next I'm going
to share with you my top tips and tricks that I usually use in my projects once I start using
filters in Tableau.
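The effect of the Show Apply Button option from this lesson can be sketched as a small Python class that counts how many queries a quick filter sends. The class `QuickFilter` and its methods are made up for illustration and only model the behavior described in the lesson: without the button every click triggers a query, with the button the clicks are collected and sent once.

```python
class QuickFilter:
    """Toy model of a quick filter with an optional Apply button."""

    def __init__(self, values, use_apply_button):
        self.applied = set(values)   # selection the view currently shows
        self.pending = set(values)   # selection the user is building
        self.use_apply_button = use_apply_button
        self.queries_sent = 0

    def toggle(self, value):
        """Select or deselect one value."""
        self.pending ^= {value}
        if not self.use_apply_button:
            self._run_query()

    def apply(self):
        """Send the collected selection in one go."""
        self._run_query()

    def cancel(self):
        """Drop the pending clicks and keep the applied selection."""
        self.pending = set(self.applied)

    def _run_query(self):
        self.applied = set(self.pending)
        self.queries_sent += 1


subcategories = ["Art", "Chairs", "Phones", "Tables"]

immediate = QuickFilter(subcategories, use_apply_button=False)
deferred = QuickFilter(subcategories, use_apply_button=True)
for value in ("Art", "Chairs", "Tables"):
    immediate.toggle(value)
    deferred.toggle(value)
deferred.apply()

print(immediate.queries_sent, deferred.queries_sent)
```

Both filters end up with the same selection; the only difference is how many times the view had to be recalculated on the way.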
110. Tableau | 10x Filter Tips & Tricks: Now I'm going to show you
the best practices of Tableau filters that I usually follow in my
projects. Let's go. The first tip that
I have for you is to utilize these filters: the extract filter,
data source filter, and the context filter. I saw a lot of projects where developers really
forget about them or ignore them because they are not really important
in visualizations, but they are very important for optimizing the
performance in Tableau. My advice here is for
you to always have a discussion with
the end users about promoting one of those
filters that you have in visualizations to be
first an extract filter. If it cannot be an
extract filter, then a data source filter,
and the last option to optimize the performance is to bring it in as a context filter. Because sometimes
in visualizations you really don't need all
the data. You don't need,
for example, ten years
of data in visualizations. Try to discuss it with
the users and say, maybe let's bring only two years of data to the visualizations. And then you can utilize an extract filter or data source filter on your workbook, which can have a great impact on the performance
overall in Tableau. Don't forget or ignore
those three filters. The second filter tip
that I have for you is about optimizing the
performance in Tableau: avoid using Only Relevant Values
in your quick filters. For example, if we go to
the subcategory over here, we can see that it is currently set to only relevant values. If you use this option for all your quick filters,
what can happen? The performance in Tableau
is going to be really bad and everything is going
to be really slow. So we can go and switch
it to something else like all values in
database or in context. We can go and switch that. With that, you're
going to reduce the stress on the memory and
the resources in Tableau, but let's understand why. All right, so now
let's understand what can happen in Tableau if you're using in your filters all values
in database or in context. It's the same: once the viewers or the
users start the report, the view is going to send only one query to the data source, and the data source is going to answer with the results back. So that means we're
going to have only one initial query as
the user starts the view. But on the other hand,
if you are using only relevant values,
what can happen? The view is going to keep sending
query after query to the data source, always to get an update and
refresh the view. That means the view
is going to keep sending multiple queries for
each user interaction, which can really impact the
performance in Tableau. Because each time the user is clicking something or
interacting with the view, the view is going to keep
sending queries to the data source to get an
update about the interaction, which can use a lot of resources
and memory in Tableau. And it's going to slow
everything down, because as each user is clicking
something in the view and interacting, the view is going to keep sending queries to the data source, which consumes a lot of memory and
resources from Tableau. And it's going to
slow everything down. Be careful with
your quick filters, if you're having everything
on only relevant values, things might be slow. If the users are suffering from bad performance in Tableau, maybe think about switching
all those filters to all values in context
or in the database. I have another filter tip about optimizing the
performance in Tableau, which is avoid using dimensions with high
cardinality as quick filters. Those dimensions might impact
the performance in Tableau. But first, let's understand
what is cardinality? Cardinality is the number of
distinct values in the field. For example, in our database
we have the customer ID. We have around 800 customer IDs and we have a lot
of product names. Those two fields are considered to be high cardinality
dimensions. On the other hand, we
have other dimensions, for example, the category. We have only three values. Or the countries in our database, we have only four countries. The subcategory as well, we have only 17 subcategories. Those dimensions
are considered to be low cardinality. And if you are using them, the performance
is going to be okay. But if you start using those dimensions
with high cardinality, the performance might be bad. The best practice here is to avoid using high cardinality dimensions as quick filters. All right, back to
our quick filters. In our view, as you can see the category and
the subcategory, they are dimensions
with low cardinality. It's fine to leave
them in the view, but the product name, it has a lot of values. It is a dimension
with high cardinality. It's really worth
discussing with the users whether they really need
such a filter in the view. If you find out no one needs it, just remove it
from the view just to have a good
performance at Tableau. Now let's move to
the next filter tip. Is that, let's say that
the users really want to see the product name
or the customer ID, any dimension with
high cardinality. In the view here, the tip is to change
the filter modes. Instead of having a drop
down list or a list, we can use a wildcard match for dimensions with
high cardinality. Why is having a list of all the
products or the customers in the view a bad thing in Tableau, or bad for
the performance? Because each time Tableau
has to go to the data source or to
the database and prepare a distinct list of all the customers or all the products to be
presented in the view. Instead of having a list, we could go and change
it to Wildcard match. And as you can see, Tableau
is not preparing anything. So we don't have any values
to be presented in the view, only if the customers start interacting with
the quick filter. Then after that,
Tableau going to go to the database and brings
the relevant values. And with that, we are
avoiding using a lot of resources and unnecessary
calculations in Tableau. If you have a dimension
with high cardinality, either avoid using it or,
if you want to use it, just use wildcard match. All right, so let's
move to the next best practice in Tableau, which is as well about optimizing the
performance in Tableau: start using the Apply button in
your quick filters. Because if you don't use it, let me show you what
can happen. Each time I'm selecting something, it is like a query sent
to the data source. This is one query, second query, third query,
fourth query, and so on. Each time I'm clicking
on my filters, there will be generated a lot of queries to
the data source, which is consuming a
lot of performance. Instead of having such a filter, we can customize it and add a
button. As we learned before, we can go over here, then
Customize and Show Apply Button. Now as I'm clicking on
those values in the filter, no query is generated
to the data source. We are not using any
resources in Tableau. And once I'm done
selecting what I need, then I'm going to hit Ok
or apply what can happen, one query going to send to the data source to bring
the result to the view. With that, we are
reducing the number of queries that our visualizations
is generating Tableau, which is really great
for the performance. My recommendation here,
if you have a filter like the subcategory or a dimension
with high cardinality, where you are using a
list, use the Apply button. Because the users will not
select only one value, they usually select
multiple values and then at the end
they can apply. But for a filter like the category, we have only three values, so it's not worth it
to use the Apply button. It's only three, so
the user is going to at maximum
generate three queries. It's fine to not use
the Apply button with dimensions with
really low cardinality. With high cardinality
or medium cardinality, like the subcategory, go
and use the Apply button. All right, the next
filter tip that we have is as well about
the performance in Tableau, which is avoid using exclude and always use include
if it is possible. So for example, if you go to the subcategory we
have here the option of using include or exclude. If you're
using exclude values, the queries that are
going to be generated in Tableau are more
complex than include. More complex means
more resources and might slow down the report
or the view in Tableau. Avoid using exclude
when it's possible, so I'm going to
switch it back to include which has
better performance. All right, so let's
move to the next one. And I promise you
this is the last one about the performance which is minimize the number of
quick filters in your view. Those quick filters
are going to take not only space in the view, but they are also going to generate
a lot of queries. A lot of stress is going to bring the whole performance
in Tableau down. Try to avoid using a lot
of quick filters and discuss with the users, each
time they need new filters, whether it's really necessary
to put them in the view. Because I saw a lot of projects
where the users always want a lot of filters. Try
to discuss it with them, and do not always bring a new quick filter
to Tableau, because you're going
to end up having really bad performance
in the view, and no one's going to be happy having bad response time
in the visualizations. Try to minimize the number of quick filters in Tableau
so that everyone is happy. Now let's bring more
filters to our view. We're going to go, for example, I pick the order date, I'm going to show
it as a filter. Let's take the
location information, the country as well,
maybe the city. Now we have to start
sorting this information. I usually start in my projects
with the first filter is the date or the time aspect that we
have in the visualization. Here we have only
the order date. We're going to drag
and drop it on the top because the users can
start thinking which date, which year I want to see
in my visualizations. They're going to
focus always first on the time and the date aspect. After that, we have two kinds of information, or two hierarchies, in the quick filters. We have here the location information, we have the city
and the country. Then here below, we
have the information about the product as well. With our hierarchies here, we should not mix
them together. Separate them. First,
start with one topic, for example, the location. First we're going to talk about
the city and the country. Then we're going to talk about the product information.
Here, follow as well the logical
order in our hierarchy. Our hierarchy
starts, for example, with the country as a
higher level than the city. Start always with
the higher level, then move down to
the lower level. For example, here we should
bring the country on top, and then the city
should be below it. If we take, for example,
the postal code, let's have it as
well in the filter, the postal code should
be below the city. As you can see in
the quick filter, we are rebuilding
the logical order of the levels in the hierarchy. The same goes for the product. We have first the category, the subcategory, then
the product name. Here, everything
is fine. With this, as the users start
filtering the data, they start from top to bottom. There is a logical order of the fields, which
really makes sense. All right, let's move to the
next filter tip that we have, which is to remove the All value in dimensions
with very low cardinality. What I mean with that, for example, let's
check the country. The country has
only four values. And really it makes
no sense to use All because it's only three
values or four values. And the users can go and select those values without
selecting all or deselecting all. This dimension has
really low cardinality. And we can go and
remove this option. Let's go to Customize
and remove it. With that, we have more space
to show to the users and this option usually
takes a lot of space. All right, so let's
move to the next one, to the city, and let's
check the values. As you can see, we
have a lot of values and here it makes sense
to leave it as it is. We're going to leave
the All value. The postal code as well, it's like relatively
high cardinality, we're going to leave
it. The category here, we have only three values. It really makes no sense
to use the All value, so I'm going to go and
remove it as well from here. And with that, we
have now more space. We didn't waste space for that. The subcategory here, let's
make it bigger a little bit. And you can see, yeah, a lot of values and
it makes sense to select all subcategories
or de select. So I'm going to
leave it for that. That means we just changed
that for the category and the country, which are really dimensions with very
low cardinality. All right, so now
we're going to move to the final filter tip that I have for you that I
usually use in my projects, which is as well
about the design and the look and feel in Tableau. Here, we're going to use
the suitable filter modes in the quick filters. Let's see what I mean with that. First, we're going
to start with the order dates or with
the date that we have. Usually in our view, I
usually tend to use here like a continuous field instead of
a list of distinct values. What I mean with that, I usually go over here on
the year of order date, right-click on it and
convert it to continuous. With that, we can
have a range between two values, which takes as
well less space in Tableau. Let's go and switch now. As you might already
notice, the order date, the quick filter did
disappear because we changed the role from
discrete to continuous. Let's go and show it again. As you can see now we
have the quick filter, very minimum and not
taking a lot of space. This is really
nice as a start to have a range between two
values for the date. Let's move to the next
one. We have the country. The country is dimensions
with very low cardinality. And here I tend
always to use a list, multiple values, so
everything is correct. Let's check that it
is multiple values. A list. I'm going to
leave it as it is. The next one, we
have the city here, We have a lot of values here. We can only see like three
values from the whole filter. Doesn't make sense to have
it as a multiple values list. Instead of that, I
would say this is a dimension with
medium cardinality, we're going to always tend
to use a drop-down for that. I always skip the single value. It's like a restriction
that has no meaning. We're going to go with
the multiple value drop down with that. As you can see, we
have a minimum space. We have only like one
value that we can see. So if the users want
to select the cities, so the user is going
to go and select the values that the
needs, and then closets. It's really minimum and
don't take a lot of space. The next one, we have
the postal code as well. Here we have the same
situation dimension with a medium cadonality, we have like a lot of values, so we will not
leave it as a list. We can have it as
a drop down nu. So as you can see, the size compared to the
city is really big. Individualization. We're
going to go as well over here and change it to
multiple values drop-down. The next one is the category. It's exactly like the country, only three values,
very low cardinality. We're going to
leave it as it is. I think for the subcategory, you already know that it
has like medium cardinality. We're going to go over here
and make it a drop down. Now we're going to
move to the last one, we already talked about it. The product name is huge
and has a lot of values. The best practices here is to use a wild card match
for this value. For example, let's
take another one. Let's take the first names. I'm going to show
the filter over here and we're going
to bring it just down as the last one. Like the product name, it
is as well a huge filter. It has a lot of values, so here it is a dimension with
high cardinality. We're going to go
and switch the modes to wild card match exactly
like the product name. So as you can see, we have
now a lot of filters, which is not really good
for the performance. But we saved a lot of spaces as we change
the filter modes. So with that we have really nice quick filters on the right side, not taking a lot of spaces. So with that, I covered
all the tips and tricks, or best practices
that I usually use in Tableau projects if I'm
using filters. All right. So with that, you know the
best practices that I usually follow once I start creating
filters in Tableau. And next we will learn
the different ways on how to sort our
data in Tableau.
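The cardinality-based tips from this lesson can be summarized in a small Python sketch. This is not Tableau code, only an illustration of the decision logic. The function names and the cut-offs (10 and 500) are assumptions made up for this example, and the sample fields are synthetic data that only mirror the counts mentioned in the lesson (3 categories, 4 countries, 17 subcategories, around 800 customer IDs).

```python
def cardinality(values):
    """Return the number of distinct values in a field."""
    return len(set(values))


def suggest_quick_filter(n_distinct, low_max=10, medium_max=500):
    """Map a cardinality to the quick filter setup recommended in the tips.

    low_max and medium_max are hypothetical cut-offs, not Tableau settings.
    """
    if n_distinct <= low_max:
        # Few values: a plain list, no "All" entry, no Apply button needed.
        return "multiple values (list), hide 'All'"
    if n_distinct <= medium_max:
        # Many values: compact drop-down, one query per Apply click.
        return "multiple values (dropdown), show Apply button"
    # Too many values to list: let the user type a search pattern instead.
    return "wildcard match"


# Synthetic fields that only mirror the counts mentioned in the lesson.
fields = {
    "Category": [f"category_{i % 3}" for i in range(1000)],
    "Country": [f"country_{i % 4}" for i in range(1000)],
    "Sub-Category": [f"subcategory_{i % 17}" for i in range(1000)],
    "Customer ID": [f"customer_{i % 800}" for i in range(1000)],
}

for name, values in fields.items():
    n = cardinality(values)
    print(f"{name}: {n} distinct values -> {suggest_quick_filter(n)}")
```

In a real project, the cut-offs would depend on the data source and on how the users actually work with the view, so treat them as a starting point for the discussion with the users.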
111. Tableau | Sorting Data: All right, now we're
going to learn how to sort your
data inside Tableau. A lot of people think that sorting data in Tableau
is not working correctly, which is not really right. So we're going to remove
now this confusion and we can understand how
sorting in Tableau works. Let's go, okay, now let's understand what
is sort. It's very simple. Sorting is arranging your
data in a specific order. And here we have two options. Either we can sort it
using the ascending order. Here we can arrange your
data in increasing order. That means we're going to
start with the lowest, and as we are moving down, we're going to have
the highest value. For example, let's
take the order ID. We can sort it using
the ascending order. Then the values
can be like this: 1, 2, 3, 4, 5, 6. The values are
increasing as we are going down. Or if we have like, for example, the first name, we
have characters. It's going to be
sorted from A to Z. For example, we have
here and Dwight, and end up with Pam. The second option is to sort your data using the
descending order. Here we can arrange your
data in decreasing order. That means we always start
with the large value. As we are moving
down, we're going to go to the lowest value. For example, again
here the order ID, We start with the highest value. In this example, it's
going to be 6, 5, 4. As I'm moving down, I'm going
to get the lowest value. The same for the first name. It's going to be the opposite
of alphabetical order. We're going to start
with Pam, Michael James, until we end up with, and as you can see,
it's very simple. We have only two options, either sorting the data using the ascending order or
the descending order. Now let's go in Tableau and understand how
we can do that. All right, now let's create another view from the scratch. We're going to stay with
the big, so let's take, as usual, the
subcategory in the rows. And we're going to take,
as a measure, the sales. Let's put it in the columns. Let's show the numbers. I'm going to take
it to the labels and as well to the colors. Then we can have as well in
the columns, the country. Let's go to the customers. Inside the hierarchy location, we have our country and
let's put it over here. Okay, this is our view for now. There are two ways
to sort in Tableau, either directly in the
visualizations and we call it quick sorts or we can do it as we are building
the view as developers. We're going to start the first one where we can learn how to do sorting using quick sort
from the visualizations. This is what usually the
users going to see and do. All right, now for
quick sort in Tableau, there are three places
where you can sort your data directly in
the visualizations. The first one is
sorting the data from the header. If you hover the
mouse over the header name over here, you can see
that we have like small icon in order
to sort your data. We can use it here to sort
the header informations. Or the second place we can
go to the axis over here. And you can see as
well there is like small icon to sort the data. The third on the last one, if you go to the field labels, if you go to any values
here inside the header, you can see we have a small
icon to sort the data. Those are the three places
where you can sort the data. In Tableau, sorting works
with three clicks. The first click is going
to sort the data ascending, the second one
is going to sort the data descending, and the third
click is going to bring the data back as it is
sorted in the data source. All right, as a default, the data is going to be
sorted as the data source. If your data source
is sorting ascending, we can have the same
way at the view. Now as a default, we are not enforcing any
sorting in our view, but we are taking it
from the data source. As you can see, it is
sorted already in ascending fashion because we have it like that
from the data source. Now if you go to the
header, for example, let's click on this icon
and see what can happen. As you can see,
nothing happened in the view because it's exactly
like the data source. We have it in ascending fashion. That was the first
click that we did. We sorted now the data
in ascending way. You can see over here we have
a small icon that indicates this dimension is now sorted in the view
in ascending way. Let's go again over
here and click again. Let's see what's going to
happen if I click on it. Now the data is going to be sorted in descending
order. As well, here we're going to
have a different icon. We start with the tables and then
it ends with the accessories. Now we have it descending. Now to go and reset
everything back to the default, to the data source order, what we're going to do is click a third time. If I click again over here, the icon is going to be
gone from the dimension and the data is going to be sorted
exactly like the data source. This is how sorting
in Tableau works. You have three clicks:
the first one ascending, the second one descending, and the last one, we're going
to bring it back to the default, the data source order. All right, now we're going to go
to the second place where we can sort our
data in the view, and that is the axis. If you go to the axis over here, we can find the small icon
here is exactly the opposite. The first click can assort
the data in descending order. The second click can assort
the data in ascending order. And the third one going
to bring it back to the default like
now, let's try that. We're going to click
the first one, as you can see now the data and the rows are sorted
in descending order. We start with the highest sales. As we are moving down, we're going to move to the
lowest sales. All right. Now let's click the second one. Let's go, we are now sorting
the data in ascending order. So we start with
the lowest sales and we end up with
the highest sales. And the third click is going to bring it to the default without any order. Let's click on that and
we are back to the start, where the data is
not sorted at all. So as you can see with
the header and the axis, we are sorting the rows only.
Only the rows are sorted. We are not sorting the columns. France, Germany, Italy, USA stay at
the same position. We are not sorting the columns. Now, in order to
sort the columns, we're going to go
to the third place, to the field label. We're going to go to
any of those values, doesn't matter which one
we're going to click. For example, on
the chairs, you can see this small icon here. Again, the same as the axis: the first one is going to sort the columns in descending order, the second one ascending, and the third one
back to the default, like now. Let's go and click
over here on this icon. Now the data is sorted
in descending order. That means the first column is going to have the highest sales, then the next one is going
to have the lower. And as we are moving
to the right, we're going to get
the lowest value. We are sorting the columns in descending order,
as you can see. As well on the columns,
we have this icon over here indicating that the
columns are sorted now in the view. Now if
we go and click it again, we're going to sort
it in ascending way, where we start with the lowest value in the first column. As we are moving to the right, we're going to have the last one with the highest value. As well, here we can see the icon which indicates the data is sorted
in ascending way. The last click as you know, we're going to go
back to the default, the data is not sorted at all. All right, that's all about
quick sorts in Tableau, it's really simple once you
understand the places to sort the data and how you can click around to sort the data
in different ways, a lot of people get
confused about it. But it's really simple. Let's say that we
have the following scenario where you
say, you know what, I don't want to offer the users this possibility
to sort the data. I'm going to sort
everything in the view and the user is going to just
see the report as is. All right, so now in order to disable the sorting
option for the users, we're going to go
to the main menu. And then we're going to
go to Worksheet. And then here we have Show Sort Controls. As a default, Tableau is going to enable it, which
really makes sense. Now let's go and disable it
and see what can happen. Now if you go to
the visualizations, you will see that we
don't have anymore the icons in order
to sort the data. If I go to the sales
over here or I go to the subcategory
or anywhere you see, we don't have any options
in order to sort the data. This possibility is going to completely disappear
for the users. With that, we have removed
completely the options for the users to sort the data
inside the visualizations. To be honest, I've never
been in situation where I have to remove this
option for the users. It really makes
everything static. And this is exactly the
opposite of what we want. We want to make always
our dashboards and reports dynamic
interactive for the users. I think it's always
really bad to make only static reports without
having any dynamic inside it. Unless maybe the users
exactly ask for this to say, okay, I don't want
to sort the data, make it static as
much as you can. You can go and
disable this option. For now, I'm going to
go to Worksheet. I'm just going to go
to Show Sort Controls and enable it again. As we
go again to the sales, you can see we got
again those small icons in order to sort. All right. That's all about how to sort the data directly
from the views, from the user's point of view. All right, so now
we're going to move to the second group
where we're going to learn how to sort the data as
you are building the view. In order to do that,
there's two ways to do it, either from the tool bar or
from the dimension itself. Now if you move to the toolbar, we have here two options, Sort Ascending and Sort Descending. Now in order to sort
those dimensions, you can click on the country, for example, now we are
sorting the columns. And then click over
here, Ascending. As you can see,
now we are sorting the data in ascending
way for the columns. If you want to sort the
subcategory, the rows, we can click over
here and then click on ascending or descending. As you might already notice, we are sorting the data always by the measure,
by the sales. If you hover the mouse over it, it's going to say sort subcategory descending
by the sales. We don't have any option here to sort the data by the header. It's only sorted by measures. All right, that's it about how to sort the data
from the toolbar. The second method is to sort the data directly
in the dimension. Let's go, for example, to the subcategory, right click on it. And as you can see, we have
here two options about sort. We have Clear Sort and Sort. Clear Sort is going to reset
everything to the default. Let's go and do that to
start from the scratch, so I'm just going to
clear everything for the subcategory and then
right click on it. And let's go to sort. With that, we're going
to get a new window. It says we are sorting now
the dimension subcategory. I will just move it to
the left side in order to see how Tableau is going to
react to my selection. Okay, what do we have
over here? We have two sections. The first one is about how to sort the data,
the sort methods. The second one is about the sort order, ascending
and descending. Let's see which
options we have. We have five options: data source order, alphabetic, field,
manual, and nested. Let's start with the
first one, the data source order. Here we have
it as ascending. We are sorting the values
inside our header, the subcategory in ascending
way, in alphabetical order. We can reverse it by going
to the descending order. As you can see,
the values switch. Now if we want to go
and reset everything, we can go over here and click Clear to go to the
default settings. That's it for the
data source order. Let's move to the next one. We're going to have exactly the same effect because
we have it as well at the alphabetical
order. Let's go over here. As you can see, nothing is going to change because we have
it at descending. Let's go in
alphabetical order to ascending, and
the headers switch. Exactly the same effect. All right, now let's
move to the third one. We're going to go to the field. We can go and sort the
data by any field, from the whole data source. The field doesn't have
even to be on the view, but of course, it makes
no sense to do that. As a default,
Tableau is selecting the sales because it's
the only measure that we have in the view. It makes sense, and the data is sorted
in ascending way. But if you want, you can go and sort the data by the number of customers inside each
category or subcategory. We can go over here and select the customer ID and
the function can be count, the total number of customers inside
each category. Now those categories are
sorted in ascending way, depending or based on the
total number of customers. We have this ability to sort the data by any field
from the data source. But it doesn't make
sense of course, to sort the data
like this because it's going to confuse
the customers and they will not understand why
those categories are sorted like this without having like a description
in their report. That's all for this
method, sort by field. Let's move to the next one. We have sort by
manual, and here you have the freedom to make
the order of the dimension. For example, we can take
these machines over here. As I'm moving it down, you can see the order in the
view is changing as well. I can go and sort the
dimension as I want. It's really simple here. We don't have any rules, we don't have ascending
or descending. We have the complete freedom to sort the values
inside any dimension. That's it for this option, let's move to the next one. And the last one,
we have the nested. Now, in order to understand how the nested sort
works in Tableau, we have to work with
multiple dimensions. The best way is
to get hierarchy. Now, let's go and
create another view. I'm just going to go and
close this one here. Let's take
the continent to the rows and let's take the profit to the
columns. As well, as usual, we're going to
show the labels of our data. Now if you go to the
continent over here and right-click on it,
let's go to the sort. Let's say we're going
to sort the data by the data source descending. As you can see, we
are now sorting only the continent. If we
drill down to the country, you can see that only
the continent is sorted, but the country is not sorted. So if you go to the city, you can see that city
is as well not sorted. Only the first
dimension is sorted. But now instead of
that, we can go and use the nested sort in order to sort all dimensions inside the hierarchy
automatically. Let's go and remove those stuff. So I'm just going to drill
back to the continent, or we call it drill
up, right click. Let's go to Sort. Then we're going to go to the nested.
Now we're going to say, okay, ascending. And we're
going to use the measure, the aggregation sum of profit, in order to sort the data. Now let's go and close it. And with that, we
got the nested sort. As you can see, the
continent is sorted. But now, if I drill
down to the country, let's see, the country is going
to be sorted as well. Now if you look
closely at the data, you can see that the USA is the only country
inside this continent. So we cannot see any
sorting over here. But you can see that the
countries in Europe are sorted ascending. It starts with
the lowest value from Italy, then France, then Germany. You can see the countries
inside this continent are sorted as well based
on the nested sort. As you can see, the countries of each continent are going to be sorted separately
from the countries from the other continents. This is how the
nested sort works. Let's go and just put the
profit on the colors as well. Now let's go down in the hierarchy and drill
down to the city. We're going to have more
data and it's going to be more clear as you can see. Now the city is as
well sorted and now we are sorting the
cities in one country. For example, over here in USA, the lowest sales is in, and the highest sale
is in Portland. We are sorting the cities
based on the country. So this is one section. The next section is Italy. The next one is Germany. So each country is going to be sorted separately
from the other countries. With that, we have learned that
this method works if we have multiple dimensions,
and it's going to work perfectly
if we have a hierarchy in our view. Everything is going
to make sense and the sort is going to be very logical for the users. As I'm drilling down, for example, to the postal code, or I'm rolling up
back in my view, everything is going to be
sorted in a very logical way. All right guys. So with that
we have covered everything, how to sort the data inside our views from the
user's perspective, how to sort the data as we
are building the views. And I think it's really simple
and not that complicated. All right, so that's
all about how to sort our data in Tableau. And we have completed
this section. In the next section, we're
going to learn about Tableau parameters to add
dynamics to our visualizations.
112. Tableau | Section: Parameters: All right everyone.
So now we're going to talk about parameters. Parameters are a game changer in Tableau and,
in my opinion, the best feature that Tableau has introduced, because parameters
in Tableau can make your visualizations very
dynamic, interactive, and flexible in a unique way that you cannot find
in any other tool. All right, so now
what are parameters? Parameters are like variables in programming languages: they allow the user to replace a constant value in
calculations, filters,
reference lines, and so on. Okay, so now what
this really means, if you are building a
view for your users, you are already making
a lot of decisions, defining a lot of values
that stay static, and the users are allowed
only to read your views. So for example, you might create the following
calculation in Tableau where you are defining
a threshold for your KPI. So you are saying if the
total sales is less than 400, then the KPI is going to show red. Otherwise it's going
to show green. Here, the value of
the threshold, 400, is static and cannot be
changed by the users, the viewers. It can only be
changed by the developer. But now you might be in a
situation where you have two requirements from
two different users, where they define
different thresholds. So here you end up making
two calculations for two customers and as
well creating two views. But now instead of doing that, we can use the power
of parameters. So here we can replace the
value 400 with a parameter, and then we can offer
the parameter as an input field for the
users in the view. And now the users can use the parameter to define
the value they need. Using a parameter
is going to change the behavior of your view depending on
the value of the parameter. This is going to make
your views dynamic and ready for
any requirements. And there are endless ways to
use parameters in Tableau. And in this tutorial,
I'm going to show you six
different use cases. The first use case is about how to use parameters
in calculations. The second use case is
about reference lines, and the third one is how to
use them in filters. We have another
very special use case on how to switch between
dimensions and between measures in
a very dynamic way in one view, and another use case
about titles and text. And the last use case is how
to use parameters in bins. All right guys, so that was
a quick intro to parameters. Next we will learn how to create dynamic calculations
using parameters.
113. Tableau | Dynamic Calculations using Parameters: All right guys, so now let's start with the first use case, how to use parameters
in calculations. So let's create
some kind of KPI to track the profit
by subcategory. Okay, so now we're
going to stay with the big data source and we're going to go to the products
to get the subcategory. And then we need
the measure profit. So we're going to
go to the orders and we're going to get
the profit over here. Okay, so now we're
going to show the labels on the view as well. And now we can have
a threshold for the KPI, where we're going to
say if the profit is less than ten K, then
it's going to be red. Anything higher than ten
K, it's going to be green. Now in order to create the logic and the colors in the view, we have to create calculations. Don't worry about how to create
calculations in Tableau, because we're going to have a
dedicated section for that. Now in order to create
the calculation, we're going to go
to the data pane, right-click on the empty space, and then choose Create Calculated
Field. Let's go there. And now we're going to
call it KPI Colors. Then we're
going to write here the expression for our logic. It starts with IF, then we need SUM, and
then we have the profit. We said if it is less than
10K, it is going to be red. So we're going to
write the value red, otherwise it's
going to be green. Let's end it. With that, we have our logic for
the colors in our view, and as you can see over
here in our calculation, we have a constant.
It is the 10K. Let's go and create that. So
we're going to click okay. And here on the left side
you can see our dimension. We're going to take it
and put it on the colors. Now let's go inside and assign the colors to the values. Green is going to be green, and red is going to be
red. Let's click OK.
give this report to the users and they can view
it and interact with it. But now as you can see,
the calculation of the KPI is really static and
they cannot customize it. In order to give the users the option of defining what is red
and what is green, we have to use parameters. Now, in order to
create parameters in Tableau, there are
two ways to do that. Either you go to the data pane and create your parameters, or you create them in the
place where you need them. For example, if you
are creating a filter, inside the creation
of the filter we can create parameters. Now let's see first how we can create parameters
in the data pane. In the data pane, there are two
ways to create parameters. Either you go to the empty
space, right-click on it, and then you can see here Create Parameter, or the other option is that you go to the head of the data pane, where you
have a small arrow. If you click on that,
you're going to see exactly the same drop-down. And here we have the
option of creating a parameter. Let's select that.
of creating parameters. First thing first, we
have to give it a name, We're going to call
it choose threshold. Next we have to define the
data type of the parameter. And if we go over here, you can see a list of all data types. But here you know all of them. But Table decided to go with float and integer
instead of number, hole and number, decimal. But they are exactly
the same for now. We're going to go
with the integers. We don't want to have
decimal numbers in the KPI. And then once you do
that, we can define the display format here
For each data type, there are different formats
to represent the values. So as you can see, we have
automatic number standards, percentage, currency,
customized. I'm going to stay
with the automatic. And then in the next one, you have to define
the default value that's going to be
show up in the input. So here I would say
it's going to be the 10,000 And of course the
users can change that. Then after that, you
have different options to limit what the
users can select. So the default
option here is all. That means you are allowing
the users to enter any value, but of course, we limited
the data type to integers. That means the
users cannot go and enter any characters
in the input field. Or you define for the user
a list of allowed values. So here you can go and
allow, for example, five different values, maybe to make sure that nothing
goes wrong in the view. So here you are making the
parameter more restrictive. So the list is something
like discrete: you are allowing a list
of distinct values. And the next one, the range, is
something like bins: you are defining the start
and the end of the range, and then you are
defining the step size between those two values. So for now I'm going
to leave it open ended so the users can select
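As a rough illustration of the three options just described (all values, a list, or a range with a start, an end, and a step), here is a minimal Python sketch. It is not Tableau code: the class `IntParameter` and the sample list and range values are made up for this example, and only the default of 10,000 comes from the tutorial.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class IntParameter:
    """Hypothetical stand-in for an integer parameter (not a Tableau API)."""
    name: str
    default: int
    allowed_list: Optional[Sequence[int]] = None  # the "List" option
    allowed_range: Optional[range] = None         # the "Range" option: start, end, step

    def accepts(self, value: object) -> bool:
        # The data type is checked first: only whole numbers are valid input.
        if isinstance(value, bool) or not isinstance(value, int):
            return False
        if self.allowed_list is not None:
            return value in self.allowed_list
        if self.allowed_range is not None:
            return value in self.allowed_range
        return True  # the "All" option: any integer is accepted


# Open-ended, as chosen in the tutorial.
open_ended = IntParameter("Choose Threshold", default=10_000)
# Illustrative restrictive variants (values are assumptions).
from_list = IntParameter("Choose Threshold", 10_000, allowed_list=[5_000, 10_000, 20_000])
stepped = IntParameter("Choose Threshold", 10_000, allowed_range=range(0, 50_001, 5_000))

print(open_ended.accepts(500), open_ended.accepts("abc"))
print(from_list.accepts(500), stepped.accepts(15_000))
```

The open-ended variant accepts any integer but still rejects text, which mirrors the point that the data type alone already limits what users can enter.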
whatever they want. All right, so now let's
go and hit OK to create the parameter. Now if you check the data pane
on the left side, let me just minimize
those tables. You can see that the
parameter is going to be created always at the
end of the data pane. So there is like a separator between your data
and the parameters, and that's because
the parameters are something that is independent
from your data source. So there is no
dependence between the parameters and your dataset. It's completely
something independent and specific
to the workbook. Okay, so now we
have the parameter. How are we going to
show it to the users? In order to do that,
it's really easy. Go to the parameter,
right click on it, and then we have the
option of showing parameters in the view.
Let's select that. And now you can see
the parameter input on the right side of the view. Here we can see the value
of ten K as a default. Now let's go and
change the value. We're going to have it like 500. You can see nothing
changed in our view. So it doesn't matter what
you are giving here. You see that the view
is not changing. That means we have now to
connect it somehow to the view. And in order to do that,
we're going to go inside the calculations and replace the constant value
with the parameter. Let's see how we can do
that. We're going to go to our calculation, KPI Colors. Right-click on it, and
then let's go to Edit. So now we have to go over
here and replace this value. I'm going to remove
it and now we're going to type the name
of the parameter. As you can see, Tableau suggests
it here, so click on it. That means any value that the
user gives for this parameter is going to be used directly in this calculation.
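To make the idea concrete outside of Tableau, here is a small Python sketch of the same logic: the hard-coded 10K becomes an argument that plays the role of the Choose Threshold parameter. The function name `kpi_color` and the profit figures are invented for illustration and are not taken from the course data.

```python
def kpi_color(total_profit: float, threshold: int = 10_000) -> str:
    """Return "Red" when the summed profit is below the threshold, else "Green"."""
    return "Red" if total_profit < threshold else "Green"


# Hypothetical profit totals per subcategory (made-up numbers).
profits = {"Phones": 44_000, "Chairs": 26_000, "Labels": 5_500}

# The same calculation gives different colors as the user changes the threshold.
for threshold in (10_000, 20_000, 50_000):
    colors = {name: kpi_color(value, threshold) for name, value in profits.items()}
    print(threshold, colors)
```

One calculation now serves every threshold a user might ask for, instead of one calculation and one view per user.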
Let's try that out. Let's click OK. As you can see, something changed
already in the view, but let's go and play
with the values. Instead of 5K, we're going to have like 20K. Okay. And with that, I just changed
the threshold for this KPI. So now anything below
20K is going to be red, anything higher
is going to be green. Let's have another value like 50K. And now as you can see, the
threshold is really high. We have only two values that are green, and as you can
see, it's very dynamic. And you give the
users the power of defining and customizing
the KPI as they want. And with that, you're
going to cover a lot of requirements
in only one view. I just love this
feature in Tableau. All right, so that's all for
the dynamic calculations. Next we will learn how to use parameters to create
dynamic reference lines.
114. Tableau | Dynamic Reference Lines using Parameters: All right, so now let's see another use case
of the parameters. We can use parameters
in the reference line, so we can show in our
view a reference line to indicate what
the threshold is. It just makes it
clearer where the cut is between
red and green. And here we can use our
already existing parameter, the threshold, in
the reference line. Let me show you quickly
how we can do that. So now let's go to
the analytics pane. And then here we
have the option of creating a reference
line over here. So let's go and
double-click on it. And now we have a new window to configure the reference line. There are a lot of options, but for now we can focus
on the parameters. What is really
important here is the value of the reference line. Now let's check the options.
As we can see over here, the first one is that Tableau is
suggesting the metric. The second one is to
create a new parameter. The third one is to choose the already existing parameter. As you can see, we can create new parameters exactly in
the place where we need them. But for now, it
really makes sense to use the same parameter in
the reference line. Let's go and select that. Now as you can see
on the right side, we have already a
reference line in our view and we have the
label Choose Threshold. Instead of showing the label, we can show the value
of the parameter. In order to do that, we're
going to go to the label and change this to
Value. Let's select that.
And that's it for now. Let's go and click OK. So as you can see,
we are showing now the threshold as
a reference line. And if we go and change
the value from 50K to, let's say, 10K, let's go. Now as you can see,
the user can control everything in the view with
their input in the parameter. They are changing
the calculation as well as the reference line. It's really cool
and professional to have this dynamic behavior
in your reports. So this is how you
can use the value of the parameter inside
the reference line. All right, so that's all for
the dynamic reference lines. Next we will learn how to
use parameters in filters.
115. Tableau | Dynamic Filters using Parameters: All right, so now
we're going to go to the next use case where we're going to use the
parameters in filters. And we can learn as
well how to create parameters exactly in the
place where we need it. So now we're going
to go and create a report where we're going to show the top ten
products in our dataset. In order to do that,
we're going to stay with the big data source. Let's go to the products and take the product
name, double-click on it. Now we have a list
of our products and what we
need is a measure. We're going to go to the orders and we're going to
take the sales, drag and drop it
over here. As usual, let's have labels and
I'm going to sort it descending. Now we want to show only the
top ten products. In order to do that,
we're going to put the product name on the filters, so we can drag it from here by holding Control and then
drop it on the filters. Now in the filter over here, we want to show the
top ten products. In order to do that, we're
going to go to the Top tab. And now we're going
to go and define the rule. Everything is fine. So here you can see
Top Ten by Sales. Now as you can see, we
are defining a rule. In this rule, it's
like the calculations, we have a constant. The constant in this
rule is the ten. Now you might be in
the same situation where you have one
user asking for top ten products and another user asking
for top 20 products. Now instead of going and
creating two different filters, two different views, we can stay with the same view
and use parameters. And then you're going
to let the end users define their list. So now we have to change the
value of ten to a parameter. So let's click over here. And here we always have
the three options. Either the value you
enter, or you can create a parameter, or use an already
existing parameter. Now we want to create a new
parameter for this view, and as you can see, this is the second method for how
to create parameters. We will not go to
the data pane; we're going to create it
exactly where we need it. Let's go and click
Create a New Parameter. So now we have here
again the same window where we're going to
create a parameter. We're going to call it
Choose Top Products. Now you might notice that you
cannot change the data type because you are creating
here a parameter inside the filter for the sales. And the sales is
a measure and a number. But as before, you can
customize the display format, the current value, and also which values you allow, whether everything or a range. So now let's try the range. The minimum is going to be one, the maximum is going to be 50. And we're going to have
a step size of five. All right, so that's
all. Let's click Okay. So now let's check
again the rule. We have Tube then our
parameter by sales. So that means we don't have a constant value and we
are using the parameter. Let's go and hit okay. So now as you can see,
the report is showing the top ten products because the default value of
the parameter is ten. And if you check the left side, we have a new parameter
called Choose Top Products. Great. Now the next step is
to show the parameter to the users: right-click on it and
select Show Parameter. All right, so now let's
check our parameter. Now it's showing 11. I
thought I gave it ten. So let's edit it again. Right-click on it and
then let's go to Edit. All right, it's because we
played with those values. So as you can see,
it's like bins: it starts from 1, then 6, 11, and so
on, because the step size is five. So what we're going to do
is change the minimum to zero, and then as you can
see, we have ten here again. Let's click OK. All right, so now I promise
you we have the top ten, because if you check the value here on the parameter, it's ten. All right, so now this
is something different. Instead of having
an input field here, we have a range slider. The user can move the slider, and you can see our filter reacted and is now showing the top 20. Or the users could use those arrows in order
to change the value step by step. And as you can see, as I'm
moving to different values, the filter is
changing as well. That's it, this is how you can
use parameters in filters. As you can see, your
view is very dynamic and you let the users
customize what they want. All right guys, so that's
all for the dynamic filters. Next we will learn a very interesting use case
of the parameters, how we can dynamically swap between dimensions
and between measures.
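The range behavior seen in this lecture (a minimum of 1 with a step of 5 offers 1, 6, 11 and never lands on 10) can be reproduced with a short Python sketch, together with a simple top-N filter. This is only an analogy under the assumption that the range counts up from the minimum in fixed steps, as observed above; the helper names and the sales figures are hypothetical.

```python
from typing import Dict, List, Tuple


def allowed_values(minimum: int, maximum: int, step: int) -> List[int]:
    """Values a range parameter offers when counting up from the minimum."""
    return list(range(minimum, maximum + 1, step))


def top_n(sales_by_product: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Keep the n products with the highest sales, sorted descending."""
    ranked = sorted(sales_by_product.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Minimum 1, step 5: 10 is not selectable, which is why the slider showed 11.
print(allowed_values(1, 50, 5))
# Minimum 0, step 5: 10 is selectable again.
print(allowed_values(0, 50, 5))

# Made-up sales values; n plays the role of the Choose Top Products parameter.
sales = {"Desk": 900.0, "Lamp": 150.0, "Chair": 620.0, "Shelf": 410.0}
print(top_n(sales, 2))
```

Choosing the minimum so that the default value sits on one of the steps avoids the surprise of the parameter snapping to a neighboring value.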
116. Tableau | Swap Measures/Dimensions using Parameters: All right guys, so now
we're going to move to the most important use
case in parameters. You can see this use case
in almost every Tableau project. The use case is to use
parameters to switch between dimensions and to
switch between measures. Now let's learn first how to use parameters to switch between
dimensions in one view. Let's say that you
are building a dashboard about the sales, and you're going to
have views like sales by country, sales by category. That means you are
creating two views with the same metric but
different dimensions. Now instead of having two views, we're going to have only
one view for the users. And they're going to decide which dimension they're
going to use in the view. Now in order to do that, we have to use the power of parameters. All right, so now let's
go and create our view. We have the sales, so let's take the sales on the columns. And then we need the countries. We're going to take it
from the customers. Then we have here the
country on the rows, great. And as usual we're going
to show the labels. So now we want to make
the dimension country a variable, a parameter. That means we need somehow to switch between dimensions, between country and
category in the same view. So that means instead of
having the dimension country, we want to have like
a dynamic dimension with different values. Now the first thing
that we have to do is to create a parameter where the user's going to choose which dimension should be
presented at the view. So here we're going
to go and create a parameter from the data pane. Click over here, then
create parameter here. The main focus of
this parameter is to choose which dimension can
be presented at the view. First, let's give it a name, we're going to call
it Choose Dimension. And now the question is what are the values inside
this parameter? It's going to be
the dimension name. So it's going to be values
like country and category. They are strings, so the data type over here
is going to be String. Let's go and select that.
And as you can see, Tableau disabled the format. We cannot choose a format for a string; it's
like a free text. Next we have to define
the current value, and here we're going
to have the dimension country as a default. So let's go and enter
the value of country. All right, so now since
the datatype is a string, we cannot build a range from it. So here we have
only two options. Either we're going to
have it as free text, as an input field, or as a list. And in this scenario,
it really makes sense to have a predefined
list for the users, since the users will not see your data source
and they have no idea which dimensions
we have. If we go with free text, it's going to be really
confusing and no one's going to get the
right dimension. In this scenario, we really must provide a predefined
list for the users, and then they're going to select the value that
suits them. Here in this example,
we're going to offer only two dimensions. It's the country
and the category. Let's go and add those values. So we're going to
have the country and the next value going
to be the category. And of course, you can add
more dimensions like the city, the product name, and so on. So now we're going to
stick with the example. And that's it, so let's
click OK, great. So now if you check
the data pane, we have a new parameter
called Choose Dimension. Here you can see
quickly which data type we have for each parameter. Now the next step is
to show the parameter to the end users:
right-click on it and select Show Parameter. All right, now let's
check our parameter. On the right side we have
a list. It makes sense. We have created a
list parameter, and at the end we're going to
have a list for the users. And inside it we have
only two values, country and category. Now if you go and switch
between those two values, nothing is going to change
in the view because this parameter is not yet
connected to our view. All right, so now we're
going to go and create our dynamic dimension and use it in the view
instead of the country. That means we have to create
a new field. In order to do that, right-click over here and
select Create Calculated Field. Let's go there. Now let's
call it Dynamic Dimension. We're going to use CASE WHEN here. Don't worry about it, I'm going to explain everything in the section on calculations. The syntax starts with CASE, and then we have to
specify the field name. In this situation, we're
going to enter the parameter, our parameter Choose Dimension. As you can see, as
you are writing, Tableau is suggesting
stuff for us: our field Choose Dimension. Next we're going
to go and specify an action for each
scenario, for each value. Let's have a new line and write WHEN. The first value
is going to be Country. You need to be really
careful here to write it exactly as we wrote
it in the parameter. It was capitalized in the parameter and it should
be capitalized here as well, otherwise it will not work. Now, what should happen if
the value is Country? THEN we have to
specify the action. If the users choose
Country, what should happen? The dimension Country
should be used. Let's go and write
Country over here. And as you can see,
as I'm writing, Tableau is suggesting it. We need the
dimension Country. You can see it from the icon over here, so let's select that. All right, so now let's move
to the next scenario, where the user is going to
select the value Category. It's exactly the same
stuff that we write here. WHEN the value is Category,
THEN what should happen? The dimension Category should
be used. Let's start typing Category here. And as you can see, the dimension Category
is suggested over here. Let's select it. That's it, these are the
scenarios that could happen with the parameter, and we have to end the
CASE WHEN like this. As you can see in
this calculation, we are just mapping between the values of the parameters
and the dimensions. Let's go and click okay. Now as you can see, we
have a new dimension on the left side called
the dynamic dimension. It is calculated field, and now we're going
to go and remove our static dimension,
the country. And instead of that,
we're going to add our new dynamic dimension. All right, so now let's go and check with the ethical work. As you can see, the value
is now category and in the view we see the categories
which is really good. All right, so now let's change the value of the
parameter to country. As you can see, the dimension
in the view did change. So now we have country
instead of category. As you can see,
parameters are really powerful and you are going
full dynamic in your view, where the users can
define the level of details in the view by
changing the dimension. So imagine now you are making a dashboard with sales and
you have ten dimensions. Here you are going
with only one view instead of having ten reports. All right, so that's it
for this use case. This is how we switch between dimensions using parameters. All right, so now you have
the following Tableau task. The task is to create
a dynamic measure using parameters to switch
between three measures, sales, profit, and quantity,
in the same view. You can pause the video
right now to do the task, then resume once you are done. All right, so now let me show
you how you can do that. We have exactly the same steps
as for the dimensions. First we have to create the parameter,
and second to create the logic in the
calculated field. Let's start with the first one. To create the parameter, we're going to go to the data pane. Click over here and
select Create Parameter. We're going to call
it Choose Measure. And here we have to think about the values of the parameter. They are going to be the
names of the measures, which means the data type
is going to be String. And here we have to
define the default value. Here we have three values, sales, profit, and quantity. And we're going to have
sales as the default value. Here again, regarding the values: the users don't know
about your data source, they don't know the exact
name of your measures. So you have to go and create a predefined list for
them. Let's go over here. We have three values, so we're going to have the
first one sales, the second one profit, and the third one going
to be the quantity. That's it. Let's
go and hit okay. As you can see on the left side we have our new parameter. And the next step is to show the parameters for
the end users. In order to do that, right-click on it and select Show Parameter. Let's check our parameter. Over here you can see it
starts with sales, since it's our default. You can switch
between those values, but as you can see, nothing
is changing in the view; the view is still
showing the sales. The next step is now to go and create the calculated field. In order to do that,
we're going to go to the data pane,
right-click over here, and then select Create
Calculated Field. We're going to call it
Dynamic Measure. Here again, we can use the same syntax: CASE, then the name of the
parameter, so Choose. We're going to
select the measure parameter. Now we're going to go and define the scenarios. WHEN
the value is Sales, THEN the action is going to
be selecting the measure Sales. Write Sales and
select the measure. All right, new line. And we're going to go now
and map the next value. That's going to be the profit,
then the measure profit. Profit. And let's go
and select the measure. All right, so we map that. We're going to map
now the last value. So we have the quantity. If the user select this
value in the parameter, the quantity measure
is going to be selected as well.
Let's go with that. That's it, these are
our three scenarios, and we're going to have END at the end. Now as you can see, our
calculation is valid. Let's go and hit OK. If you check the
data pane, we have a new calculated field
called Dynamic Measure. So now what we can do, we're
going to go and remove our static measure and replace it with the
dynamic measure. All right, now let's go and change the values
in the parameters. Let's start with the sales. As you can see, now we
have the values of sales. If you switch it to profit, you can see the axis and the values in the view are
changing to the new measure. But now let's go
to the last one, to the quantity,
and as you can see, we don't have any data. Well, if you have
something like this, then you have an issue either in the calculation or
in the parameter. Let's find out
where the error is. Let's go to the
calculation again, right-click on it and
then go to Edit. And here we have to
compare the values. As you can see, we have here Quantity and we have
the measure Quantity. Everything looks correct,
but as you can see, the value over here in the
parameter is not spelled exactly like Quantity. So here I have a typo, and that means for Tableau, we didn't define any
scenario for this value. In order to correct that,
we're going to go to the parameter on the left
side, right-click on it, then go to Edit, and
then we're going to go to our list and
change this value, so double-click on it and
write it correctly: Quantity. So that's
it. Let's click OK. And now as you can see, we
have data for the quantity. So it's really important to have exactly the same values from the parameter inside
the calculation. As you can see,
it's really sensitive. So with that we have a
dynamic dimension and a dynamic measure,
and we can switch between them
as the user wants. All right, so this
is how you can use parameters to swap
between measures in a view. It is just great.
all on how to swap between dimensions and between
measures using parameters. Next we will learn how to use parameters in titles and texts.
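The mapping idea behind both calculated fields, and the reason a typo leaves the view empty, can be sketched in Python. This is an analogy rather than Tableau syntax: `case_when` is a hypothetical helper, the sample row is invented, and the misspelling "Quantitiy" is made up because the transcript does not show the actual typo.

```python
from typing import Dict, Optional, Union

Value = Union[str, float, int]


def case_when(choice: str, mapping: Dict[str, str], row: Dict[str, Value]) -> Optional[Value]:
    """Map the parameter value to a field and return that field's value.

    The lookup is an exact, case-sensitive match. When no scenario matches,
    nothing is returned, similar to a CASE expression without an ELSE branch.
    """
    field = mapping.get(choice)
    return None if field is None else row[field]


# One made-up record from the data source.
row = {"Country": "Germany", "Category": "Furniture", "Sales": 120.0, "Profit": 30.0, "Quantity": 4}

# Parameter value -> field used in the view.
DIMENSIONS = {"Country": "Country", "Category": "Category"}
MEASURES = {"Sales": "Sales", "Profit": "Profit", "Quantity": "Quantity"}

print(case_when("Category", DIMENSIONS, row))   # a matching dimension
print(case_when("Quantity", MEASURES, row))     # a matching measure
print(case_when("Quantitiy", MEASURES, row))    # misspelled value: no scenario matches
print(case_when("country", DIMENSIONS, row))    # wrong capitalization: no match either
```

Keeping the parameter's list of values and the calculation's scenarios in sync is the whole trick, so it helps to copy the values from one to the other rather than retyping them.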
117. Tableau | Dynamic Titles using Parameters: All right, so now we can move quickly to the next use case, where we can create dynamic
titles using parameters. Now if you look at
our previous example, we have an issue. You see we have the
title Sales by Country. But the view is showing
profit by category, because we chose
profit and category over here. And now the title is
wrong and misleading. So how can we solve
this problem? We can use parameters to switch this static title to a dynamic title. Let's
see how we can do that. So let's go to the title. And now we have a new window
to customize the title. Now the rule, by default, is the sheet name. That means the name
that you give to the worksheet is going to be
the title of your view. In this example, I called this worksheet
Sales by Country, and we have it as
the title as well. But now we have to
change this rule to be measure by dimension. Let me show you how to do that. Let's just remove this rule, and the first word in our naming convention is going
to be the measure. Now in order to
insert the parameter, we're going to go over
here to Insert. Then you will have a list of
different Tableau functions. And we have here a section
for all parameters. Here we need the parameter for the measures,
let's click on that. And now the next word in our naming convention
is going to be "by": space, by, space. As you can see, "by" doesn't have any background color
because it is static, and the parameter has
a gray color to indicate that this
is a dynamic value. And then the last
word of our title is going to be the
dimension parameter. Let's go and insert that in
the same way: click Insert, and our parameter
is going to be over here, the parameter Choose Dimension. Let's click on that.
The first word is going to show the value of
the measure parameter. Then we have "by", then we have the value from the
dimension parameter. Let's go and click OK. Now as you can see, the title of our view really did change. So now we have it correct: Profit by Category.
Now as usual, we're going to go
and play with the values of the parameters. Now let's have the
dimension country. And you see now
we have Profit by Country. The same
goes for the measure. We can go and select quantity, and we have Quantity by Country. As you can see, it's
really amazing. You can add
parameters to everything and you're going to have really
awesome views in Tableau. Let's quickly have
another example. We can do the same in the
view with parameters in filters, and here we can make
a dynamic title as well. Let's double-click on the title. Let's remove this part, and we're going to call it Top. And then the value is going
to come from the parameter, so it's going to be Top
30, Top 40, and so on. So we're going to go and insert the parameter that you
are using in the filter. So it's going to be the
Choose Top Products. And then we can add
the word Products. So that's it. Let's click Okay. And now as you can see, we have the title Top 30 Products, because the value in
the parameter is 30. And as you are changing the
value in the parameter, you can see the title is
changing accordingly as well. I just love parameters
in Tableau. All right. Okay. So with that we
have learned how to use parameters in
text and titles. And next it's going to be the last use case of the parameters. We will learn how to create
dynamic bins in histograms.
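Both dynamic titles from this lecture are simple templates: static words mixed with the current parameter values. A minimal Python sketch of that idea follows; the function names are invented for illustration and nothing here is Tableau syntax.

```python
def view_title(measure: str, dimension: str) -> str:
    """Static word "by" between the two parameter values."""
    return f"{measure} by {dimension}"


def top_products_title(n: int) -> str:
    """Static words "Top" and "Products" around the parameter value."""
    return f"Top {n} Products"


# The title follows whatever the user selects in the parameters.
print(view_title("Profit", "Category"))
print(view_title("Quantity", "Country"))
print(top_products_title(30))
```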
118. Tableau | Dynamic Bins Using Parameters: All right, so now we're going to move to the last use case. We can use parameters in bins. In the last tutorial,
we created bins and a histogram about the
scores of the customers, and we decided that
the size of the bins is ten. Let's go and rebuild
this view quickly. It's really easy. So let's take the scores and put
them on the columns, and then we can
take the count of the customers and
put it on the rows. With that we have a histogram, and the size of each of
those bins is ten. Again, we have a constant
value inside our view. Let's go and make it dynamic. So we're going to go
to our score bins. Right-click on it and then Edit. Here you can see
the size of bins is ten, this is what we have defined. But now instead of that
we're going to create a parameter. Click on it, and again we have
here the option of creating a new parameter. Select that. Now we're
going to call it Choose Size of Bins. So again, Tableau did
decide on the data type. It should be based
on the scores, and here we have the
default value of ten. I'm fine with that. Now we have to go and choose
which values can be allowed. Either all the values,
or a list, or a range. Here I recommend using a range, because if you look
at the parameter range, it really looks like
small bins as well. It makes sense to define
the range for the users. Here we have the minimum five, the maximum 25, and the
step size can be five. I'm fine with that. I'm
going to leave it as it is. So let's go and click
OK. And now you can see instead of having
the size of bins ten, we have a parameter,
let's go and hit Ok. So as you can see,
nothing's changed in our histogram because
previously we have the size of ten and the default value in the
parameter is as well ten. Let's go and test everything we have first to show
the parameter. So radically connect
and show parameter. Now in the right
side we have ten. And if we are just moving
between those two values, you can see that our histogram is as well changing accordingly. And with that, the
users can go and customize the histogram
as they want. Here, as always, don't forget to make a dynamic title, because
it's really cool. Let's go and do that. Double
click on it as usual. We're going to remove
this from here and we're going to
call it Histogram. So this is the static
part, Histogram Score. And now we're going to
add the size of bins. So we're going to insert the size of bins, and then
we're going to close it. That's it. With that,
we have a dynamic name. Now you can see the
selected value from the parameter is now
showing in the title. If the user is changing
the size of bins, as you can see, the title is
changing accordingly as well. It's really a lot of
fun working with Tableau. All right, so now
let's summarize. I think parameters are the best feature that
we have in Tableau. Parameters are like variables that allow the users to replace a constant value in calculations, filters,
reference lines, and so on. Another unique thing
about parameters is that they are independent of your dataset
and your data source. The main purpose
of parameters is to make your visualizations
more interactive, more flexible, and more dynamic, and to give different users
the possibility to customize the visualizations
for different needs and requirements without
having to create multiple versions of the
same visualization. I just love parameters. All right, so with that we
have learned everything about parameters and how to
make our views dynamic. In the next section,
we will learn more techniques for
interactivity in Tableau, and we're going to focus
on Tableau actions.
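To make the dynamic-bins idea concrete outside of Tableau, here is a minimal Python sketch. It is not how Tableau computes bins internally; it only imitates the behavior described above, assuming bins are labeled by their lower edge. The function names and the sample scores are made up for illustration; the range (minimum 5, maximum 25, step 5, default 10) is the one used in the tutorial.

```python
from collections import Counter

# Range parameter as configured in the tutorial: min 5, max 25, step 5.
BIN_SIZE_MIN, BIN_SIZE_MAX, BIN_SIZE_STEP = 5, 25, 5


def allowed_bin_sizes():
    """Values that a range parameter with the settings above would offer."""
    return list(range(BIN_SIZE_MIN, BIN_SIZE_MAX + 1, BIN_SIZE_STEP))


def histogram(scores, bin_size=10):
    """Count scores per bin; each bin is labeled by its lower edge (an assumption)."""
    if bin_size not in allowed_bin_sizes():
        raise ValueError(f"bin size {bin_size} is outside the parameter range")
    counts = Counter((score // bin_size) * bin_size for score in scores)
    return dict(sorted(counts.items()))


def title(bin_size):
    """Dynamic title: static text plus the current parameter value."""
    return f"Histogram Score {bin_size}"


scores = [3, 12, 18, 25, 41, 47, 58]  # hypothetical sample data
for size in (10, 25):
    print(title(size), histogram(scores, size))
```

Changing `size` plays the role of moving the parameter slider: the same data is regrouped, and the title follows the selected value.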
119. Tableau | Section: Actions: Tableau actions. They are a really great feature in
Tableau that can add more interactivity and
dynamics to your dashboards, which is going to
make your dashboards very modern and interactive. As well, they
enable the users to do data exploration
using your dashboards. So as usual, first we
have to understand the concept behind
Tableau actions. Then we're going to go
and practice in Tableau. So let's go. All right guys, now we can start with
the first question. What is an action? Well, an action
is a change of status. That means because of
a specific event or trigger, the status of an object
can change from A to B. And the objects in Tableau are going
to be the visualizations. The starting point
is what we call in Tableau the source sheet. And the action is going
to be triggered by user interaction. How do users usually interact with our views? Using the mouse:
either by hovering the mouse over the data, or by selecting
or clicking on the data. And the last option
is using the menu. So far we have
defined for Tableau the starting point, the source sheet. The second thing we
define for Tableau is what can trigger the action. And the last thing that you
have to define for Tableau is what can happen once
the action is triggered. And here we have six
different options or actions. The first one is going
to be Go to URL. That means we can jump from Tableau to an external website. So the target
here is going to be a website, not Tableau and
not any visualization. The second option is to jump, or to go, to another worksheet
or to another dashboard. So here we are moving from
one worksheet to another. Moving on to the third one,
we have the filter action. What this means is that the actions that you are doing on
the source sheet are going to affect the
filtering in the target sheets. Anything that you are clicking
on in the source sheet
is going to impact the
filter in the target sheets. And then we have another
action called highlight. Here again, we have
target sheets. And this time, any action that you are doing on
the source sheet
is going to be highlighted in the target sheets without filtering the data. That means for Go to Sheet,
Filter, and Highlight, you always have to specify the source sheet and
the target sheets. And then we have two other
actions that are going to impact the values
of something. Here we have Change Set Values.
So anything that you are
doing on the source sheet
is going to affect
the members, or the values, of the target set. This is going to make the set
very dynamic and interactive. The last one we have is
change parameter values. Again, here, any interaction that you are doing in
the source sheet
is going to impact the values of the parameters that we have. Now those are all the options
that you can define as a consequence
of the action. So as you can see,
it's really easy. We have to define
the source sheets, we have to define the trigger, and then we can define what can happen once the
action is triggered. All right, so that was
a quick introduction to Tableau actions. Next we're going to start
with the first type of action, Go to URL.
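The three things every action needs (a source sheet, a trigger, and a consequence) can be summarized as a small data model. The Python sketch below is only an illustration of the concept described above; the class and field names are hypothetical and do not correspond to any Tableau API.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    """The three ways a user can trigger an action."""
    HOVER = "hover"
    SELECT = "select"
    MENU = "menu"


class ActionType(Enum):
    """The six consequences listed in the introduction."""
    GO_TO_URL = "go to URL"
    GO_TO_SHEET = "go to sheet"
    FILTER = "filter"
    HIGHLIGHT = "highlight"
    CHANGE_SET_VALUES = "change set values"
    CHANGE_PARAMETER = "change parameter values"


@dataclass(frozen=True)
class Action:
    """Hypothetical record: where it starts, what fires it, what it affects."""
    name: str
    source_sheet: str
    trigger: Trigger
    action_type: ActionType
    target: str  # a URL, a sheet, a set, or a parameter, depending on the type

    def describe(self):
        return (f"{self.name}: on {self.trigger.value} in '{self.source_sheet}' "
                f"-> {self.action_type.value} ({self.target})")


example = Action("Go to More Details", "Sales Insights",
                 Trigger.MENU, ActionType.GO_TO_URL, "Wikipedia page")
print(example.describe())
```

Each of the following lectures fills in one `ActionType` with concrete settings, but the source, trigger, and target pattern stays the same.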
120. Tableau | Action: Go To URL: All right guys, in Tableau we
can create actions either on the worksheet page or
on the dashboard page. In order to do that, we're
going to go to the main menu. Over here we can find
the option Worksheet. So let's go there. And then
we have here the option
Actions in order to
create new actions. Or we can go to Dashboard,
and we have the
same option, Actions, here as well. But since we are now on the worksheet page,
it is grayed out. So now we're going to
learn how to create actions on the worksheet page. And we can start
with Go to URL. So let's go back to
Worksheet in the main menu. Then let's go and
click on the actions. With that, we're going
to get the first window. So what we're going to
see at the start is an empty table because we
didn't create any actions yet. But once you start
creating actions, you will get a list of
all actions that you have inside the workbook
or inside the sheets. Now in order to
create a new action, we're going to go over
here, Add Action. Then we're going to pick Go to URL. So let's select that. And here we're going
to get a new window in order to set up our action. In our example, we want
to jump from Tableau to an external web
page, to Wikipedia. We first have to give it a name. The name of the action
is going to be Go to More Details. Then, as we learned,
we have to specify for Tableau three things. First, we have to define for
Tableau the source sheets, the starting point
of our action. Then we can specify for Tableau what can
trigger our action. And then at the end, we
have to specify the target. Let's start with the first
one. We have to specify which worksheet is going to
include this action. Here we first have to select
the data source. It's going to be the
big data source. And we're going to
select the current worksheet,
Sales Insights. That's all for the
source sheets. Then we have to
specify for Tableau what can trigger our action. Here we have three options: hover,
select, or menu. Let's leave it as menu first. Then we have to define
for Tableau the target URL. In our example, we have to specify
here the Wikipedia page. Here
we have two options:
either we can open it in a new tab, or we can open it in a new
window. That's all. It's really easy. All you have to do is specify the
starting point, what can trigger our action, and what can happen once it is triggered. Let's go and hit OK. And with that,
you can see we have now one action in this table. Let's go and hit okay
again. And let's test it. So far nothing changed
in our visualizations. As you can see, we have the
subcategories by the sales. But now once the user
clicks on the marks, for example, let's go on
the chairs over here. We will see here a new link. It says, go to more
details And this is exactly the actions that
you have defined here, the interaction from the users. They have to go to the marks. They have to click on the
mark and then go to the menu. Once click on the
link over here table, going to jump to a wiki BD
page. This is how it works. Now let's go and try
different triggers. So I'm just going to close this. Let's go back to the worksheets, then go to the actions. Let's go to our action
over here, and go edit it. Now, instead of using now, I would like to have select. Let's see the effect of that. Let's click okay.
And then again. Okay. Now the trigger for the action is going
to be by selecting, by clicking on the marks. Once I click
somewhere over here, let's say on Storage,
we're going to go and
jump to Wikipedia. So as you can see here, it's
a little bit more sensitive. Once you click on the marks, you're going to jump to the URL. Here, we don't have a menu
with a link. We're going to jump
immediately to the link. Let's go and try hover. It's going to be more extreme, so let's go to the actions
again, to our action, and then let's switch to hover. And here you have to be careful as you are hovering with the mouse, because you're opening a lot of web pages. Let's go and hit OK. Now, very carefully, once I mouse over
Paper, Tableau is going to go and jump to Wikipedia. I didn't click anything,
I just moused over. So as you can see now, the
action is very sensitive to the user's interactions: by just hovering the mouse over
the marks, Tableau executes the action.
With the menu, the users have the chance
to think about whether they want to execute the action and
go to the URL or not. With select, it's
more aggressive: the users select
the marks and jump immediately
to something else. With hover, it's very aggressive: just by hovering the mouse
over the marks, the action is triggered. Now let's conclude this, and be very careful where
you are hovering the mouse, because once you
hit any mark, Tableau is going to go and open
a new web page. So let's go back to Worksheet and then
go to Actions. Let's remove it, because it
really doesn't make sense to have a mouse hover
to go to a URL. The best way to do that
is to use the menu. All right, so now since we
are working with the URLs, we can add a lot of stuff
like values, filters, parameters to the URL in order to make
something more dynamic. For example, I would
like the users, depending on which
subcategory they select, to go and find more information about this subcategory.
How can we do that? First we're going
to go to the URL over here and we can add "wiki".
value of the subcategory. In order to do that, let's
go to the Insert over here. Then we will get a list of all fields that we have
inside our data source. We are searching
for the subcategory and we can find it over here. Let's go and select
on the subcategory. As you can see, it's like
dynamic inside of our URL. Now I would like to
make the name of the link as well more dynamic. Let's go and call
it Read more about. Then we have to add the subcategory to
make it more dynamic. We have as well here, an insert. And we're going to
go and search for the subcategory we
have over here. That's that we have a
dynamic name for the link, and as well a dynamic link. Let's go and hit okay. And try that again. Okay, let's go, for example, to the tables over here. Click on the mark, and you can see here we have
the following link. It says, read more about tables. So it's read the value from the subcategory that we
are currently selecting. Let's click on that. And here we're going
to jump immediately to the Wikipedia page that
describes the tables. Let's go and try something else. Let's go to the
Storage over here. As you can see, the name of
the link is dynamic. We have Read More About Storage, and once you click over here, you will get more information about storage. So
this is really amazing for adding more context and more information to our visualizations and
making them more interactive.
All right, so that's
all for the first type of action, Go to URL.
And next we're going to
learn how to use actions in order to jump from
one sheet to another.
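The dynamic URL and dynamic link name described above boil down to template substitution: a placeholder for the field is replaced by the value of the selected mark. Here is a minimal Python sketch of that idea. The angle-bracket placeholder, the exact Wikipedia base URL, and the function name `fill_template` are assumptions made for illustration, not something taken from the tutorial's screen.

```python
import re
from urllib.parse import quote


def fill_template(template, mark, encode=False):
    """Replace every <Field Name> placeholder with the value from the selected mark.

    With encode=True the value is percent-encoded so it is safe inside a URL.
    """
    def replace(match):
        value = str(mark[match.group(1)])
        return quote(value, safe="") if encode else value

    return re.sub(r"<([^<>]+)>", replace, template)


# Hypothetical templates modeled on the example in the lecture.
URL_TEMPLATE = "https://en.wikipedia.org/wiki/<Sub-Category>"
LINK_TEMPLATE = "Read more about <Sub-Category>"

selected_mark = {"Sub-Category": "Tables"}
print(fill_template(LINK_TEMPLATE, selected_mark))
print(fill_template(URL_TEMPLATE, selected_mark, encode=True))
```

The same template gives a different link for every mark, which is why one action definition is enough for all subcategories.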
121. Tableau | Action: Go To Sheet: All right guys,
next we're going to learn how to use actions in order to jump from one
worksheet to another one. In this example, we
have the source, or the starting point,
the sales insights. And the target going to
be the profit insights. So now we'd like to
make an action in order to jump from the
sales to profits. In order to do that,
we're going to go to Worksheet in the main menu. Then we're going to
go to Actions. And we're going to go
and create a new action. This time we're going
to pick Go to Sheet. So let's go and select that. And here we get our new window in order to set up the action. It is very similar
to the URL setup. First we have to give it a name; we're going to call it
Go to Profit Insights. And then here we have
the three things: the source, what's going to trigger the action,
and the target. The source is going to
be Sales Insights. And the trigger this time
is going to be menu as well. Let's go
and select that. And then we have to
specify the target sheet. It's going to be
Profit Insights. Let's go and select that. We have our setup. Let's go
and hit OK. That's all. Then as you can see, we get
a new action in our table. Let's go and hit OK as well. Now let's go and test it. Let's go to one of those marks. Let's go to Machines. And then we get our menu. We now have two links: Go to Profit Insights and Read
More About Machines. The second one is going to take us away from Tableau to
an external web page. The first one can move us to another worksheet
inside Tableau. So let's click on Go
to Profit Insights. Now as you can see, Tableau
executed the action once we click on that and we jumped
to another worksheet. Now we are at the
Profit Insights. All right, so that's it. As you can see,
it's really easy. We have to just specify
the source sheets, the target sheets and what
can trigger the action. All right, so that's
all for the type Go to Sheet. Next
we're going to learn filter actions, and also how to use quick actions.
122. Tableau | Action Filter & Quick Actions: All right guys, moving on
to another type of action, we have the filter action. What happens here
is that anything that you are selecting in
the source sheets is going to be what's relevant
in the target sheets. That means in the target sheet, we will see only the data, only the information that you have selected in
the source sheets. So let's see how this works. We're going to stay
with the same example, where we have one
worksheet about the sales, which is going
to be our source, and we have another
worksheet about the profits, which is going to be our target.
Let's start with the source. Let's go to the Worksheet menu. Let's go to Actions, and let's
go and add a new action. This time it's going
to be the filter. Let's go to Filter here. We get again a new window in order to set up
our filter action. It's going to be very similar
to the previous ones, but here we have a
few more options. First we have to give it a name;
we're going to call it
Filter Profit Insights. Here, as usual, we have to
define the source sheets. It's going to be
Sales Insights. I don't want to have all sheets. And then the trigger, let's say, is going to
be select this time. Then we have to define
the target sheets. It's going to be our
Profit Insights over here. Here in
the filter actions, we have more options about
the interactivities. We have to define for
Tableau what can happen once the users
deselect the data, once they clear the selection. So here we have three options: Keep filtered values, Show all values, and Exclude all values. The best way in
order to understand this interactivity is
to have an example. So now we're going to
stay with the default, keep filtered values. Let's go and hit okay. With that, we got our
new action over here. Let's hit okay again. And try the action. The best way in order
to understand how this filter action
works is to bring both of the worksheets
into a dashboard. So let's go and create
a new dashboard. And let's go get the source
and get the target as well, below it. I will just remove
this legend over here. So now let's go and start interacting with the
reports. Again, here, once we select something
from the source, it's going to affect the data in the target. Let's go and select, for
example, those subcategories. So as you can see,
my interaction with the source can have an
effect on the target. Now we can see only
the subcategories that I have selected
in the source sheets. With that, the user
is going to get the feeling that everything
is connected together. Everything is interacting
together and is alive. Anything I'm selecting
in one worksheet has an effect in
the next one. Here, for this type of
action, we mostly go with the select
instead of the menu. It really makes sense
to select something in the dashboards and to have immediate interactions
in the next one. So as you can see, it's
really easy, right? So now I want you to understand another type of interactivity: what can happen once
I deselect what I have selected, or once
I clear my selection. We have selected Keep
filtered values. So once I, for example, click on the empty space over here to deselect, nothing is
going to change. With that, we have kept the
filtered values, and this is exactly what we have
specified inside our action. But now if you say,
you know what, once I deselect
stuff in the source, I would like
the filter to be cleared
in the target as well. In order to do that,
we're going to go back to our action and we're going to go and edit
our filter action. Now if the users go and clear their selections or deselect, we want to show all the
values for the target sheets. So let's switch it like this. Click okay again. Okay. And let's try
this. For example, I'm going to go and
select only the storage. And as you can see, we
got only the storage. And once I clear my
selection, once I deselect everything in the source, you can see we'll
get all the values again in the target sheet. In this scenario, it makes more sense to use this option. If I'm not selecting
anything from a source, nothing should be
filtered in the targets. Now let's go and check
the last option. Let's go to the Worksheet
actions, and to the filter. Let's go and pick Exclude all
values. Let's select that. Let's try what can happen now. Now, at the start,
nothing happened. We see all the data
from both sheets. Now let's go and select, for example, those
subcategories. As usual, we will get the data filtered in the target sheet. But now, once I deselect, everything is going to
disappear from the target sheet. So that means the target
sheet will only show the data if I select something
in the source sheets. So that means nothing
here is relevant as long as I'm not selecting anything in
the source sheets. And once I start selecting something in the source sheets, the data is going to be shown. Otherwise, if I deselect,
nothing is shown. One more thing that
I would like to show about the filter actions. If you go to the target
sheets over here, you can see that we
don't have any data. And Tableau
indicates that there is an action that is filtering the data
inside this worksheet. You can see in the
name of the filter that we have the word Action.
Tableau uses it to indicate that this filter really depends on the actions of the users: any value that is
selected by the users is going to
impact this filter. For example, if you go inside
it and edit the filter, you can see nothing is selected. And that's because
in our interactions, we didn't select anything
here in the dashboard. Once I, for example,
select those values, you can go back to the
target sheet and see those values
selected in the worksheet as well. And if you go inside the filter, you can see those values are selected
inside the filter too. Any filter whose name starts with
the word Action comes from
an action filter, and the values inside it are defined by the
interactions that you have done. All right, so with that
we have covered everything for filter
actions in Tableau. All right guys, now
I'd like to show you quick actions in Tableau
using the dashboards. For example, let's say
that we have the sales and the profits and
they are disconnected. There are no actions
between them. But now I can go and
create a filter action between
them very quickly. If you go, for example, to the sales over here, you can find a small icon
for the filter. It says Use as Filter. If you click on that, you
can see now it's filled. And now if I'm clicking on
anything inside the sales, as you can see, the
profits get filtered. Now if you go in the menu to Dashboard, then to Actions, you can see that Tableau automatically created
a new action. It usually has the
word generated in its name. We have here Filter
1 (generated). This one was created
automatically and quickly when we clicked on this small icon over
here on the dashboard. And of course, you
can go over here and change the options: if you
don't want to have Select, you can move it to Menu,
to Hover, and so on. And of course, you
can do the same thing for the Profit Insights.
So let's go and
close everything. Let's go to the Profit
Insights, and we can say, okay, the profit is going to
filter the sales as well. So let's go click on that. And now let's select everything. Anything that
I'm selecting in the profit is going to
filter the sales as well. This is a really nice
and quick way to create
actions in Tableau. But this is only for
the filter action type. All right, so that's all
for filter actions. Next, you're going to learn
another type of action: the highlight.
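The three clear-selection options of the filter action (Keep filtered values, Show all values, Exclude all values) are easier to compare side by side in code. The Python sketch below imitates the behavior shown in the dashboard; it is an illustration under simplifying assumptions (one filter field, rows as dictionaries), and the class and sample rows are hypothetical.

```python
from enum import Enum


class OnClear(Enum):
    KEEP_FILTERED_VALUES = "keep filtered values"
    SHOW_ALL_VALUES = "show all values"
    EXCLUDE_ALL_VALUES = "exclude all values"


class FilterAction:
    """Filters the target rows by whatever is selected in the source sheet."""

    def __init__(self, target_rows, field, on_clear):
        self.target_rows = target_rows
        self.field = field
        self.on_clear = on_clear
        self.filter_values = None  # None: no filter applied yet, show everything

    def select(self, values):
        """Apply a selection from the source; an empty selection means 'cleared'."""
        values = set(values)
        if values:
            self.filter_values = values
        elif self.on_clear is OnClear.SHOW_ALL_VALUES:
            self.filter_values = None
        elif self.on_clear is OnClear.EXCLUDE_ALL_VALUES:
            self.filter_values = set()
        # KEEP_FILTERED_VALUES: leave the last filter untouched
        return self.visible_rows()

    def visible_rows(self):
        if self.filter_values is None:
            return list(self.target_rows)
        return [row for row in self.target_rows
                if row[self.field] in self.filter_values]


# Hypothetical target data (the profit sheet).
profits = [
    {"Sub-Category": "Storage", "Profit": 120},
    {"Sub-Category": "Tables", "Profit": -40},
    {"Sub-Category": "Chairs", "Profit": 75},
]

action = FilterAction(profits, "Sub-Category", OnClear.SHOW_ALL_VALUES)
print(action.select(["Storage"]))  # only Storage
print(action.select([]))           # cleared: all rows again
```

Swapping `OnClear.SHOW_ALL_VALUES` for one of the other two members reproduces the other two behaviors from the lecture.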
123. Tableau | Action Highlight: All right guys, now we're
going to talk about another type of action.
We have the highlight. The highlight is very
similar to the filter: the user is going to interact with the source sheets, and in the target sheet
we're going to focus on the subset of data that we
selected from the source. But the main difference
here is that the irrelevant data will
not be filtered out. All the data is going to
stay in the target sheets, but only what we are
selecting is going to be highlighted in
the target sheets. And the best way in
order to understand the highlight action is to have a dashboard with two worksheets. So now let's go and create
a highlight action. As usual, we're going to go
to the main menu over here, but this time we're going
to go to the dashboard. Then let's go to the Actions, and let's add a new action. We're going to go over
here, add an action, and then we're going to pick
this time, the highlight. As usual, we have to
define the source, the trigger, and
the target sheets. Let's go and give it a name. It's going to be
Highlight Profit Insights. Then the source is
going to be our sales. I'm just going to remove
the profit from here. And the best way to trigger a highlight
is to have a hover. I'm just going to run
this action on hover. And then the target is going
to be our Profit Insights. So I'm just going to
remove the Sales Insights. Then we have some
options to define which fields are going
to be included in the interaction.
The default is going to be All Fields;
then we have Dates and Times; and with the last option you
have Selected Fields, so you can specify which fields are going to be included
in the action. I'm going to stay with
the default, All Fields. So with that we have everything. Let's go and hit OK. And with that we got
our action as well. Let's hit OK again. Now
let's go and test the action. Let's go to the source sheet. The trigger is going
to be mouse hover. Now as I'm hovering the mouse
over the data, you can see that Tableau is
reacting in the target sheet and focusing on the data that
I'm hovering over. If I stay on
Storage with my mouse, you can see that Tableau is focusing on Storage
in the target sheet. And you have a highlight
with a yellow color. As you can see, it's
really nice, right? It adds more interactivity and more dynamics to your views: as
the users are interacting with one worksheet, the other worksheet
is getting highlighted. It's really nice. Now you
might say, you know what?
I would like to have
the same effect in the Profit Insights. As I'm
hovering the mouse over that data, I would like to have
highlights in the source, in the Sales Insights, so both of those worksheets can
highlight each other. In order to do that,
it's really simple. Let's go to the main menu
again, Dashboard, Actions. Let's go to the
highlight action. And then let's
include everything in the source sheets, and everything in
the target sheets as well. With that, all those worksheets can highlight each other. Let's go and hit
OK, and then OK again, and let's check. Now, as you can see, as I'm hovering the mouse over the Profit Insights, the highlight is going to be in the sales, and vice versa. As I'm moving over the sales, you can see the highlight is
going to be on the profits. Now the mouse hover
is going to highlight both worksheets. All right guys. Now generally speaking about
the highlights in Tableau, there are different
options where we can add highlights or control
the highlight option. For example, if you go to
the quick menu over here, you can see that
we have an option to edit the highlights. If you go over here, you can see that we can disable
the highlights, we can enable them, and we can define which fields are going to be
included in the highlights. For example, if I go
over here and say, okay, disable workbook
highlights, what happens is that the highlight
action is going to be disabled. In order to enable
it, we're going to go again to the
quick menu over here and enable the workbook
highlights. As you can see, now I can highlight
those marks again. In Tableau we can also add highlighters
to the worksheets or to the dashboards. If you go
to Analysis in the main menu, here we
have Highlighters. If you go over here, we
have the subcategory, since it is the only
dimension that we have in the dashboard
or in those worksheets. Let's go and click on that. Now if you look at the right side, we got something like a filter. But it's not really a
filter, it is a highlighter. If you click on
this box over here, you will get a list of all distinct values
inside the subcategory. Now what you can do is
just mouse over those values,
and as you can see,
the dashboard is going
to be highlighted. This is another way to trigger highlights inside your dashboards or worksheets: by adding the highlighter
on the right side. For example, if I just
go and click on that, it's going to stay highlighted, since we have selected
this value over here. And of course, if
you want to get everything back to normal, you can go over here, click on the X, and remove the value. With that, we got everything
back without highlights. All right, so that's all about
highlight actions in Tableau. And next we're going to
learn how to use actions in order to change the
members of sets.
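The difference between a highlight and a filter is that the highlight keeps every row in the target and only marks the matching ones. A minimal Python sketch of that difference follows; the function name, the `highlighted` flag, and the sample rows are hypothetical and only illustrate the idea described above.

```python
def highlight(target_rows, field, hovered_values):
    """Return every target row, flagged with whether it matches the hovered marks."""
    hovered = set(hovered_values)
    return [{**row, "highlighted": row[field] in hovered} for row in target_rows]


# Hypothetical target data (the profit sheet).
profit_rows = [
    {"Sub-Category": "Storage", "Profit": 120},
    {"Sub-Category": "Tables", "Profit": -40},
    {"Sub-Category": "Chairs", "Profit": 75},
]

for row in highlight(profit_rows, "Sub-Category", ["Storage"]):
    marker = "*" if row["highlighted"] else " "
    print(marker, row["Sub-Category"], row["Profit"])
```

Compared with the filter action, nothing is removed: the row count stays the same and only the emphasis changes.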
124. Tableau | Action Sets: All right guys, moving on to another type of action,
we have the sets. As we learned
previously, a set splits your
data into two groups, the In group and the Out group. Now the one who is creating the dashboard or worksheets can define which
members are going to be in and which members
are going to be out. But in order to
make your visuals interactive, we can give
these options to the users so they can define
which members are going to be in and which
members are going to be out. In order to do that,
we're going to go and create action sets. So first let's create
a view and the set. In order to do that,
we're going to stay with the big data source. Let's take the sales
to the columns and the profit to the rows.
Here in the middle, we're going to go and get
the customer ID. With that we got data points, but we
still don't have any sets. But first let's go and make
those points a little bit bigger in order to
see the members. And then I'm just going to
go and change the shape as well to be filled
circles. That's it. Let's go now and create a set. In order to do that, I'm
just going to go and select those top
right customers. And then we go over here and
then we say Create Set. All right, I'm just going
to leave it as it is. And with that we got in the Data pane a new
dimension for the set. So now we're going
to go and add it to our view as the color.
So let's go and move it
to Color over here.
So as you can see,
the blue is going to be the In group, and the Out group
is going to be grayed out. I'm just going to
change those colors. So let's go to the colors,
and the In group is going to be, let's say, green, and the
Out group is going to be red. Let's go and hit Apply and OK. And now as you can see, the one who's creating this view is deciding which members are in
and which members are out. But now let's go and give
these options to the users. In order to do that,
we're going to go and create an action set. As usual, we're going to go to the main menu to the worksheets. Let's go to Actions, and
let's add a new action. This time we're going to
use change set values. Let's go inside. And here
we have the usual stuff. We have the source,
what can trigger the action and the target. Let's just give
it a name change, customer ID set and then we're going to go and
define the source sheets. It's going to be the
action sets that we have it and then we have
to define the action. I'm just going to
leave it as select. The target is going
to be the target set. In order to do that, we
have to click over here. And then we will get here all the sets that we have
inside our data source. In this example, we
have only one set, big data source. We
have it over here, customer ID sets, Let's
go and click on that. And now here we have more
options about the sets. The left one going to be what
can happen to the set once the users start interacting
or selecting data points. On the right side here we
have options about what can happen once the users
clear the selection, once the user diselects
stuff in the visualizations. Now we know that Santos options, we have to play
around those values. On the right side, I'm just
going to say keep set values. If I di, select anything in
the view, nothing can happen. Now, in this left group, we have assigned values to set, add values to set, and
remove values to sets. We can start with the first one. Once the action is triggered, we can assign values to sets. What this means, if you choose this one, what
table going to do? Going to empty the group, and anything that
you are selecting, going to be the
members of the group. Let's see what this means.
Let's go and hit OK, and then OK again. Here we have to select in order to trigger the action. As you can see, we have those members inside the group.
Now let's say that I would like to select those four members over here. Once I start selecting those members, what happens? Only those members are going to be in the group, and you can see those other points are now out.
That means Tableau is removing everything and starting from scratch, and anything that you are selecting is going to be the only members of the group. That's it for this option.
The selection is going to define the members of the group. Let's go and change it to the second option. Let's go to our action, Change Customer ID Set. Now let's move to this one. It says Add values to set. What happens this time?
Tableau will not forget which members were previously inside the group; we are just adding new members to the set. Let's see how this works. Let's go and hit OK, and again OK.
Currently we have those four members in the group, and let's say that I would like to add two new members, those two over here. So let's go and select them. With that, you can see we still have the earlier members in; we have just added two new members to the set. It's really simple, right?
Let's go and try the last one. Let's go to the actions and again to Change Customer ID Set. This time we say Remove values from set.
Now what happens? It works exactly like adding new members to the set, but this time anything that you're selecting is going to be removed from the set. Let's go and try that out. Let's go and hit OK, and again OK.
Let's say that I would like to remove this member from the group and move it to the out group. In order to do that, let's just go and click on it. As you can see, now it's red and it is not in the group anymore. That's it.
So this is about what happens once we trigger the action. But now let's learn about what happens once we clear the selection. Let's go to the actions over here and go back to our set action.
On the right side, we have three options: Keep set values, Add all values to set, and Remove all values from set. So far we have always worked with Keep set values. That means if you clear the selection, nothing is going to happen; the members that you have defined with your selection are going to stay in the group.
But the other two are going to destroy your definition. Take Add all values to set: if you deselect, it's going to add all values to the group, so everything is going to be in. Remove all values from set is exactly the opposite: if you deselect, everything is going to be out.
So let's go and select this one, Add all values to set, and try it out. Currently we have those five members in the group and the rest is out. Say I'm interacting with our report, and I select this point to be removed from the group. Now, once I deselect or clear my selection, what happens? All the members are going to be in the group.
The other option is exactly the opposite: if I deselect, everything is going to be red and going to be out.
All right, so that's all for the set actions. As you can see, it's a really nice feature where you can give the users the freedom to choose which member is going to be in and which member is going to be out, in order for them to do focused analysis, instead of us, the ones creating the dashboards, deciding it. So it really adds more dynamics and interactivity to your views.
All right, so that's all about the set actions, and next we're going to learn the last type: how to use actions in order to change the values of the parameters.
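To recap the six options from this lesson, here is a minimal Python sketch of how the set membership could change. It is only a model of the behavior described above, not Tableau's implementation; the function name `apply_set_action`, the enum names, and the customer IDs are made up for illustration.

```python
from enum import Enum


class OnSelect(Enum):
    ASSIGN = "Assign values to set"
    ADD = "Add values to set"
    REMOVE = "Remove values from set"


class OnClear(Enum):
    KEEP = "Keep set values"
    ADD_ALL = "Add all values to set"
    REMOVE_ALL = "Remove all values from set"


def apply_set_action(members, selection, all_values, on_select, on_clear):
    """Return the new 'in' group after a selection or a cleared selection.

    Hypothetical helper: it models the options from the lesson.
    """
    members, selection, all_values = set(members), set(selection), set(all_values)
    if not selection:  # the user cleared the selection
        if on_clear is OnClear.ADD_ALL:
            return all_values       # everything is in
        if on_clear is OnClear.REMOVE_ALL:
            return set()            # everything is out
        return members              # keep the current definition
    if on_select is OnSelect.ASSIGN:
        return selection            # start from scratch with the selection
    if on_select is OnSelect.ADD:
        return members | selection  # keep old members, add the new ones
    return members - selection      # remove the selected members


# Made-up customer IDs, only for illustration.
customers = {1, 2, 3, 4, 5, 6}
group = {1, 2, 3}
group = apply_set_action(group, {3, 4, 5, 6}, customers, OnSelect.ASSIGN, OnClear.KEEP)
print(sorted(group))
```

With Assign, the earlier members 1 and 2 drop out because the selection replaces the whole group. With Add or Remove, the same selection would only extend or shrink the existing group.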
125. Tableau | Action Parameters: All right guys, now we're going to move to the last type of actions: the parameter actions. Again, here we can use actions in order to change the values of the parameters. So now let's have an example in order to understand how this works.
Let's build sales by month. So let's go and get the sales over here, and let's go and get the order date to the columns. I'm just going to change it to months over here, and let's go and add the labels.
Now, what I would like to build in this view: as I'm selecting data from the view, I would like to get the total sales of my selection. Whether I choose one point or a group of points, I would like to get the total sales of my selection.
In order to do that, we're going to go and create another worksheet where we want to show the total sales of our selection. Let's go and create another worksheet.
The first thing that we have to do is to go and create a new parameter. Let's go to the Data pane, to the empty space over here, right-click on it, and then Create Parameter. Let's give it a name: it's going to be Total Sales. Inside this parameter, we're going to have the total sales of our selection.
We're going to have the data type Float. For the display format, let's move it to Currency (Standard), and the current value is going to be, let's say, zero instead of one. That's all. Let's go and hit OK. Right-click on it and select
Show Parameter. Currently it's zero and there is nothing in our view. Now I would like to have one sentence here that says Total Sales, followed by the value of the parameter.
In order to do that, we have to go and create a new calculated field. Let's go over here to this arrow, Create Calculated Field. Then we're just going to go to our parameter in the Data pane and drag and drop it into our calculation.
Why are we doing this? Because we cannot use a parameter directly in our aggregations or in our view; we always have to create a new calculated field, and inside it we're going to have the value from the parameter. That's all. Let's go and hit OK.
Now on the left side we have a new calculated field, our new measure. Let's go and put it on Text over here. By default, we get it as a sum. As the users are selecting different points, we're going to have the sum of all our selections, so this aggregation is correct.
But now here in the view we have only zero, and I would like to have a sentence: Total Sales, then the value. In order to do that, let's go to Text over here, then to the three dots. Now we have a new window where we're going to customize the text. We're going to say Total Sales, and then we have the value of our new calculated field.
Let's just make everything bigger: Total Sales, let's move it to 20, and the calculated field is going to be 20 as well. And I would like to make it bold. That's all. Click OK.
As you can see, now we have Total Sales and the value is zero, which comes from the parameter. Now let's go and change this value to, for example, 100. As you can see, we get total sales of 100.
I would also like to change the format of the total sales. Let's go to our calculated field, right-click on it, then let's go to Format. Here on the left side we have Numbers. If you click on these options, we can go to Currency (Standard). Then let's move to United States; it's going to be somewhere over here, English (United States). And with that, we got the dollar sign. All right guys, now the
next step is that I would like to bring everything into one dashboard, so both of the worksheets. Let's go and create a new dashboard. Let's get the total sales, and then we're going to get the sales by month. Let me just make it a little bit bigger, and let's remove the title from the total sales.
Now, as you can see, the total sales value comes from the parameter. So far, everything is disconnected between those two worksheets: anything that I'm selecting here will not be reflected inside the parameter.
Now here comes the magic. I would like to change the value of the parameter depending on my interactions with this view. In order to do that, as usual, we're going to go to the main menu over here, to Dashboard. Then let's go to Actions, and then let's add a new action and choose this option, change parameter values.
Let's go inside it. Here we have the usual stuff: the source, the trigger, and the target. Let's give it a name, Change Total Sales.
Let's define the source: it's going to be the sales by month. Let's just remove sheet seven from here; sheet seven is the total sales. Then the trigger is going to be Select, so I would like to select in order to trigger the action.
Then here we have to find our parameter. We have only one, Total Sales, so let's select that. On the right side: what's going to happen once we clear our selection? I would like to say, OK, let's set it to zero if the users are not selecting anything.
All right, so now the last thing we have to define for Tableau is which field is going to control the value of the parameter. In sales by month we have different pieces of information, as you can see over here: we have the month and we have the sum of sales. Of course, the sum of sales is going to control the value of the parameter, so let's go and select this value over here. The aggregation is going to be Sum, since we are finding the total sales.
So that's all for now. Let's go and hit OK, then again OK. Now, as you can see, we have the value 100, which comes from the parameter. But if I select, for example, the data point over here, you can see that the total sales comes from my selection: 64,000.
And if I go and select all those values from the view, Tableau is going to go and sum up all those sales from my selection and put the result in the parameter value. With that, we have a connection between the parameter and our actions in the view, which adds a lot of dynamics and interactivity to your dashboards.
All right guys, so that's all for the parameter actions. It's a really nice feature in Tableau. All right, so that's all for the action types, and next I'm going to share with you my tips about the action triggers.
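The parameter action that we just configured can be pictured with a few lines of Python. This is only a sketch of the behavior from the lesson, with the Sum aggregation and a clear value of zero; the function name `parameter_after_action` and the monthly numbers are made up for illustration.

```python
def parameter_after_action(selected_sales, clear_value=0.0):
    """Value written to the Total Sales parameter after the action runs.

    selected_sales: SUM(Sales) of each selected mark, one number per month.
    clear_value: value used when the selection is cleared (zero in the lesson).
    Hypothetical helper: a model of the behavior, not a Tableau API.
    """
    if not selected_sales:       # nothing selected: use the clear value
        return clear_value
    return sum(selected_sales)   # aggregation configured as Sum


# Made-up monthly totals, only for illustration.
sales_by_month = {"Jan": 120.0, "Feb": 80.0, "Mar": 200.0}

print(parameter_after_action([sales_by_month["Feb"]]))        # one mark selected
print(parameter_after_action(list(sales_by_month.values())))  # all marks selected
print(parameter_after_action([]))                             # selection cleared
```

The calculated field on the second worksheet only reads the parameter, so whatever value the action writes here is what the Total Sales text shows.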
126. Tableau | Action Triggers: All right guys, now I would like to give you quick tips about when to use which type of action trigger.
For example, if you want to jump from one worksheet to another worksheet, or go to an external website, it's better to let the users choose this using Menu. First show the menu and let the users see the link, and then if the users want to go there, they're going to click on the link. That's always better than surprising them with Select, where the users select something and suddenly they go somewhere else. It's really not nice. So go with Menu if you use Go to URL or Go to Sheet.
If you are using a filter action, the best way is to use Select. It's more interactive: once a user starts selecting in one worksheet, the other worksheet is going to be filtered. I usually go with Select if I'm using filter actions, and Tableau uses it as well as the default for a quick filter action.
For the last one, the highlight, I really recommend you go with Hover. As the users are hovering inside one worksheet, the other worksheet is interacting as well. It's really nice and more modern.
Be really careful about when and how to trigger which actions; don't surprise your users by jumping somewhere else. If you are using Go to URL or Go to Sheet, be careful: talk with your users about how they would like to see it, and then make the decision about the interactivity and actions together with them.
All right, so that's all from me about actions in Tableau. That's all for the tips about the action triggers, and with that we have completed the section on Tableau actions. In the next section, we're going to cover a very important topic in Tableau, the Tableau calculations. We're going to learn there how to manipulate the data in Tableau, and we're going to learn many Tableau functions.
127. Tableau | Section: Tableau Calculations: Tableau calculations. We will now cover over 60 different functions in Tableau in order to manipulate your data.
You will not only understand how to use all those Tableau functions, you will also understand the concepts behind them, using very simple sketches and examples in order for you to understand how those Tableau functions work, because some of those calculations are really complicated.
We will start first by covering the basics of Tableau calculations, and then we're going to dive into the most used functions in the four categories: row-level calculations, aggregate calculations, LOD expressions, and table calculations.
Let's start first by having an introduction to the basics of Tableau calculations. So now let's go.
128. Tableau | Introduction to Calculations: All right everyone. So now we're going to talk about the calculated fields in Tableau, and we're going to start with the first question: why do we need calculated fields in the first place?
As we learned before, as we are building our visualizations, we always go to the Data pane, to the data source, and we drag the fields that we see there into the view. Now let's imagine that you are in a scenario where you need extra information, information that is not available in our data source. Or you would like to manipulate and transform the existing information into new information, into new fields. Or let's say that we are building very complex logic in our views.
For all those scenarios, we can go and create new calculated fields in Tableau to be placed in our data source. Calculated fields in Tableau are user-defined fields that are created using formulas or expressions. So they are additional fields that you can create based on the original fields in the data source.
All right everyone. So now we're going to move to the next question: how to create new calculated fields in Tableau. There are five methods for creating calculated fields. Four of them are global. That means once you create the calculated field, it's going to appear in the data source, in the Data pane, to be used in any other worksheet or in any workbook that is connected to the data source.
And we have one local method in order to create a calculated field for only one view, and we call it a quick calculation. Now let's go and explore those five methods.
The first way to create a new calculated field: we can go to the Data pane on the left side, right-click on the white space over here, and the first option is Create Calculated Field. Once we go there, we get a new window where we can write our expression. That's it, this is the first way. Let's move to the next one.
I'm just going to close this. If you go over here, we have a small arrow near the search. If you click on it, we get exactly the same list, and as you can see, the first option is Create Calculated Field.
The third way to do that is to go to any of the fields inside our data source. Let's say that we go to the address, right-click on it, and then here we have the option Create, and the first one is called Create Calculated Field. Once you go there, we're going to get exactly the same window, but this time the field name is already prepared in the expression, because we went specifically to the address and created a new calculated field from there.
Let's close this, and I'm going to show you the fourth method for creating a calculated field. We're going to go to Analysis in the menu over here and click on that, and here we have the option Create Calculated Field. Once we click on that, we're going to get the same window again.
Those are, quickly, the four methods for creating a new calculated field. You will always get the same result; only if you go to a field and create the calculated field from there will you find the field name inside the expression.
Now let's go and call it My First Calculation. I'm just going to put anything here inside the expression. Let's just type one.
Let's go and hit OK. Now we can see in the Data pane that Tableau created a new field for us. It is a field like any other field that we have in the Data pane in our data source. It has a data type as well: it is a continuous measure, because I entered one there, so it's a number.
You can treat it exactly like any other field. But to understand which fields are calculated and which fields are original, you can look at the icon over here: it has the equal sign. If you see the equal sign near the data type icon of any field, that means this field is a calculated field. It is not an original field that comes from the data source; someone went and created this calculated field, and it is based on the original data.
With that, you can quickly identify which fields are original data that comes from the source systems and which fields are calculated fields created by the users.
So we have created our first calculated field, and it is a global field. That means if you go to any other worksheet, let's go, for example, to a new one, we can find our calculated field again. Now let's move on to the next
method, where we're going to create a local calculated field relevant for only one view. In order to do that, we have to have something in the view. Let's take, for example, the customer's first name and put it on the rows.
Now, in order to make a quick calculated field locally, we're going to go inside the field, inside the dimension, and we can do that by double-clicking. Once you do that, you can see we are now allowed to write something inside this field, and what we are writing now is the calculated field.
Let's say that we have the first names with only the first letter capitalized, and I would like to manipulate them and transform them to upper case; I would like to see everything in upper case. In order to do that, we have the function in Tableau called UPPER. Now I'm writing the function name around the field, and it's going to transform the first name. With that, I have created a calculated field inside the first name.
Once you click somewhere outside, we can see in the results that this function changed the case of the first name. So we have done a quick transformation, a quick calculation inside the view.
If you grab the first name again from the Data pane, you can see that nothing has changed. We didn't change anything in the data source; we just changed it quickly for this view. This is how you can quickly create a new calculated field in the view without affecting the data source, and it's going to be available locally, only in this view.
Now let's say that this transformation is interesting and I would like to reuse it in other views. In order to make it available in our data source, we can grab this field from the visualization and just drop it on the Data pane. Let's release, and with this you can see that Tableau added the new field inside the customers, and we know this is a calculated field by checking the data type icon: you can see we have the equal sign.
Tableau offers us here to rename it; I would like to leave it as it is. And if you want to go inside it in order to edit the calculation, right-click on it and select Edit, and again we get the window where we can configure the calculation.
All right, so with that I have shown you all the methods for creating new calculated fields in Tableau. All right, the next step
we're going to go and learn the basic options that we have inside the calculation window. Let's go to our calculated field, My First Calculation, and first let's show the value in the view. Let's drag it to Text over here, and as you can see, we have the value one. Let's go and edit the calculated field in order to get the window: right-click on it and go to Edit.
So what do we have over here? First we have the name of the calculated field, and in this example we called it My First Calculation. Of course, you can go to the Data pane and rename it directly from there, or you can do it inside the calculation window.
The next piece of information is the name of the data source where we are creating the calculated field. In this example, we created the calculated field inside the small data source. This is really important if you have multiple data sources and you are creating a lot of calculated fields: it's really nice to know where you are creating this calculated field.
Now, moving on to the most important section in this window: this white area where you can write your expression to define the calculated field. Currently we have one, but we can go and use different stuff. We can use field names, parameters, functions, and so on. For example, last time we used the UPPER function on the first name. With that, I have defined what should be done inside this calculated field. This is my expression.
Now don't worry about the syntax that I'm writing inside the expressions, because in the next tutorials we're going to learn everything about the syntax and about the different functions in Tableau. Don't worry about it now.
The next piece of information that we have is the message that the calculation is valid. Here Tableau gives us quick information on whether the expression that I just wrote is valid or invalid. Currently, I wrote the calculation in the correct way; that's why everything is fine from Tableau.
But now let's make something wrong. Now we get a red message from Tableau saying the calculation contains errors, and here we have a small arrow. If you go over here, you'll see the message: it says Tableau is expecting a closing parenthesis here. So Tableau shows us a quick message to know what's wrong in our calculation. If I go and add the parenthesis, you can see that the calculation is valid again. So we have quick info from Tableau.
Moving on to the next piece of information that we have: it says one dependency, with a small arrow. Let's click on that and see what we have here. It says changes to this calculation might change the following sheets: sheet number one.
Here Tableau gives us a warning: anything that you are changing in the expression inside this calculation might have an effect on sheet number one, and that's because we are using this calculated field in the view in sheet number one.
This is very important information, especially if you are using the same calculated field in different worksheets. And this happens a lot, especially if you are focusing on the content of one view and you go and change the calculated field. It's like a reminder, a warning from Tableau that tells you: all right, if you make this change, you can affect the following worksheets.
My recommendation is to always go and check the dependencies to make sure that the changes that you are currently making to the calculated field are still relevant for the other sheets. All right, so moving on, we have two simple buttons,
Apply and OK; I don't have to talk about them, I think. Then we have here a small arrow, and this is very important. So let's go and click on that. What do we have here? This extension is the documentation, a catalog of all the functions that we have in Tableau.
For example, let's go and search for the function UPPER that we used in this example. Search for upper, and now we can see the documentation of this function on the right side. Here we have three pieces of information from Tableau.
The first one is the syntax of the function: it starts with the UPPER keyword, it accepts only one field, and the data type should be a string. The next one is a short description of the function: it says it's going to convert a text string to all upper case letters. The third one is an example of use. It says that if you apply UPPER to the value product, everything in lower case, the result is going to be product in upper case.
So here we have nice, short, quick descriptions of all the functions that we have in Tableau. This is very useful, especially while you are writing calculations, because it doesn't make sense to memorize everything, right? I always tend to check whether I'm using the correct syntax, or even the correct function. I always check the examples and say, OK, this is the one that I need.
One more thing that you can see in this window is this drop-down menu. Here we have the different groups of functions in Tableau. For example, we have here the group of string functions. If you go inside it, you get a list of all the functions that manipulate string fields, and at the end, as you can see, we have the UPPER function that we used in our calculation.
All right, so with that we have covered all the options that you can see inside the window of calculated fields. All right, so that was an introduction to calculated fields in Tableau, and next we're going to learn the basic components of Tableau calculations.
129. Tableau | Calculation Components: All right guys, so moving on, we're going to talk about the basic components of calculations in Tableau. That means what kind of information we can add inside the expressions, inside the calculations.
The first thing that we can add inside the calculation is the comment. Comments are really useful for you and for others to have some context or a small description of why you are doing the calculation. For example, in order to add a comment to this code, we can go to the start and type two forward slashes. Then we can write anything: anything after the two forward slashes will not be executed in the calculation. For example, we can write here: calculation to change first name to upper case. Anything I'm writing over here will not be executed and will not be checked by Tableau either. I really recommend always adding comments, so that if you visit this calculation later, you understand why you wrote this expression.
All right, moving on to the second kind of information that we can add inside the calculations: the fields from the data source. Those are the orange ones; we have one over here, the first name. But let's just remove everything and start from scratch.
If you want to add a new field inside this calculated field, you can start writing the field name. As I'm writing now, Tableau makes a list of suggestions. Here Tableau found three things. The first one is a function: as you can see, there is a small icon like an f, which indicates that this is a function. The second one says first name, and beside it there is a data type icon; this data type icon indicates that this is a field name. The third one is also the first name with the icon, so it is a field as well, but here Tableau writes that it is from the big data source.
Because those two fields have exactly the same name, Tableau shows us that this field comes from a different data source. The first one comes from the current data source; that's why Tableau doesn't have to say that it is from the small data source. But since the second one comes from a different data source, Tableau indicates that this is a different field from a different data source.
Now, since we want the first name from the current data source, we can go and select this one over here. With that, we have inserted a field inside our calculation, and as you can see, it got the orange color.
Another way to add fields inside our calculations is by drag and drop. Let's say that I would like to get the last name as well. I can go to the last name over here and drag and drop it inside the calculation, and as you can see, with that we got our second field, and again it has the orange color.
And of course the fields that we add to calculations could be any fields. For example, let's go and add the sales. The sales is a measure, so we go to the orders, to the sales, and we can just drag and drop it into the calculation. As you can see, Tableau accepts measures inside the calculations as well, and they have the same orange color.
All right, moving on to the next and very important component: the Tableau functions. Tableau functions are built-in operators that could be used
the orange color. All right, moving on to the next and very important component, we have the Tableau functions. Tableau Functions are built in operators that could be used
in order to manipulate, to transform, to change
the content of one field. For example, what we
can do with the sales. We can go and calculate the
total sales inside our data. In order to do that, we can use the function sum before
the field sales, we can start with the
sum and then we have the open apprentices and
then close as we can see, this component, those
functions in Tableau have always the color of light blue. Now what can happen? Table
going to go and summarize all the values inside the sales and presented
as the result. Let's go and heat. Or
we're going to get an error here because we have
changed the calculation. So let's go and remove it. Let's get it again in
the text so that we got the total sum of
sales inside our data. Now let's go back to
our calculated field and see the next component. We have the logical expressions. We can use the logical
expressions in order to check whether a condition
is true or false. And they have as well,
the color of plaque. So for example, let's say
that we want to create the calculation where we are
checking the sum of sales. If it is higher than 1,000 then we want to see
the high at the end. Let me show you how
we can do that. We're going to use
the statement, it's going to start
with the keyword. As you can see, it is black because it is a
logical expression. If the sum of sales is higher
than 1,000 we can here the operator higher greater than 1,000 then what's
going to happen? We're going to have
the value high. Then we're going to go and
end the logical expression. We can check over here that the calculation is valid here. We have our logical
expressions then and end, don't worry about the syntax. We're going to
learn everything in the next tutorials step by step with very
simple examples. All right, so now
we're going to move to the last component that we can add to our calculations: the parameters. Parameters are dynamic fields that we can add to visualizations in order to make everything dynamic in the views or the calculations. Again, there will be a dedicated tutorial for that later, but now let's see how we can add a parameter inside the calculation.
First, we have to quickly create a parameter. In order to do that, I'm just going to close our calculation over here. Then we can go to the arrow in the Data pane, where we have Create Parameter. Click on that, and here we get the window to configure the parameter. We're going to call it Choose a Number. That's it. Let's close it and say OK.
Now on the left side we've got a new parameter. Right-click on it and select Show Parameter. With that, we got on the right side an input field where we can enter a value. For example, we have it now as one; we can enter 1,000. Nothing happens in the view, because the parameter is not used anywhere yet. But we're going to go and add this parameter inside the calculation.
Let's go back to our calculation, My First Calculation, right-click on it and then go to Edit. What we're going to do now: instead of having 1,000, we're going to get the value from the parameter. We make it a dynamic calculated field, so the user is going to control this value.
Let's go and remove the 1,000, and we're going to start writing the name of the parameter like any other field. It's going to be Choose, and we get it over here, so click on that. With that, we have added our parameter inside the calculation, and as you can see, parameters in Tableau have the color purple.
That's it for the last component, and with that we have covered all the different components that can be used inside calculations.
Now let's go and try the output. I'm going to go and hit OK. Then I'm going to remove this one, since it's red. Let's get the products to the rows. Next we're going to go and get our new calculated field. This time it's going to be a dimension, because the output of the calculated field is going to be a string value.
Let's check the results. As you can see over here, we have two products with the value High; the rest are null. Now let's go and get the sales in order to understand why those values are High. That's because of our calculation: anything above 1,000 gets the value High, and anything else is going to be null.
And with the parameter, the users are controlling the calculation. If I go over here and say, OK, instead of 1,000 let's have 500, then we have included the other products as well. So all the products now have the value High in the calculated field. With that, we have generated new information for our visualizations. All right guys, now
let's quickly summarize the components of the
calculations in this example. First, we can see the comment. The comment is going to
help us document the purpose of the calculation, and it will not be executed. It's shown
in gray.
The next component
is the field.
Any field inside
our data source, whether it's
a dimension or a measure, can be added to our
calculation, like this one. We have the sales, and fields have the orange color.
The next component
is the functions.
They are the built-in
operations to manipulate our data, and they have the blue color. The next component
is the operators. In this example, we
have two operators: the plus, the
arithmetic operator, and the
comparison operator, the greater-than. They're going to have
the black color. The next component,
also in black, is the literal expressions. Those are static values that we can insert inside
our calculations. It could be a number,
like here the ten, or it could be a string,
like here the High. And don't forget to add the double or single
quotation marks in order for Tableau to understand that
this is a value, not a field or a parameter or a function or anything else. We can add date values as well. All right, moving on
to the next component. We have the logical
expressions, like IF and THEN, and they help us
evaluate conditions inside Tableau and then decide whether
they're true or false. And the last component
that we have inside the calculations is the parameters. They are the dynamic fields that we can use inside calculations. All right, so that's all about the components of
calculations. With that, we have learned the
basic components of Tableau calculations. And next we're going
to learn how to nest one calculation
into another.
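For readers who think more easily in code, here is a minimal Python sketch of the parameter-driven calculated field from this lesson. It is only an analogy, not Tableau syntax: the function names and the product sales figures are made up for illustration, and `None` stands in for Tableau's null.

```python
from typing import Optional

def sales_label(sales: float, choose_a_number: float) -> Optional[str]:
    """Mimic the calculated field: 'High' above the parameter value, otherwise null."""
    if sales > choose_a_number:   # comparison operator against the parameter
        return "High"             # literal string value
    return None                   # no ELSE branch, so the result is null

# Hypothetical sales per product (not taken from the course data set).
PRODUCT_SALES = {"Product A": 1500.0, "Product B": 1200.0, "Product C": 700.0}

def label_products(choose_a_number: float) -> dict:
    """Re-evaluate the calculation for every product with the current parameter value."""
    return {name: sales_label(sales, choose_a_number)
            for name, sales in PRODUCT_SALES.items()}
```

Calling `label_products(1000)` marks only the first two products as High, while `label_products(500)` marks all three, which mirrors what happens when the user changes the Choose a Number parameter in the view.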
130. Tableau | Nested Calculations: So I'm going to talk about the nested calculations
in Tableau. In Tableau, you can nest calculations by
using the result of one calculation as an input
for another calculation. That's useful because sometimes you have complicated calculations
with different steps. For each step, we can
have one calculation. As you are implementing
those steps, you're going to end up having multiple calculations
nested
inside each other. Now let me show you an example. All right, so now
we're going to go and create a new calculated field to manipulate the values of the field Country to
have a specific format. In this example, let's
take the first name of the customers and
the countries as well. Now we're going to go
and create a new field for the country with
a different format. Let's go and create a
new calculated field. And then we're going
to start with the first calculation, where we make all the letters of the field Country
upper case, so we're going to
use the UPPER function. And then we're going to
manipulate the field Country, so we're going to
start writing Country. And here it is, our field. That's it for the
first calculation. Let's go and hit OK. With that, Tableau is going to create
a new calculated field, a new dimension inside
our data source. So let's go and
check the values. As you can see, all the letters of all the countries are
in upper case. All right, so now
we're going to move to the next step in
the transformation, where we want to show only the first
three characters of each value inside this
new calculated field. In order to do that,
we're going to go back to our calculated field and
we're going to edit it. This time we're going to
use the function LEFT. You can go and search
in the catalog to see the syntax of the LEFT
function. As you can see, it
accepts two arguments:
the first one is going to be the string that
we want to manipulate, and then we're going
to have the number of characters that
we want to show. Let me show you now, step by
step, how we can do that. Let's go first to a new line. So we're going to have LEFT, and then it needs two arguments:
the field that we
want to manipulate and the number of characters. The field that we
want to manipulate is going to be the result
of the UPPER function. It's going to be
this one over here. So I'm going to just cut it
and insert it over here. With that, we have
the first argument. The second argument
is going to be the number of characters
that we want to show. It's going to be
three characters, that's why we specify three. This is how we can nest
functions in Tableau. The first function to be
executed is going to be the one inside: the UPPER function is
going to be executed first. And then the result of this function is
going to be used as an input for the function
outside, the function LEFT. That means first we're
going to go and make all the values inside the
Country upper case. Then we're going to go and
execute the LEFT function, where we're going to show only the first three characters. Now let's go and hit
Apply to check the results. With that, you can see
we have now only three characters inside the values
of the country. Again, the function inside
is going to be executed first,
then the function outside. With that, you can
further expand this calculated field
with more functions. For example, let's
say in the third step
we want to go and calculate
the length of the characters. In order to do that, we
can use the LEN function. We're going to add
it at the start, and then the input
of the function is going to be the output of
those two functions. As you can see, it's very easy to nest functions in Tableau. Let's go and hit Apply
and check the results. As you can see, everywhere
we have the length of three. Again, the order of execution is going to be the one deepest inside first, the UPPER function,
then the LEFT function. Then the last one to be computed
is the LEN function. That's it. This is one
method for creating nested calculations in Tableau,
but there is another
method for how to do that. That's by creating a
second calculated field using the first
calculated field. Let me show you what I mean. We can go and close
this one over here. And let's create a
new calculated field. We're going to call it
Second Calculated Field. What we're going to
do inside it is to use the output of the
first calculated field. In this example, it
is the Country U. This is our first
calculated field. And then we're going to
multiply it by two, for example. Here again, the order of the computation
is going to be: first, Tableau has to calculate
the first calculated field,
so the UPPER,
LEFT, and LEN, and then at the end
it's going to come over here and
multiply it by two. Let's go and hit OK. And with that we've got
a new calculated field. Let's drag and drop
it on the view. As you can see, it is going
to have the value of six. So when do I use the first method, and when do I use
the second method? All right, so I'm going to show you how
I usually decide on this. Let's go to our
first calculation. As you can see, if those
intermediate steps are not important,
meaning you don't want to use them in any
other visualizations, then it doesn't
make any sense to create a field
inside your data source for each intermediate
step. Otherwise the data
source can explode, and you're going to have
a lot of fields that are not necessary.
In this situation, I'm going to keep all
those intermediate steps in one calculation. Another scenario is where you have a very complex calculation, where the code is going to be
very long and it's really hard to maintain everything
in one calculation. There, I try to split it into steps, and each step is going to have one field in the data source. The last scenario is where
those intermediate steps are really important
for something else, for different
visualizations, or maybe for
other calculations as well.
In order not to
repeat myself by doing the same calculations
over and over, I go and create a
dedicated calculated field for each intermediate step,
but only if they are important. All right guys, that's all
for the nested calculations. That was an introduction to
calculations in Tableau. They are really important for
making great visualizations. In the next video, we're
going to learn more
about
calculations in Tableau. All right, so with that,
we have learned how to do nested calculations
in Tableau. And next I'm going to give
you an introduction to the four types of
Tableau calculations. We have the row-level, aggregate, table, and
LOD calculations.
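The two nesting methods from this lesson can be sketched in Python as follows. This is only an analogy under my own assumptions: `country_u` stands for the first calculated field (UPPER, then LEFT with three characters, then LEN), `second_calculated_field` stands for the second field that reuses it, and the country names in the usage note are hypothetical.

```python
def country_u(country: str) -> int:
    """Method 1: nest the steps inside one calculation, innermost step first."""
    upper = country.upper()        # step 1: like UPPER([Country])
    first_three = upper[:3]        # step 2: like LEFT(..., 3)
    return len(first_three)        # step 3: like LEN(...)

def second_calculated_field(country: str) -> int:
    """Method 2: a second field that uses the output of the first field."""
    return country_u(country) * 2  # the first field is fully computed before the multiplication
```

For a country such as "Germany", `country_u` returns 3 and `second_calculated_field` returns 6, matching the values seen in the view. Keeping the steps inside one function corresponds to keeping unimportant intermediate steps out of the data source.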
131. Tableau | 4 Types of Calculations: In Tableau, we have many
different functions that we can use inside
the calculations, and we can categorize them into four different
types of calculations. In this tutorial we're
going to talk about them. But first we're going to look at a very simple example to understand how they work and how they interact with each
other. So let's go. All right, now let's
say that you have the following
product table inside our data source where we
have information like the product prices,
quantities, and so on. Those data are the original data that we can find inside
the data source. Now let's say that we
need a new field inside our data source to show
the revenue. In order to do that,
we can simply create a new calculated field
that multiplies the prices
by the quantities.
With that,
Tableau is going to go and create a new field inside our data source to store the result of the
calculation. Tableau is going to go row by row, multiplying the price
by the quantity. So for example,
for the first row it's going to
multiply 20 by two, and Tableau is going to
store the result, 40, in the new field. Then Tableau jumps to the next row and does
the exact same thing. So as you can see,
Tableau is processing each row individually and independently from the others. When the calculation is
happening on one row, we don't care about
the information that is present in
the other rows. Tableau focuses only
on one row at a time. This type of calculation
we call row-level
calculations. And the level of detail
we have here is the lowest one, the level of
detail of the data source. It's very important to
understand that this type of calculation is the only type that will not aggregate the rows of the data
source. It is also the only type that can store its results at the data source. That means Tableau will not
calculate the result of these calculations each time you are using them in
the visualizations. Instead, it can precalculate the results and
store them in the data source. The calculation will
not be done on the fly. All right, now let's move
to the visualizations. And let's say that I
would like to show the total revenue
of each product. For that, we can use
the function SUM to summarize the
values of the revenue. And we can go and add the
dimension Product to the view. Tableau here is going to show only three rows in the view, one row for each product value. That means we're going to
have P1, P2, and P3. Now this time Tableau will
start summarizing and aggregating the rows
in the data source. That's going to be at the
level of the dimension. For example, Tableau is going to start with the first product, P1, and Tableau is going to summarize the first two
rows from the data source. We have 40 plus 60, so Tableau
is going to write the output,
100, directly in
the visualization. Then it's going to
move to the next row. We have P2 here. We have only one row
in the data source, and the sum of
that is going to be 20. For product P3, we have three rows in
the data source. The sum is
40 plus 25 plus 15, so Tableau is going to have the answer
80 in the visualization. This time, as you can see, Tableau is not processing the rows of the data source one
by one and individually. Instead, Tableau is going
to summarize and group up the rows of the data source at the
visualization level. This type of
calculation we call aggregate calculations, and it's going to be calculated
on the fly. That means the result
of those calculations will not be stored inside
the data source. And now it's very important
to understand the level of detail of this new
table that we have in the view. It has a lower level of detail
than the data source, and the one who controls
the level of detail is the dimension that
we have on the view. The dimension that we use in
the view is going to control the level of detail for
the aggregate calculations. And that's why we have
another type of calculation. Let's say that we have another
scenario where you say, you know what, I would like to control the level of details. I want my calculations to show the total revenue
of each category. Here we can use
different functions, like the FIXED function, so we can have
FIXED Category and then SUM of the revenue. With that
we are telling Tableau: OK, find the total revenue, but this time it's
going to be fixed. It's going to be connected
to the dimension Category. So let me show you
what happens. Tableau is going to go and check: OK, what is the
category of P1? It is the category A. Now the next question:
what is the total revenue
of the category A? Here, Tableau
summarizes 40 plus 60 plus 20, and the result
is going to be 120. Here, Tableau will not show the total revenue of
the product P1,
but instead
we are showing the total revenue
of the category A. The same thing happens
for the next product. We have P2. It belongs
to the same category, A.
The total revenue of
category A is again 120. And then the last
product, P3.
It belongs to a different
category this time.
And the total revenue
of that is going to be 40 plus 25 plus 15. The output is 80 as the total
revenue for that category. Now, who is controlling
the aggregations? It's no longer the dimension
that we have on the view, but instead it's going
to be the dimension that we specify in the
calculation. This type of calculation
we call
LOD expressions, level of detail
expressions. Here it's the same thing as with
the aggregations: it's going to happen on the fly. Nothing is going to be stored
inside the data source. All right, now moving on to the last calculation type
that we have in Tableau. Let's say that after I got
the result in the view, I would like to
calculate the rank of the products based on the data that is
displayed in the view. In order to do that, we can use the function RANK on the
sum of the revenue. What happens this
time is that Tableau will not go and query
the data source. Instead, Tableau is going to query the
visualization itself. It's like we are
aggregating the aggregation. Based on the values that
are displayed on the view, we can find that the product P1 has the rank one, P2 has the rank three, and P3 has the rank two. This type of calculation we call table calculations. And unlike all the other types, it is based on the context
and on the data that is displayed on the
view, and it will not go directly and
query the data source. It is
computed on the fly as well. That means the result will not be stored inside
the data source. If we're talking about
the level of detail, it depends as well on
the visualization. Here it depends on the
dimension Product. All right guys, so with that we
have now a big picture of the four different types of
calculations inside Tableau. And we can see how Tableau
computes the calculations and presents the data at the
end in the results. All right, so now we're
going to start with the first type of calculations. We have the row
level calculations. And here we have a
lot of functions under this category if you
compare to the other types. So here we have the
number functions, string, date, logical functions. There are a lot of functions, but we're going to cover them
all in the next tutorials. So now let's go in Tableau and try a few of those calculations. Okay, so now back to Tableau, we're going to go to
the small data source, and then we're going
to go to the orders. As you can see, we
have here the quantity and the unit price as well. Now we're going to go and
calculate the revenue, where we're going to multiply the quantity by
the unit price. To do that, we're going to create
a new calculated field in the data source,
and this is going to be of the row-level calculation type. Let's go and create a
new calculated field. We're going to go
to the data pane, right-click in the empty space, choose Create Calculated Field, and let's give it the name Revenue. And then the formula
for this is going to be Quantity multiplied
by the Unit Price. Now you might ask me,
where do I find in Tableau all the functions
that are related to the type row
level calculations? Well, there's no
specific place for that, but there's some
orientation for it. So if you go to
the documentation over here and check
those groups, you will not find directly the
types of the calculations, but you will find some groups that are similar to those types. For example, as you
can see over here, we have table calculations. If you go inside
it, you can find all the functions that we
can use in this type. And then we have another
group called aggregate. Here you will find not only the aggregate calculations, but also
the LOD expressions. The last type, the row-level calculations, is actually the rest. All the others, like the number,
string, date, and type conversion functions, are
row-level calculations. All right, so now back
to our calculation. Let's go over here and hit OK. And with that you can see
that Tableau immediately created a new field
in our data pane. Now as I told you, if you are using row-level calculations, Tableau can do the
precalculation and store the results
immediately in the data source. Let's go and check that.
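As a rough Python analogy for this row-level calculation (an illustration, not how Tableau works internally), the sketch below adds a revenue value to every order row by looking only at that row. The order rows, field names, and the `add_revenue` helper are hypothetical.

```python
# Hypothetical order rows; only quantity and unit price matter for the calculation.
orders = [
    {"order_id": 1, "quantity": 1, "unit_price": 215.0},
    {"order_id": 2, "quantity": 2, "unit_price": 50.0},
]

def add_revenue(rows: list) -> list:
    """Row-level calculation: each result depends only on values in the same row."""
    return [
        {**row, "revenue": row["quantity"] * row["unit_price"]}
        for row in rows
    ]

# The result has exactly one revenue value per original row; nothing is aggregated.
orders_with_revenue = add_revenue(orders)
```

The new list has the same number of rows as the input, which corresponds to the new Revenue column appearing next to Quantity and Unit Price when the data is viewed.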
Either you can go to the data source page or we can go to this small
icon over here that says View Data. Let's go inside and
check the results. Here we have to
switch to the orders. Now let's scroll to the right. You can see we have
the original fields. We have the quantity and
the unit price as well. But we also have our
new calculated field, which is like any other field that we have in the data source. We have the revenue over here. And as you can see, Tableau
immediately stored all the results of this calculated field in
the data source, even though we haven't created anything yet in the
visualizations. That means Tableau prepares it for you in the data source, and
we can check the result. For example, here we
have the quantity one and the unit price 215, so we're going to get
the same value. And here the values are
multiplied by two. So as you can see, we are now multiplying the quantity
by the unit price. And now we can see
very clearly that the row-level calculations
are calculated and performed on the row level, individually and independently
from each other. So the information that
we have in the other rows will not affect the
calculation of the first row. All right guys, so that's it. This is how the row-level
calculations work in Tableau. Okay, so now we're
going to move to the next type of calculations.
We have the aggregate
calculations. And here we have
few calculations compared to the
row-level calculations. We have max, min, average, count, count distinct, and
attribute. Again, all of those will be covered
in detail in extra tutorials, but now we're going to go in Tableau and try a few of them. All right everyone, so now we're going to go
and build a view where we have the total
revenue by products. In order to do that,
we're going to go and get the product name from the small data source
and let's put it in the view. Now it's really important
to understand the concept.
The product name is
the dimension that defines the level of detail
in the visualization. That means in this view we have five rows, and this is completely controlled
by the product name. Now I want you to
understand how to pick which type of calculation
we're going to use. To answer this, we always start with
the first question: do we have to aggregate the
data? Since the task is asking for the total revenue, that means there is an aggregation and
summarization. Well, that means we cannot use the row-level calculations, and we have to use
the other types for aggregations. So we are
left with three types. Now the next question
is going to be: do we have all the
data in the view? Well, as you can
see in our table, we have only the
dimension information. We don't have anything
about the revenue. That means no, we don't have all the data
inside the view. That means we will not use the table calculation type, because table calculations
always depend on the view. If you don't have the
data in the view, you cannot use
table calculations. With that, we are left
with two options. Either we can use the aggregate
calculations or the LOD. Well, the last
question you can ask is:
does the level of
detail that we have in the view fulfill
my requirement? Well, in this example, yes, because we want to have the
total revenue by product. So we are talking
about the products, and the dimension that we have inside the view exactly fulfills the level of detail. That means we can stay with the level of detail
that we have inside the view,
and we don't need to use any LOD expressions. If you follow those
three simple questions, you can easily
identify which type of calculation you need
to solve your task. In this example, it is
the aggregate calculations. Let's see how we can do that. Since the aggregate calculations are the default
method in Tableau for aggregating
any data or any measure, they are
really easy to create. All we need
is the revenue, so just drag and drop it here
on top of those numbers. And with that Tableau is going
to immediately create an aggregate calculation.
We can see it over here: the sum of the revenue. That's because SUM is
the default method for aggregating data. Tableau
goes through each product inside the data and
starts aggregating all the revenues that are
related to that product. Now the next step,
what I usually do, is to go and validate some examples. I go and pick some
of those products and start summing
the values to check whether the value that I'm seeing in the
visualization is correct. Let's go and create
a new sheet. Here we want to go
to the lowest level. In order to do that,
we're going to take the Order ID to the view. Let's take now the product name.
We can take the
categories as well. Then let's take the revenue and put it on the Abc over here. Let's make it a little bit
bigger in order to see the names, and then we can go
and sort the product names. So now we can pick any
of those products in order to validate
the answers. Let's take the LG
Full HD monitor. As you can see, the total
sum should be more than 3,000. Let's go back to our aggregation and
check the LG Full HD. You can see it is above 3,000. That means everything is fine. And with that, we got the
total revenue by product. Of course, we have
done this in the quick way, where we drag and drop
the field to the view. But if you want to do it as a calculated field in order to reuse it later in
different sheets, we can go and create a
new calculated field. Let's call it Total Revenue. And then we're going to
have the same syntax, the SUM of Revenue. This time we're going to use
a nested calculation, since we have Revenue already in
another calculated field. Let's go and click
on that. The calculation is valid,
so let's hit OK. And with that we got a new
measure in our data pane. If you go and replace it, you will get the exact same results. As you can see in the
result, nothing changed. The only advantage
of this is that you can reuse it in different sheets and
in different workbooks as well.
All right guys, that's all for the aggregate
calculations in Tableau. All right guys, the third type of calculations in Tableau is the LOD calculations,
or the level of detail expressions, and here we have only three
Tableau functions. We have FIXED,
INCLUDE, and EXCLUDE. Now let's go in Tableau and create one of those functions. All right, now we
have the following task where we want to show the total revenue but
using the same view. So we're going to stay
with the same information. We're going to have
the product name, we're going to have the total
revenue by the products. But I want to see side by side the total
revenue by category. Let's go again through
the three questions. The first question is, are
we doing aggregations? Well, yes. That means we cannot use
row-level calculations. Then the next question is: is the data that we
have in the view enough? Well, no. The view does not have the total
revenue by category, only by product. That means we cannot
use the table calculations. Now we come to the
last question: does the level of detail
in the view support me in solving the task?
Well, the answer is no. That's because the level of detail inside the
view is now defined by the product name, and it has a higher level of detail than the category.
We want to have the total revenue by category. The level of detail
that we have in the view will not support me. That's why I cannot use
aggregate calculations here, and I have to go and
use LOD expressions. As you can see, these are very
simple questions, and they're going to
move you exactly to the right type of
calculation in Tableau. Now you might say:
wait, wait, wait. I can go and add the
category information to the view, and then
I have the level of detail of the category. Well, this will not
work, and that's because the product name has a
higher level of detail. Let me show you what happens if you bring in the category. So let's go and drag the category to the
right side. Here you can see nothing is going
to change. We still have
the five rows, and that's
because of the product name. Even if you move it to
the left side over here, we don't have two rows here. We have five rows. If you check the
details over here, we have five marks. So that's why even if you are adding the category,
nothing is going to change. We are still at the
product level of detail. Now let's go and create
a new calculated field to use the LOD expressions
or calculations. Let's go to the left side and create a new calculated field. We can call it Total Revenue
by Category. As for the syntax, don't worry about it;
we're going to learn it in a separate
tutorial. It's going to have the
following syntax: FIXED, then we have to specify
the dimension that's going to control the level of
detail of the results. It's going to be the Category. And then, since what we are doing is
aggregating the revenue, we have to add here the
SUM of Revenue. And then we have to
close it. It says the calculation is valid,
so everything is fine. Let's go and hit OK. As usual, we're going to get a new calculated field in
our data pane over here. Let's get the result.
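To make the idea behind FIXED on Category with SUM of Revenue concrete, here is a small Python sketch that reuses the conceptual product table from earlier in this lesson (P1 with 40 and 60, P2 with 20, P3 with 40, 25, and 15). It is an analogy, not Tableau's implementation: the helper `total_by` is my own name, and calling the second category "B" is an assumption, since the lesson only names category A.

```python
from collections import defaultdict

# (product, category, revenue) rows from the conceptual example; "B" is an assumed name.
rows = [
    ("P1", "A", 40), ("P1", "A", 60),
    ("P2", "A", 20),
    ("P3", "B", 40), ("P3", "B", 25), ("P3", "B", 15),
]

def total_by(rows: list, key_index: int) -> dict:
    """Sum the revenue per value of the chosen dimension (0 = product, 1 = category)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)

product_totals = total_by(rows, 0)    # aggregate calculation at the view's level of detail
category_totals = total_by(rows, 1)   # aggregation fixed to the category dimension
product_category = {product: category for product, category, _ in rows}

# The view stays at product level, but every product row also shows its category total.
view = {
    product: (product_totals[product], category_totals[product_category[product]])
    for product in product_totals
}
```

The view keeps one row per product while the second number repeats for products in the same category (120 for both P1 and P2), which is the side-by-side result the task asks for.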
And let's drag it over here to see the data. We can see for each row the total revenue
by category. For the first one,
it's going to be the total revenue
of the accessories. The second one is the same because it belongs
to the same category. The third one is the same, but the fourth one, you
can see, belongs to a different category, and
that's why we're going to get a different
number. That's it. This is why we need LOD
calculations in Tableau. Now we're going to move to
the last type of calculations that we have, the
table calculations. Here we also have
only a few calculations. We have the running,
window, rank, first, last, index, lookup, and so on. Again, we're going to
have a dedicated tutorial for those, but now let's go and
try one of them. All right everyone,
so now we're going to move to the last
task for this view. We want to show
the running total of the revenue by product. Here we're going to ask
the three questions again. Are we aggregating? Well, yes, because we are computing the
running total of the revenue, so we cannot use the row-level
calculations. The next question is: is
the data that we have in the visualization
enough to solve this task? Well, yes. That's
because we have the total revenue by
product in the view. Based on that information, we can build up
the running total of the revenue by product. So we have actually
everything in the view in order
to solve the task. And that's why we're
going to go and use the table calculation type. We will not bother
with the third question, whether it's aggregate
calculations or LOD, because it is a table
calculation. So let's go and create
a new calculated field. We're going to call it
Running Total Revenue. The syntax for that is
very simple as well. We start with RUNNING, then we have to select
the aggregation type, which is going to be the SUM. And then we have to go and specify which data is going to be calculated inside
the table calculation. Here we have only
two options: either we
use the total revenue or the total revenue
by category, the LOD. But we are talking about the total
revenue by product, so that's what we
include over here. That's going to be the
SUM of the Revenue, and that's it. The
calculation is valid, so let's go and hit OK. And we're going to
take our measure and put it on the view as well to check the results. Now
we can see very nicely the running total of the
revenue. It's very simple. It starts with the first
value from the total revenue. Then the next value
is the previous value plus
the current total revenue. Those two values are
added to each other in order
to get this value. Then the next one is the same: the previous value plus
the current total revenue. As you can see, we
have nothing here, and that's why we are
getting the same value. As you can see, as
we are moving down, we are adding more total
revenues to the running total. Now, it's very
important to understand that the table calculations are very sensitive to the data that is
displayed in the view. With any change to this structure, we're going to get different
numbers in the output. This is not the case for
the aggregate or the LOD calculations. Let me show you what I mean. For example, let's
go and just change the sort order of the data
inside the product name. Let's go over here and
make it descending. You can see that for the aggregate calculations and the LOD, the values
are the same; only the sort order changed. But the values inside the table calculation
changed completely, because we now have a different
sort order and Tableau is going to recalculate the running
total based on the view. That means any interaction
in the visualization
can affect the table
calculation functions. They are completely based on
the view. That's it for now. This is about the table
calculations in Tableau. All right guys, now we're
going to talk about computations of those different calculations types that
we have in Tableau. Now let's say that we have
the following calculations, and it's very similar to the
listed calculations here. We have different types. We have the rank for
the table calculations, we have the sum as an
aggregate calculation, and we have the quantity
multiplied by the price as a row level calculation. The first thing
to be executed is always the row
level calculation. So the first one is going
to be quantity multiplied by the price. Then the second type
to be executed in Tableau going to be the
aggregate calculations. It's going to be the sum
function in Tableau. And the last type of
calculations that's going to be executed in Tableau going
to be the rank function, the table calculations, again, row level calculations
as a first, then the aggregate calculations, and always the last one,
the table calculations. Okay, now let's go and quickly recap how to choose the
right calculation type. Here we have three questions. We start with the first one: do you have aggregated data? If no, then go and use the
row level calculations, since we are at the row level. If yes, then we jump
to the next question: is all the needed data already available in
the visualization? If yes, then we can use
the table calculations. If no, then we have here
the third question: does the level of detail in the visualization match the question or
the requirements? If yes, then we can use the
aggregate calculations. If no, we can go and use the LOD expressions. If you
follow my decision tree, you can simply find
an answer for that. All right, with that you
have now an overview of the different types
of calculations that we have in Tableau. Next, we're going to do a
deep dive in each type of them and we will start with
the row level calculations. Here we're going
to cover a lot of functions in Tableau that
are very important to do data manipulations and
transformations and to generate as well new information that you need for your
visualizations.
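The execution order and the running total from the last two parts can be sketched outside Tableau. This is a minimal Python sketch, not Tableau code: the order rows, the product names and the helper running_total are made up for illustration. It runs the row level step first, then the aggregation, then the running total over whatever order the view shows.

```python
from itertools import accumulate

# Hypothetical order rows (product, quantity, price); not data from the course.
rows = [
    ("Keyboard", 2, 10.0),
    ("Mouse", 1, 5.0),
    ("Keyboard", 1, 10.0),
    ("Monitor", 1, 100.0),
]

# 1) Row level calculation: evaluated once per source row (quantity * price).
revenue_per_row = [(product, qty * price) for product, qty, price in rows]

# 2) Aggregate calculation: sum of the revenue per product (the dimension in the view).
totals = {}
for product, revenue in revenue_per_row:
    totals[product] = totals.get(product, 0.0) + revenue


# 3) Table calculation: running total over the aggregated rows, in view order.
def running_total(products_in_view_order):
    values = [totals[p] for p in products_in_view_order]
    return list(zip(products_in_view_order, accumulate(values)))


ascending = running_total(sorted(totals))
descending = running_total(sorted(totals, reverse=True))

print(totals)
print(ascending)
print(descending)
```

With these made-up rows the totals per product are identical for both sorts, but the running total column is different, which is the sensitivity to the view described above.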
132. Tableau | Number Functions: CEILING, FLOOR, ROUND: So now we're going to start
with the first type of calculations, the row
level calculations. And in this tutorial
we're going to cover the number
functions in Tableau. So the main purpose of the
number functions in Tableau is to manipulate and transform
numerical values. So we can use them on any field with the
data type number. And the most important
use case for the number functions is
to simplify the numbers. Here we have three functions. We have CEILING,
FLOOR, and ROUND in order to round the numbers to
a simpler form. As usual, first let's understand
the concept behind them, then we can practice in Tableau. Let's go. All right, so now let's say that we
have the following scenario. We have built a view from the subcategories and
the sum of sales. Now if you take a look
to those numbers, you can see that they are large numbers with a lot of
fractions, a lot of details. We have three
decimals over here. Those details are going
to make it really hard to read those
numbers in the view. Instead of that, we can round those numbers to
make it easier to read and hide those
small details that are unnecessary here. If you take the sales,
the rounded sales, you can see now we have
smaller size in the numbers. We rounded all those fractions, all those decimal numbers. With that you can see if you compare the right to the left, it's easier to read right. Now let's learn how this works. Each decimal number,
like for example, 1.4 it has always two
integer neighbors. Think about it like
we have a room, it has a ceiling and floor. In this example, the 1.4 has the ceiling of two
and the floor of one. Here, we might be in
a situation where I don't want to deal with those details, with
those fractions. I would like to have a whole
number, two or one. Here we have exactly two options. Either we're going to move it to the ceiling, to the
higher number, or we're going to
move it to the floor, to the lower number. If you decide to use the
ceiling function, the number is going to be two. What we are doing here is rounding up the number to the higher value, to the ceiling. Or we are moving
it to the floor. That means we are
rounding down the number: the floor function is going to
round down the 1.4 to one. Now you might say,
you know what, I don't want to decide
whether it's going to go to the ceiling
or to the floor. I would like to
have it automatic. It should go to the
nearest integer, and here we can use
the round function. Let's have the
following example. Let's say we are at
1.3 If you use round, we're going to go to
the nearest neighbor. The nearest neighbor
going to be one. The round going to
move the value to one. But now let's take
another value, 1.7 Here, the nearest
neighbor is not the floor. It is the ceiling.
It's nearer to two. If you use the round function, it's going to convert it to two. Now let's say that our
value is exactly in the middle, at 1.5.
What is going to happen to the value if I use
round? It has exactly the same distance to the ceiling and
to the floor. What is going to happen is that
it's going to be rounded up to the ceiling. We have to have only one value, so the round of 1.5 is
going to be two. As you can see, this is how
those three functions work. Always think about
it like a room: you have a ceiling and a floor. All right, now let's compare
the three functions side by side. We're going to start
with the ceiling. The ceiling is going to
round up the numbers. The syntax in Tableau
is going to look like this: CEILING, and it accepts only one argument,
the original number. For example, the
ceiling of 1.2 is going to be two, the ceiling
of 1.8 is going to be two, and the ceiling of 1.5 is two as well. We are always going to the higher number. Let's
move to the next one. It's going to be
exactly the opposite: the floor is going to round down
the numbers to the lower value. The syntax here is FLOOR, and it accepts as well only one number. The examples are: the floor of
1.2 is one, the floor of 1.8 is one, and the floor of 1.5
is as well one. We are always going
to the lower number. Now let's go to the last one. We have the round, which rounds the numbers to the
nearest integer. The syntax for that is going to be a little bit different. We have round, then
the original number, then we have a decimal here,
it's optional, of course. Here we can decide as well
whether we're going to see, for example, one
decimal, two decimals. And if you leave it empty, it's going to round
it to a whole number. Now let's go to the examples
for the same numbers. If you round 1.2, it's
going to go to the floor, the nearest is going to be one. If we round 1.8, the nearest is going
to be the ceiling, so it's going to go to two. If we round 1.5,
exactly the middle, it's going to be rounded
up to the ceiling, so we have a two. That's it. This is how the three
functions work. Now let's go back to
Tableau and start. All right guys, back to Tableau. Let's create now a view that is going to show the
orders with the sales. We're going to stay with
the small data source. Let's take the order ID, put it on the rows, and let's
grab the sales to the view. As you can see, the sales
don't have any fractions. That's not because
the numbers are rounded, it's just that the format
is different. In order to show
the real values, we have to change the format. In order to do that,
we're going to go to the measure
sales over here, right click on it and
go to the format. Then we're going to
go to the left side. We have here numbers. Let's
click on this menu and go to Once you do that, you
can see that we have the raw data as we have
it in the data source. Now we want to
round those numbers to make it simpler
to read in the view. In order to do that, we
have the three functions and we can start
with the ceiling. Let's close this over here and create a new calculated field. Right click over here
in the white space. Create calculated field. We're going to call
it Sales Ceiling. The syntax is really easy, so it starts with
the keyword CEILING, and then inside it we have to
have our field, the number. Our field is the sales, and as you can see, the
calculation is valid. Let's hit OK. As you can see, we have now the field, the new calculated field
in the data source. Let's bring it to
the view. Let's go and drag it over here. As you can see, now we
have our new field. Let me just make it a little bit bigger and all those
values are rounded. Let's take the first value. We have 215.88. As we are rounding up, we're going to go to the next
higher value, which is 216. Everything is fine. Let's
check this one over here. We have 56.11. As we are rounding up, we're going to go to the
next integer, which is 57. Everything is fine and
the ceiling function is now working. All right. Next we're going to go and do
exactly the opposite. We're going to round down
the numbers to the floor. We're going to go and create
a new calculated field and we're going to
call it Sales Floor. The syntax is as well really easy. The keyword is FLOOR, and our value is going
to be the sales. So that's it, the
calculation is valid. Let's click OK.
And our new field is already in our data source. Let's grab it to the view. The first value was 215.88. As we are rounding down
to the integer below it, it's going to be 215. This value over here,
we have 56.11. As we are going to the floor, it's going to be 56,
so everything is fine. And as you can see, it's exactly the
opposite of the ceiling. All right, so next we're
going to go and round the numbers automatically
to the nearest neighbor using the round.
We're going to go and create the third
calculated field, and we're going to call
it sales round. The function is really easy. It starts with ROUND and
it accepts two arguments. The first one is a must, it's going to be
our number, the sales. The second one is optional, in case we want to decide on the number of decimals. Here we
don't want to use it, so we're going to leave
it as default. We don't need any
decimals or fractions, so we're going to
leave it like this, just the sales, and that's it. So as you can see,
the calculation is valid, and now we have our third
calculated field as well in the data pane. Let's just grab it to the view
and check the values. Now, the first value, 215.88, is near to the ceiling, that's why the round
is going to take it to 216. The next one we had was 56.11. It's really near the floor. That's why Tableau, or the round function,
is going to take it to 56. As you can see,
everything is fine and the numbers are moving
to the nearest neighbor. All right, now let's say that we want to see the sales in our view, but having only one decimal, not two decimals like
here in our example. In order to do
that, we can round those numbers to
only one decimal using the round function. Let's go and create a
new calculated field. Let's call it sales round one. And we're going to use as
well the same keyword, ROUND. The number is going to be sales. And then we're going to define how many decimals we want. In this example, we
want only one decimal, so we're going to type here one. That's it. As you can see,
the calculation is valid. Let's click Ok. And here
we have our new field, Let's bring it to the view. And now you might say, you
know what, nothing changed. We still have everything rounded to a whole number,
there's no decimals. Well, that's about the format. Let's go and change that.
We're going to go over here, right click on it and then
let's go to format here. We're going to bring
it to the standard. Once we do that,
as you can see now we have only one decimal value. We don't have two decimal
values like the sales, like the original field
in our data source. But now you might say, okay, maybe the round as
well has decimals. So let's check the formats. We're going to go to
the round over here, and let's click Formats. And now if we bring
the standard, as you can see,
nothing is changing. So that means we really don't
have any decimals, we have only a whole number. All right, so now
you might ask me, when do I use ceiling
and when do I use floor? Well, there is no rule for that. It really depends on the use
case and on the requirement. For example, if I'm building a dashboard for budgeting
to plan a budget, I would go always with the ceiling to make
sure that I'm not forgetting anything and I'm not short in the budget at the end. In this use case, I
tend always to use ceiling and never
use floor or round. It really depends on the
requirement in the use case. So as you can see, those
three functions really make the visualizations simpler and easier to read.
All right everyone. So, so far we have learned
how to simplify the numbers in Tableau using the
three number functions, ceiling, floor, and round. And that's it for
the first group, the number functions. Next we can learn the string
functions in Tableau.
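For reference, the three number functions can be imitated outside Tableau. This is a minimal Python sketch of the behavior described in this lecture, not Tableau itself. The helper round_half_up is made up for illustration; it uses the decimal module because Python's built-in round sends exact halves to the nearest even number (so round(2.5) gives 2), while the lecture describes 1.5 going up to 2. The comments name the Tableau expressions being mirrored, with [Sales] standing in for the sales field.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, decimals=0):
    """Round to `decimals` places, sending exact halves away from zero (1.5 becomes 2)."""
    step = Decimal(1).scaleb(-decimals)  # 1, 0.1, 0.01, ...
    return float(Decimal(str(value)).quantize(step, rounding=ROUND_HALF_UP))


for sales in (215.88, 56.11, 1.5):
    print(
        sales,
        math.ceil(sales),          # like CEILING([Sales])
        math.floor(sales),         # like FLOOR([Sales])
        round_half_up(sales),      # like ROUND([Sales])
        round_half_up(sales, 1),   # like ROUND([Sales], 1)
    )
```

The two sales values are the ones read out in the walkthrough, and 1.5 is the middle case from the concept part.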
133. Tableau | Change Cases: LOWER & UPPER: Now we're going to focus
on the second group of functions in Tableau. Under the category row
level calculations, we have the string functions. The main purpose of the
string functions in Tableau is to manipulate and
transform the text values, any field in our dataset
with the data type string. There are many use cases and reasons to use string
functions in Tableau. For example, we can
use it to clean up our data and bring our
text to standard cases. For example, we can change
the case to either lower or upper. The next use case is as well about cleaning up our data in Tableau by removing
any unwanted spaces. Here we have three functions: LTRIM (left trim), RTRIM (right
trim), and TRIM. Moving on to the next
group or use case, we have here three functions to extract a specific
substring from a text. We have LEFT, RIGHT, and MID. The next use case is to
search for specific patterns. Here we have five functions: STARTSWITH, ENDSWITH, CONTAINS,
FIND, and FINDNTH. Then we have another
use case for the string functions, to combine and split data inside Tableau. Here we have the concat operator and as well the SPLIT function. The last use case is to replace a specific substring with
another substring. So here we have the
function REPLACE. As you can see, we have a lot of string functions and
tools to manipulate, transform, and clean up the
text values in Tableau. Now we're going to start
with the first use case about the string functions. How to clean up our data
and bring our text to standard case using the two
functions, lower and upper. But as usual, first
we have to understand the concept before we
start practicing in Tableau. Let's go. All right, now let's go and
check the following data quality issue in our view. If you check the
dimension products over here we have three
values for the same word. We have keyboard three times in the view,
which is really wrong. And that's because
the data quality of the source system where we get the data from is simply low. This happens if you have a
lot of people working in big projects and you
have a lot of products. So they may enter like different names for
the same products. Here we have a case issue
in the product name. And what I usually
do in my projects, I go and contact the
source systems and tell them about the data quality
issues that they have. But sometimes it may take a
long time until they fix it. In the visualization, we can go and fix and clean
up that stuff. In Tableau, we have a lot
of tools and functions to manipulate and clean
up the dimensions. For example, we can
use the upper or the lower functions in order to bring standards
to the values. If you go and use the lower, we have the following results. We have in this example
only three products in the visualization,
and all those three values are going
to be aggregated for the quantity in only one row,
which is really correct. Now if you compare the first
view with the second view, you can see that
we have improved the data quality
in the visualization. Now let's go and understand how
those two functions work. Now let's have the
following example about the customer's name. The names could be
written like this, the first character
of the first name and the last name
is capitalized, or everything in upper
case, or the opposite, where we have everything
in lower case. You can see we can
write the customer's name in different cases. Now in Tableau, we have
to bring those names to a standard. We have
two ways to do that. Either we bring everything
to lower case or to upper case. Now, if you decide to go with the upper case for the customer's
name, what is going to happen? The first customer is going to be converted completely
to upper case. The second customer is
already in upper case, so nothing is going to happen, it's
going to stay the same. The third one is in lower case, so it is going to be converted
to upper case. But now, if you want to go with the lower case for
the customers, this is what is going to happen. The first one, the
first customer can be converted to a lower case. The second one as well can be converted from upper to lower. The third one,
nothing can happen because it's already
a lower case. As you can see with
this function, we are forcing the names to
be either upper or lower. So we bring standards
to the visualizations. Now we're going
to go and compare those two functions together. We start with the upper.
It's going to convert the characters to upper case. The syntax in Tableau
is going to be the following. It starts with the
keyword UPPER. It accepts only one field, the string. The output
is going to be as well a string. For example, if we
take UPPER of Maria, where the first character
is capitalized, the output is going to be the string
MARIA in upper case. Now let's go to the lower. It's going to be
exactly the opposite. So it's going to convert the
characters to lower case. The syntax is going to be similar. Here we have LOWER,
then one field, the string. The output
is going to be as well a string. The example here is LOWER of Maria: maria is going to be in the
output in lower case. Those two functions are
simple and easy to use, but still they are
very important. I tend to use them a lot in my projects to
clean up the data. Now let's go back in
Tableau and Start. All right, for those
two functions, I have prepared an
extra file with the low data quality
in the product names. In order to connect this file, we have to create
a new data source. Let's go to the data
source page over here. And then we're going to go
and create a new data source. Then we're going to
go to the text file. You can find it inside
the small folder. We have here a CSV file
called products low quality. Let's go and connect it. It's only one table, and if you check the data
grid over here, you can see we have
problems in the product. You can see we have here
keyboard in upper case, keyboard in lower case, or with the first
character capitalized. So now let's go back
to our sheet and start checking the data
as well from there. Now let's go to the data pane and make sure we are selecting
the new data source. We have here a product one. Here we have the case issue, so let's bring it in the
view and check the values. As you can see, we can
find like five products, but in reality we have
only three right here. We have the keyboard three
times, monitor and mouse. We should have only three
keyboard, monitor and mouse. We have data quality issue
in the product names. Tableau is case sensitive
so it can present the data exactly as it is
from the source system. Let's take the quantity
and put it in the columns. And as you can see,
those three values will not be aggregated together, since Tableau thinks those
are three different products. Let's show the values
here in the labels. Let's take it to
the color as well. So now we're going
to go and clean up the data using the
lower function. In order to do that, we have to create a new calculated field. Let's go to the data
pane over here. Right click on the empty space,
Create Calculated Field. We're going to call
it Products Lower. It starts with the
keyword LOWER and it accepts only one
value, the string. So we're going to have the
products one and that's it. So as you can see,
the calculation is valid and the output
is going to be a string, the product. Let's
go and hit OK. Now if we check the data
pane, we have here our new dimension,
the calculated field. Let's bring it to the view and the rows to start
comparing the values. The first one, as you can
see it is an upper case. The output going to be a
lower case of the keyboard. The next one is already lower case, nothing
going to change. The third one is
completely upper case from the original data, but the output is lower case. As you can see, we
have all the names here in a lower case. Now if you go and remove
the product one over here, you can see we can end up
having only three values. Only three products
which is correct. With that, we have
cleaned up the data using the lower case. Now let's go and
clean up the data. This time using the upper
function, we can do the same. We're going to go and create
a new calculated field. Let's call it products upper. We're going to use the
function upper over here. And it accepts only one field, our products, products one. And that's it, the
calculation is valid. Let's click okay. Now if
you check the data pane, we have a new calculated
field, new dimension. Let's bring it to the view and start comparing the values. I can bring as well
the original field. The first one is capitalized and, as you can see, the output
is going to be in upper case. The second one is
completely lower case and the output is as well completely upper case. The third one, nothing
is going to change. As you can see, all the
values are now in upper case. Now I'm going to go and remove the others to see
the final results. As you can see, we have
only three products in the visualization, which
is really correct. And with that, we have fixed
the data quality using the upper function. All right, so now
you might ask me, should I use a lower case
or upper case in my views? Well, if you're asking an IT guy like me, I'm going
to answer like this. It depends, it depends on the fields that you are
using in the views. Let's have the following
example. Here we have two views. The left one with the lower
case and the products name. And the second one is
with the upper case. If you take a look now at those two views,
what do you think, which one is easier to read? If you have a normal text or a long text like
the product's name, the customer's name, and so on. It's always better
to use a lower case. The lower case are easier to read compared
to the upper case. The upper case is going to
take as well more space. It's more aggressive and
it's really hard to read. So for this scenario
I would recommend you to
use the lower case. In modern design they tend to use lower case
since it provides a more sleek and
minimalist look on the website and in the look and feel of the
visualizations. So the lower case is easier
to read. It's more modern. If you compare it
to the upper case, it's hard to read and it's
like someone is shouting. Let's take now another example. We have here an aggregation
for the country abbreviation. So here we have it
as a lower case and as well as the upper case. This time if you
compare them together, you can see that maybe it's better to use
the upper case. And that's because
they are very short: the abbreviations have a maximum of
maybe three characters. They are really hard to see
in the visualizations, they are really small. If we have them in
big characters, it's easier to read.
With the abbreviations, I always tend to
use the upper case. The abbreviations, if they
are written in upper case, bring
standards and avoid misinterpretations
of the data. If you look at the right
side over here, you can understand immediately: okay, here we are
talking about countries. But if you are on the left side, you might get confused. For example, are we talking
about the USA or the word "us"? The same goes for Italy. Is it the "it" that we
use in sentences as a pronoun, or is it the
abbreviation of Italy? If you write it in lower case, you might introduce some
misunderstanding and misinterpretation. For the abbreviations, I always tend to use upper case. It's clearer and easier
to read for short names. That's why the answer
that comes from the IT guy is: it depends. It depends on the use case, the
requirements, and so on. So sometimes we go
with the lower, sometimes we go with the upper. But 90% of the time I go with the lower
case for the names and so on, and only for the abbreviations
I go with the upper. With that, you
have at least some orientations in
your visualization. All right, so that's
all about how to clean up the data by bringing our text to standard case using the two functions,
lower and upper. Next we can start talking
about the three functions, left trim, right trim, and trim.
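The effect of the case cleanup on the aggregation can be sketched outside Tableau. This is a minimal Python sketch with made-up rows and a made-up helper total_by, not Tableau code. It sums the quantity per product name, first as typed and then after lower-casing or upper-casing the name, which mirrors using LOWER or UPPER on the product field before it goes into the view.

```python
from collections import defaultdict

# Hypothetical rows (product name as typed, quantity); not the course file.
rows = [
    ("Keyboard", 2),
    ("keyboard", 1),
    ("KEYBOARD", 3),
    ("Monitor", 1),
    ("Mouse", 4),
]


def total_by(rows, normalize=lambda name: name):
    """Sum quantity per product name after applying `normalize` to the name."""
    totals = defaultdict(int)
    for name, qty in rows:
        totals[normalize(name)] += qty
    return dict(totals)


raw = total_by(rows)                 # names exactly as in the source
lowered = total_by(rows, str.lower)  # like LOWER on the product field
uppered = total_by(rows, str.upper)  # like UPPER on the product field

print(len(raw), raw)
print(len(lowered), lowered)
print(len(uppered), uppered)
```

Either normalization collapses the five raw names into three products; which case to show is the readability question discussed above.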
134. Tableau | Remove Spaces: LTRIM, RTRIM, TRIM: All right, so now we're
going to talk about another group of string functions
in Tableau, to clean up our data by removing unwanted spaces using
the three functions, left trim, right trim, and trim. And of course, as usual, we have to understand first the concept behind them and then we're going to practice in Tableau.
So let's go. All right, so now we have the following scenario,
where we have, again, a bad data
quality in our view. If you check the products, we can see that we have
four times the keyboard. So what is going on? We have here no case issue, like all of them are capitalized
in the first character, so there is no lower
case, upper case. Everything is fine.
Why Tablo didn't aggregate all those values
in one row, in one product? Because here we have
only three products. So what is going on
here? What happened? Well, we have the dirty
spaces in the product name. In the keyboard, there
are like unwanted spaces. It's really hard to see
individual. You can see that. Like everything
looks fine, right? But there's spaces inside the keyboard and we
have to remove it. Now, in order to clean up the data and remove
those dirty spaces, we can use one of the
three functions left, right, trim, or trim. And if you apply those
functions on the product name, we're going to get
the result like this. Only three products and
everything will be fine. Let's understand how
those functions work. Let's have the following
simple examples. Let's say that we have
the word monitor, but on the left side
we have a white space. In order to remove it, we can
use the Tableau function left trim. Left trim is going to remove any unwanted spaces from the left side of the word. Now we might have the opposite situation, where we
have the monitor, but on the right side
there is a white space. In order to remove those spaces, we can use the
function right trim in Tableau. Right trim is going to remove any spaces from the right side of the word. Moving on to the third scenario, we have the same word monitor, but this time on the left and on the right there
are white spaces. In order to remove those spaces, either we can use both of the functions, left
trim and right trim, or we can use the
third function, trim. If you use the trim
function in Tableau for this scenario, it's going to remove all the white spaces from the left side and as well all the white spaces
from the right side. All right, so now
we're going to quickly compare those
three functions. The left trim is going to
remove any leading spaces. The right trim is going to remove
any trailing spaces, and the trim is going to
remove both of them, the leading and trailing spaces. The syntaxes in
Tableau are really simple. So for example, we have
here the LTRIM keyword. Then it accepts only
one string field, and the output is going to
be a string value. For example, let's say
we want to left trim this value: we have
Maria, on the left side we have a white space, and as well on the right side. If you use a left trim, it is going to remove only
the leading spaces. So it is just going to remove the space from the left
and is going to leave the space that we have on
the right, because it's only left trimming. Let's
go to the next one. It's exactly the opposite, but the syntax is
almost the same. So we have RTRIM, it accepts the string
field, and the output is going to be as well
a string value. If we stay with
the same example, it's going to remove
only the trailing space. The space on the left side
is going to stay in this example. Now let's move to the last one. I think you already got it. We're going to use
only TRIM here, not left or right,
so both of them. And it accepts as
well a string field. The output is going to
be a string value. And the example is going
to be the following: Maria with the left and right
spaces. What is going to happen? We're going to remove
the left space and as well the right space. Those functions are really
easy to use and very important to improve your data quality
in the visualizations. Let's go back to Tableau
and start practicing. Okay, first, make sure to select the right data source so we can stay with the
products low quality since I prepared the examples. And now we're going to
go with the product two, just drag and drop it
here in the view. As you can see, we have now four products
for the keyboard. Now it's really hard to see
where those white spaces are. For the first two,
you can see they are a little bit
shifted to the right, but for the second
two keyboards, we are not sure whether
there is on the right side a
white space or not. The situation can
be really bad if we switch to different
visualizations. Let's take the quantity and
now in the bar diagram, it's almost impossible to see whether there are like
any white spaces. If I'm facing this
situation in my projects, I go first and start counting how many characters do
I have in each product. I calculate the
length of each word. In order to do that, we can create a new calculated field. Let's go and create a new one, and we're going to call
it products length. The keyword
to calculate the length is LEN, that's it. Then it accepts only one field, a string field, and the output
is going to be a number. Our field is going to be the
product two, make sure to select the correct one, and that's it, the
calculation is valid. Let's click OK. Since the output is going to
be a number, Tableau is going to go and create
a continuous measure. So I'm just going to remove
the quantity from the view, and let's bring our new
calculated field to the view. The length of the
first one is nine, so this means we have
only one white space. The second one has
two white spaces. The third one is correct. The fourth one as well
has one white space. With the length function,
we can easily detect whether there are dirty
spaces in our words. Now in order to remove and
clean up those problems, we're going to use
the trim functions. Let's start with the left trim and we're going to go and create a new calculated field.
Let's go and do that. We're going to call it
products left trim. And we're going to start
with the syntax LTRIM, and it accepts only
one string field. It's going to be the product two, make sure to select the correct one. The calculation is valid.
Let's go and hit OK. Now we notice that Tableau created a new dimension because
the output is a string. Let's go and put it
here in the view. Now what can happen to the
values inside the products? All the spaces
from the left side going to be removed or trimmed. But again, here, it's
really hard to see from the view whether
everything is fine. So we're going to go again and calculate the length
of the new field. Let's go and change
the calculations inside our calculated field. Instead of having
the product two, we can remove it and
insert the new dimension. Let's click OK. All right. So now let's check the result. As you can see, we
have some values fixed. The first one, we
have it as eight. The second one, we
still have a space. The third one is anyway correct. The fourth one is
as well incorrect. As you can see, the situation
is now a little bit better. But we still have spaces. That means we have spaces
on the right side. In order to fix
this, we're going to go and trim from
the right side. Let's go back to our
calculations, the left trim. Let's edit it and
add the right trim. So we're going to go over here, we're going to have
nested calculations, right trim, and we want the
results from the left trim. Let's go and hit
OK. But maybe I'm going to change
the name to Trim. Let's hit OK.
So what is going to happen to the values inside
the products? We are trimming everything
from the left and as well from the
right. As you can see, now the length is
correct as well. All those values have
a length of eight. In order to test this as well, we're going to remove
the product two from the view, and we have here
only three values. Of course the length doesn't make any sense here
because we are summing the lengths of all the products
inside the orders. Instead of having
it as a measure, maybe we can convert
it to a dimension so that we do not have any aggregation. I'm just going to
remove it from here and just add the product length. As you can see,
everything is fine. Now, of course, for
this scenario we have an easier solution. We can just use
TRIM instead of using left and right
trim in one calculation. Let's go and do that.
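To make the effect of the trims easier to see outside of Tableau, here is a minimal Python sketch. It is not Tableau syntax: the helper functions `ltrim`, `rtrim`, and `trim` are stand-ins written for this example to mimic what LTRIM, RTRIM, and TRIM do with spaces, and Python's `len` stands in for LEN. The sample values are invented for illustration and are not the values from the demo data source.

```python
# Python stand-ins for Tableau's LTRIM, RTRIM and TRIM (spaces only).
def ltrim(text: str) -> str:
    return text.lstrip(" ")   # remove spaces on the left side

def rtrim(text: str) -> str:
    return text.rstrip(" ")   # remove spaces on the right side

def trim(text: str) -> str:
    return text.strip(" ")    # remove spaces on both sides

# Hypothetical values: the clean value "Keyboard" has a length of 8.
products = [" Keyboard", "Keyboard  ", "Keyboard", " Keyboard "]

raw_lengths = [len(p) for p in products]
left_trimmed_lengths = [len(ltrim(p)) for p in products]
nested_lengths = [len(rtrim(ltrim(p))) for p in products]
trim_lengths = [len(trim(p)) for p in products]

print(raw_lengths)           # [9, 10, 8, 10]
print(left_trimmed_lengths)  # [8, 10, 8, 9]
print(nested_lengths)        # [8, 8, 8, 8]
print(trim_lengths)          # [8, 8, 8, 8]
```

The left trim alone fixes only the values with leading spaces; nesting the right trim around it, or using a single trim, brings every value to the same length.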
We're going to go back to our calculation and edit it. So we're just going
to remove everything. We're going to use
the keyword TRIM, and it accepts
only one field, which is going to be the product two. As you can see,
the calculation is valid. Let's click OK. As you can see, nothing
is going to change in the view. We're going to get
exactly the same results. With that, we have cleaned
up the values inside the products by removing any
dirty or unwanted spaces. All right, I want to
show you one more method to detect
whether there is bad quality in your data
caused by unwanted spaces. That's especially useful if you
have a big data source. If you have a lot of values, it's really hard to detect that stuff if you are
using the LEN function. I'm going to show you
now how I usually do it. What I usually do,
if I have a suspicion about one field where I
think the users are manually entering
the values, is that I go and count the distinct
values inside this field. Now let me show you
how I usually do it. Let's go and create a
new calculated field, and we're going to
call it products count D. The syntax for
that is going to be COUNTD: count, then the letter D. We are counting the distinct values
inside our products. The field is going
to be product two. The output for that is
going to be a number. The calculation is valid.
Let's go and hit OK. As you can see, on
the left side we have a new continuous measure. It's going to count how
many distinct values we have inside products. Let's see the results.
I'm just going to go and remove everything
from the view. I'm going to take the count
and put it on the text. Now the result
says I have six different products
inside my data source, but I have suspicions about it. Now what I'm going to
do is start trimming the values inside the products, and my expectation is going
to be the following. If the number
stays the same, then we don't have any spaces. But if the number
gets smaller, then we have unwanted
spaces inside the products. Let's start testing that.
We're going to go to our calculation and
start adding our trims. We always start with the
left trim or right trim. Why don't we go
immediately to the trim? Because if you are trimming everything from
the left and the right, this can have
bad performance in Tableau because it
needs resources. If you are only left trimming
or only right trimming, it's going to be easier
for Tableau to do it. But if you always go
immediately to the trim, you might have bad performance. That's why I always start
with the left trim. So let's go to the left
trim and check the results. So I'm just going to add it
to the product over here. With that, we are first left
trimming the product two, then we are counting how
many distinct values we're going to see
inside this database. The calculation is
valid. Let's hit OK. All right, so now
we moved from six to four products. This is alarming for me; that means there are
leading spaces. Now the next step, what I
usually do, is to go and test whether there are any spaces on
the right side. For that, either I'm going
to add a right trim or I'm just simply
going to use the trim. Now if we add the right trim or the trim and the number
stays the same, four, that means we only have a problem with
the left spaces. But if the number
gets smaller, that means we have
right spaces as well. Now what we can do, we're
going to go again to our measure and edit
the calculation. And instead of having left trim, I'm just going to have now
a trim to test the right spaces as well.
Let's go and hit OK. Now as you can see, we went from four to three. That means we have
right spaces as well, not only left but
also right. So the total number of
products went from six to four to three. This is how I usually
do it to decide whether I'm going to use only
left trim or right trim, or both of them, instead
of immediately using trim. I have seen a lot of projects, and a lot of developers tend
to overreact with this. If they see a string value, they go immediately and trim it just in order to
have a correct result in the Tableau visualization. But believe me, if you
always do this, you're going to have
a bad reaction in Tableau and you can
have bad performance. Take a little time investigating whether it's really
necessary or not. All right, so that's
all about how to clean up our data by removing unwanted spaces
using the three functions: left trim, right trim, and trim. Next we're going to talk
about another group: LEFT, RIGHT, and MID.
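The distinct-count check from this lesson can be sketched in a few lines of Python. This is only an illustration under assumptions: `count_distinct` is a hypothetical helper standing in for Tableau's COUNTD, `lstrip(" ")` and `strip(" ")` stand in for LTRIM and TRIM, and the product values are invented so that the counts go from six to four to three, as in the walkthrough.

```python
# Hypothetical stand-in for COUNTD: number of distinct values.
def count_distinct(values) -> int:
    return len(set(values))

# Invented values: three real products, some entered with stray spaces.
products = ["Mouse", " Mouse", "Keyboard", "  Keyboard", "Monitor", "Monitor "]

raw_count = count_distinct(products)
ltrim_count = count_distinct(p.lstrip(" ") for p in products)
trim_count = count_distinct(p.strip(" ") for p in products)

print(raw_count)    # 6: every spelling counts as its own product
print(ltrim_count)  # 4: leading spaces were inflating the count
print(trim_count)   # 3: trailing spaces were inflating it as well
```

If the count does not change after a trim, that side has no stray spaces and the trim can be left out of the calculation.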
135. Tableau | Extract Substring: LEFT, RIGHT, MID: Now we're going to
cover another group of string functions in Tableau to extract a specific substring from the text using the
three functions LEFT, RIGHT, and MID. As usual, let's understand the concept, then we can practice in Tableau. Let's go. All right everyone. So in real scenarios
and real life projects, the data that comes from the
source systems is usually way more complicated than the data that you
can find in samples, tutorials, courses,
and so on, because the processes in real projects
are way more complicated. The example that we can see here could be the product name
inside your projects. Here you can see
we have a lot of information in only one field. For example, we have Canon; this could be the product name. The next one we have
is the product ID. And the third one is
the product code. All that information we might find underneath
the product name, in only one field.
In the visualization, we might be interested
in only one piece of information, not
the whole thing. We could be interested in only
Canon, the product name. Or we need only the ID, 789. Or we want only the code
to be in the visualizations. We need functions
or tools in Tableau in order to extract those
pieces of information and split the one field
into three fields. In Tableau there are a lot of functions and ways to
achieve this goal. One of them is to use the
functions LEFT, RIGHT, and MID in order to cut this
field into multiple fields. We're going to start
now with the first one. Let's understand LEFT. The first thing to
understand is that each character in our string
has a position number. For example, we have the C, which has position number one, then two, three, and so on until we
reach the last character, the 5, which has position 14. We are counting from the left
to the right. Now in this example, we are interested only in
the product name, so we're going to
focus on this one. And as you can see, it ends
at position five. The syntax in
Tableau for LEFT is the following. It starts with LEFT, then it needs two arguments. The first one is the field
itself, the string itself. Then the number of characters that we want to keep. The output, the result, is going to
be a string value. For example, we're
going to take LEFT, then our value, and the number of characters
is going to be five. We are keeping five characters
from the left side. Let's see how this
is going to work. We're going to
start counting from the left and we
move to the right. The starting character is C, and we start counting: one, two, three, four, five. This is exactly the number of characters, and we
make a cut here. Anything after position five, after the
n, is going to be removed. And we keep here only
five characters. We get the output Canon. In this example, we are cutting all the values after the character with
position number five. All right, so this
is how the LEFT function works in Tableau. Let's move on to
the next function. It's exactly the opposite. We're going to have
the RIGHT function. Let's say that we
are no longer interested in the product name. We would like to
extract the product code, the last four characters
of our string. Now if you are
considering using the RIGHT function,
what happens? The position numbers
of the characters are exactly the opposite. We're going to
start counting from the right side as we
are moving to the left. The first character is going
to be the character 5, the second one R, then the third, and so on,
and the last character, number 14, is going to be
the C. Now we want to focus on the product code, and we're going to use
the RIGHT function. The syntax for the
RIGHT function is very similar to LEFT. It starts with
the RIGHT keyword, then we need our field,
the string field, then the number
of characters. The output is going to be a
string value as well. This time the example is going to be
like this. It's going to have
RIGHT, our string, and then the number of
characters that we want to keep from the
right side, which is four. Let's see how this works. The RIGHT function is going
to start counting from the right side, and we
move on to the left. We start counting
from here: one, two, three, four. And that's it. Here we make the cut. All the characters after position number
four will be ignored and will not be part of the result. At the end, you're going to get only four characters from
the right side, ending with E, R, 5. This is how the RIGHT
function works in Tableau. We start counting from the right side and
we keep only, for example here,
four characters. All right, so now we're going
to move to the third one. We have the MID function. All right, so now
we want to extract the last piece of information
that we have in our string, the product ID, the
one in the middle. So we are not interested
in the first part, the product name, or the
last part, the code. We want to get exactly this
information in the middle. If you are using MID, we're going to count from
left to right, exactly like the LEFT function. The first character
is going to be the C, the last character
is going to be the 5. The syntax in Tableau is slightly different
from LEFT or RIGHT. We start with MID, then we have three arguments. The first one, as usual, is the string value that
we want to manipulate. The next one here is new. We can define the start point, where we start counting how many characters
we're going to keep. Then we have the length here. It's like the number
of characters, but this time it is optional. If you leave it out, we're going to keep everything
after the start point. Or if you specify
it, we're going to have exactly the number of characters that you define. The output is going to be here
a string value as well. Let's take an example here. We can have MID, then our value. We want to start
counting from seven and we want to keep only three
characters in the output. Now let's see how this works. The start position
for counting is position number seven. We're going to start from this value and we're
going to count three characters, one, two, three, and cut. Now what we are doing, we
are cutting in two places: at the starting position
and at the end position. That means all the characters before the starting
point will be ignored and will not be in the result, and all the characters after the final
cut will be ignored as well. The output is going to be 789. With that, we extracted information from the
middle of our string. This is how the
MID function works. As you can see, with
those three functions, with those three
tools in Tableau, we can cut anything in our
string and generate new data. Let's go to Tableau
and start practicing. There are many use cases
for those three functions. For example, let's start
working with URLs. A URL usually has a structure,
and we want to extract part of the information inside
the URL. In our data sources, we have a URL in the images. If you go to the
small data source, go to the products, and here
we have the product image. Let's drag and drop it on the rows and check
the structure. The standard URL usually
starts with the protocol. Then we have a domain, and then at the end we have
a file or something. Our files here are all images, like we practiced in
the image role. The first task is to extract only the protocols from our URL. Now, the protocols are on
the left side. I think you already know
that we want to use the LEFT function,
so we can go and count how many characters
we want to keep. We need five characters. Let's go and create a
new calculated field. Because we need a new field, we're going to call it URL and then we're going
to have protocol. It starts like this: LEFT, and then it
needs two arguments. The data that we need
is the product image, we have it over here, and we
want to cut five characters. We can specify here five. As you can see, the
calculation is valid. Let's go and try that out. We're going to go and
hit OK. And as you can see, on the left side
we have our new dimension, our new calculated field. Let's go and bring
it to the view. Drag and drop it on
the rows beside it. And as you can see, now
we've got a new field in our data source where we have the protocol information
from our URL. So everything is working fine, and this is how we work
with the LEFT function. Let's go to the next use
case, where we want to extract the file
extensions from our URL. We want to get this
part at the end of the URL. As we are speaking
about the right side, what we're going to do
now is use the RIGHT function. Here we need to extract
three characters. Let's go and create
the calculated field. So we're going to go
and create a new one. We're going to call it
URL file extension. It starts with the
keyword RIGHT, and then it needs
two arguments as well. The string, our field, is going
to be the product image. And how many characters do we want? We want three: comma, three. With that, you can see the
calculated field is valid. Let's go and hit
OK. And as usual, we have a new calculated field, a new dimension in
our data source, just to deal with
the file extensions. Let's check the values to
see if everything is fine. And as you can see,
we are getting all the file extensions
from the URL. As you can see,
it's really simple. And with that we are
generating new information and new fields that we
could use in our analysis. And they are based
on the original data that we get from
the data sources. All right, so now let's move to the next task, where
we want to get the URLs starting from
the domain name, without the protocols. We want to keep anything after the double slashes
in the string. This time we're going to
use the Tableau function MID. Let's go and create a
new calculated field. We're going to call
it product domain. Here we start
with the keyword MID. It takes three arguments. The first one, as usual, is the product image. Then where do we start cutting? Here we have to
specify the number: one, two, three, four, five, six, seven, eight, so we start
cutting from nine. The last one is
optional. I'm just going to leave it out and keep
everything afterward. We will not cut anything
from the right side. That's it. The calculation
is valid, let's hit OK. As usual, we get
a new dimension, a new calculated field, ready to be used in the analysis. Let's go and grab it and put it in the rows to check the values. As you can see, we start from the domain name and the
protocol is cut off. The whole value is going
to be the rest. Now next we have the
following task for you. All right, so the
first task is to extract the last four digits of the phone numbers
from the customers. The second is to go to the addresses and
extract only the street name, so we remove the code
and the word street. Now you can go and
pause the video in order to complete the task. And once you are done,
you can resume it. All right, I think it's really easy. Let's go to the
small data source. We're going to go
to the customers and bring the phone to the view. Now we want to extract the last four characters.
We are speaking about the right side, so we're going to use the RIGHT function. Let's go and create a
new calculated field. We're going to call
it phone code. And we can use the
RIGHT function to cut from the left.
From the right, sorry. The string
value is phone. We want to cut four digits, so the number of
characters is going to be four. Now the calculation is valid. Let's hit OK and take it to
the view. As you can see, with that, it's really easy. We got the last four digits
from the phone number. All right, so now
we're going to go and solve the next task. We need only the street
names from the address. As you can see
over here, we have the code and then
the word street, and then we have
the street name. We want only this
piece of information. Since we want to start
cutting over here, we're going to use
the MID function to define the starting
point of the cut. Let's go and create a
new calculated field. We're going to call
it address street, and we're going to use
the function MID. The first value is going to
be the field address, then the starting
point is going to be nine. The rest we're going
to leave as it is. So that's it. Let's apply
and check the values. Drag and drop it in the
view. As you can see, with that we have
only the streets from the address. We cut off the first part. If you solved the task using
eight instead of nine, that's because you forgot
to count the white space. If I just change
it and use eight, I get almost
the same results, but we have leading white spaces,
which is not really good. The space counts, so
it should be nine. That's it; this is really simple. This is how you can extract
information in Tableau. All right, that's all
about this use case: how to extract a specific
substring from the text using the three
functions LEFT, RIGHT, and MID. Next we're going to start
talking about a bunch of functions for searching for specific
patterns in Tableau.
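As a recap of the position counting in this lesson, here is a minimal Python sketch. It is not Tableau syntax: `left`, `right`, and `mid` are helper functions written for this example to mimic how LEFT, RIGHT, and MID count positions starting at one. The product string and the URL are assumptions: the product string is built to match the positions described above (14 characters, the name in positions 1 to 5, the ID in positions 7 to 9, a four-character code ending in ER5), but its exact spelling is not recoverable from the transcript.

```python
from typing import Optional

def left(text: str, n: int) -> str:
    """Keep the first n characters, counting from the left."""
    return text[:n]

def right(text: str, n: int) -> str:
    """Keep the last n characters, counting from the right."""
    return text[-n:] if n > 0 else ""

def mid(text: str, start: int, length: Optional[int] = None) -> str:
    """Start at the 1-based position `start`; keep `length` characters,
    or everything up to the end when `length` is left out."""
    begin = start - 1
    if length is None:
        return text[begin:]
    return text[begin:begin + length]

# Assumed sample value with 14 characters.
product = "Canon-789-AER5"
print(left(product, 5))    # Canon
print(right(product, 4))   # AER5
print(mid(product, 7, 3))  # 789

# Hypothetical image URL, similar in structure to the one in the exercise.
url = "https://www.example.com/images/item.png"
print(left(url, 5))   # https
print(right(url, 3))  # png
print(mid(url, 9))    # www.example.com/images/item.png
```

The start position of nine for the domain follows from counting the eight characters of `https://`, the same way the space had to be counted in the address task.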
136. Tableau | Search: STARTSWITH, ENDSWITH, CONTAINS, FIND, FINDNTH: Guys, so now we're going to
move to the next use case, where we're going to
learn how to search for specific patterns in our text
using calculated fields. And here we have five
functions: STARTSWITH, ENDSWITH, CONTAINS, FIND, and FINDNTH. As usual, first we have to understand
the concept behind them. Then we're going to go
and practice in Tableau. Let's go. All right everyone. The search functions in Tableau can be split into two groups. The first group returns whether the substring
exists in our text or not. Here we
have three functions: STARTSWITH,
ENDSWITH, and CONTAINS. The output of those three
functions is always going to be either true or false, so we have a Boolean. For example, we have the function CONTAINS, we have our string, and we are
searching for dashes here. The output is going to
be either true or false; in this example, it is going to be true, since we have
it here twice. And then we have a second
group of functions, which return the
position of the substring. Here we have two functions, FIND and FINDNTH. The output is going to be the position number. So we're going to get numbers
out of those two functions. For example, if we take
the function FIND for the same string and we are
searching for the dash here, we're going to get
the output six. So we are not getting
true or false, we are getting the
position of the substring. In this example, it is
the first dash; it has position number six. As you can see, both
groups can be used to search for a specific
thing in our text, but they answer
different questions. The first group
answers the question whether the substring
exists in my text: yes or no, true or false. But the second group answers the question where I
find my substring. So here we're going to get the position number of the match. Now let's go and focus on the first group of functions:
STARTSWITH, ENDSWITH, and CONTAINS. Okay, now we're going to
start with the first one, STARTSWITH. Let's say that we have the following
text: Monitor LG 4K. The syntax in
Tableau is going to be very simple. It starts with the
keyword STARTSWITH, and it accepts two arguments. The first one is going to
be the string field. It is the text that we
want to search inside. The second one is the substring; here we can specify what we
are searching for. The output, as we learned, is going to be either
true or false. It is a Boolean. Let's
take an example. We have STARTSWITH, our text, and we are searching for
the word Monitor. Let's see how this
works. It's really easy. We start searching from the left and we
move to the right. The start position
for the search is going to be the first character. Now Tableau is going to
start matching Monitor here in our text, starting from M. And
as you can see here, the first part of our
text matches the substring that we are
searching for. Our text starts with Monitor,
which is correct. That's why Tableau returns true. Okay. Now
let's take another one. Here we are asking, does our text start with
the substring LG? Of course, if you're
checking our text, if you start searching from
the left to the right, our text does not start with LG. Tableau will not
find a match and it's going to answer
with false. That's it. It's simple, right? We are just asking a question. So we ask Tableau something and Tableau answers
with either yes or no. Okay, so now let's move
to the next function. We have ENDSWITH; it's exactly the opposite. All right, we're going to
work with the same example. And the syntax in Tableau
is very similar. Here it starts with ENDSWITH, and it accepts two
arguments as well: the string field that we're
going to search inside, and the substring, where we can specify what
we are searching for. The output is going to be
true or false as well. So let's start with
the first example. We are asking here, does our
text end with 4K? Here, Tableau starts searching from the right side,
moving to the left. Now here, does our text
end with 4K? Yes, the last two characters
are 4K. That's why Tableau answers
with yes. That's it. The output, the
result, is true. Let's ask another question. Does our text end with LG? Well, if you check
the text over here, it does not end with LG. LG is in the middle, so the last two
characters are not LG. That's why Tableau
answers with false. So the answer is no. So as you
can see, it's really easy. We are just asking questions and Tableau is answering
with either yes or no. Let's move to the next one. We have CONTAINS. Okay, so now we are working
with the same example, and the syntax is very
similar to the other two. Here, it starts
with CONTAINS, and it accepts two things. First we need to specify the text that we
are searching inside, and next we specify what we
are searching for. The output is going to
be a Boolean as well: true or false, yes or no. Okay, now let's ask Tableau
the following question. Does our text contain
the word Monitor? What Tableau is going to do is
search everywhere. It will not search only at
the start or at the end. It's going to search everywhere. And if the word is
found anywhere inside our text, Tableau is going to answer
with yes, with true. Does our text contain
the word Monitor? As you can see, it's true. Tableau returns yes. Now let's ask another question. Does our text
contain the word LG? Well, if you are
searching over here, you can find it in the middle. So that's why Tableau
answers with true as well. Yes, our text
contains the word LG. Okay. Let's move on and ask
the following question. Does our text contain
the substring 4G? If you check the text over here, we have the 4, we have the G, but they are not together. That's why Tableau answers no; we don't have the
substring 4G in our text. Now as you can see, the function CONTAINS does not
have any restriction. It's going to search everywhere. It's not like STARTSWITH
and ENDSWITH. The substring does not have to
be at the start or at the end. If the substring
exists anywhere, then it's true. If
not, then it's false. So that's it; this is all about
the three functions. Let's go now to Tableau
and start practicing. All right guys, so
now you might ask me, what are the use cases for
those three functions? Well, I use them
in two scenarios. The first use case is when
I'm exploring new data. The second use case is when I'm offering new filters
to the users. Okay, so now let's
start with the first one, exploring the data. This is especially
useful if you are new to a project or if you
have a new data source. So the first step
is usually to explore the data and learn the
content of the data source. So if you are in this situation, you might have a lot of
questions about the data. So you have those
three functions, those three tools, in order to explore the new
data that you have. Okay, then let's go and explore the products inside
our big data source. We have a lot of
products there and I would like to understand the content
of my data source. So let's take the product
name to the rows. And as you can see,
Tableau is saying, okay, there are
a lot of members, and it recommends showing only 1,000, but I would like
to see everything. So I'm going to say add
all members to the view. And now as you can
see, we have a lot of products inside our data source. And I would like to understand
the scope of my project. So what is the content
of those products? I would like to know
whether we have Apple products inside
our data source. So we're going to go and create a new calculated
field to answer that. We're going to call it products starts with Apple. That's it. We're going to use
the function STARTSWITH. It needs two arguments. The first one is going
to be the text that we're going to
search inside. It is our product name; we are searching inside
the product name. Now what we are searching
for is the word Apple. I'm going to write it like
this. Everything is fine. You can see the calculation
is valid. Let's click OK. As you can see on the left side, we have a dimension with the data type Boolean, because we have yes or no, true and false. Let's take it to the rows
and check the results. You can see over here we
have a lot of falses. I'm going to go and sort it
in order to see the trues. We can see over here we
have four products where the product name starts
with Apple. The others do not start with
Apple, as you can see. Now we have a little bit more
insight into our data. Let's go and ask the
follow-up question. Does the product name contain
the word Apple anywhere? Not only at the start
or at the end, but anywhere. In order to ask the question, we're going to go and create
another calculated field. We're going to call it
products contains Apple. We're going to use the function CONTAINS. It needs two arguments. The string that we
are searching inside is going to be
our product name. What we are searching
for is Apple. That's it, and the calculation
is valid. Let's hit OK. Again, here we have a
dimension called products contains Apple, with the data type true and false, so Boolean. Let's drag
and drop it here. But first I'm going to go
and make it a little bit bigger to see the
header of the field. As you can see, the
first one is contains, the second one is starts with. Let's sort it by contains. As you can see, we have
around seven products where the product name contains the word Apple. Now
let's check the result. As you can see, the first one, we have it over here,
the word Apple. The second one is over here, and the third as well over here. And the rest, those
four products, all start with
the word Apple. As you can see, with the
CONTAINS function we get more results
than with STARTSWITH. All right, so as you can
see, we are learning more about the products
inside our data source. We have seven products
from the company Apple. Let's ask the
follow-up question: do the product names
end with the word Apple? In order to do that, we
can again create a new calculated field. Let's call it products
ends with Apple. So this time we're going to use the function ENDSWITH. Again, here we have the product name and we are searching
for Apple: does the product name end
with the word Apple? The calculation is valid. Again, we have here a Boolean. Let's drag and drop it in the
view to check the results. Now let's go and
check the results. I'm just going to make it
a little bit wider to see. Okay, this is the ends
with. Let's go and sort it. As I'm sorting, we
don't have any true; all the values are false. And that means we don't
have any products that end with the word Apple. With that we understand
that the word Apple exists only at the start of the product name
or in the middle. As you can see, those
three functions are really great for
understanding our data. Now let's go and ask
the follow-up question. Does the product name contain
the word Samsung anywhere? Here we are searching
for the products from the company Samsung. In order to do that, I
think you already know it, we're going to go and create
a new calculated field. We're going to call it
products contains Samsung. We're going to use the function
CONTAINS and we're going to search inside the
field product name. This time we are searching
for the word Samsung. As you can see, the
calculation is valid. Let's go and hit OK and
bring it to the view. Now I'm going to just make
it a little bit bigger to see what we're talking about here. It's
about Samsung. Let's go and sort the results. Wow, we can see that
we have a lot of products from the
company Samsung. So we have more products from Samsung than from Apple
in our data source. Let's check the results again. So here we have it over here,
Samsung. Samsung over here. Then we have a lot of
products that start with the word Samsung, and again
here in the middle, but the name never ends
with the word Samsung. Okay guys, there's one more
function that I usually use inside the calculations if I'm searching or
exploring the data. And that is the case functions, UPPER and LOWER,
that we learned before. That is because Tableau is
case sensitive in the search. We have to pay attention to how we are writing the search term. In order to
overcome this problem, we're going to use
the case functions. Let me show you an example. Now we can ask the question, does the product name contain
the word black anywhere? Let's go and create a
new calculated field. As usual, we're going to
call it products black. And this time we're
going to use CONTAINS as well. The string is the product name and we are searching for
the word Black. That's it. Let's hit OK. We have it as a new dimension. Let's check the result. As usual, I'm just
going to make it a little bit wider
to see the results. Now we have a lot of falses
and we have a lot of trues. There are a lot of products that have the word, as
you can see over here. We have it here, we have it
over here as well, the word Black at
the end, and so on. So there are a lot of products
with the word Black. The case here is that
only the character B is capitalized. Let's go and change the
case in the search term. So we're going to go and edit the calculation. Now instead of having the first character
capitalized, we're going to
have it small, everything in lower case. Let's go and hit Apply. Now as you can see
in the results, we have only one product
with the word black in lower case. Tableau is very sensitive to the case
of the search term. If we switch everything,
for example, to upper case, BLACK,
let's search. As you can see, all the products that we have are now false. We don't have any products that contain the word in upper case. Tableau is very sensitive to the case of
your search term. Now to fix this,
instead of going and changing each time the
case of the search term, lower case, upper case,
capitalized, and so on, we go to the product
name and we force it to be upper case or lower case using LOWER or UPPER. We're going to go
over here and add, for example, LOWER. You can use UPPER if you want, as long as the search term is written in the same case. We're going to have
the same results. With that, we are first forcing the product
name to be lower case, and then we can search
for the word black. With that, I'm covering all the scenarios
inside my data source. Let's go and hit
OK. With this, I will get all the products
that contain the word black. It doesn't matter whether it is
lower case or upper case. We're going to get everything. So with that, I'm sure
that I find every string containing the word black and we are not missing anything. So that's why I include UPPER or LOWER inside the calculations
before I start searching. So that's it for the first use case. This is how I usually use
those three functions in order to explore and learn the
content of my new data source. Let's go now to the
second use case, where we're going to use
those three functions in order to offer new
filters to the users. So for example, let's
create a filter for the companies inside
the product name. So let's go and create
a new calculated field. We're going to
call it Companies. And this time it's going
to be a little bit more complicated
than before, but we're going to
do it step by step. So we are searching first
for the company Apple. So we're going to have CONTAINS, the product name, and the search term is going to be
apple in lower case. But we have as
well to lower-case the product name with LOWER. And we're going to
have it like this. This is the first one.
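To make the idea concrete outside of Tableau, here is a minimal Python sketch of the same case-insensitive check. It is not Tableau syntax: the function name and the product names are made up for illustration, and the only point is that lower-casing the text before searching makes the match independent of case.

```python
def contains_ignore_case(text: str, term: str) -> bool:
    """Mimic the idea of CONTAINS(LOWER(text), term) with a lower-case term."""
    # Lower-case both sides so the comparison no longer depends on case.
    return term.lower() in text.lower()


# Hypothetical product names, not taken from the course data source.
products = ["Apple iPhone 12", "APPLE Watch", "Samsung Galaxy S21"]

# One true/false flag per product, like the Boolean output of CONTAINS.
apple_flags = [contains_ignore_case(name, "apple") for name in products]
print(apple_flags)
```

Without the lower-casing step, the plain check `"apple" in "APPLE Watch"` is false, which is the same problem we saw with the search term in the wrong case.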
I'm just going to copy it and paste it for
the next company. We're going to have Samsung, and then we're going
to have Microsoft. We are searching for those three companies, and that's it. So now we have
those three conditions. But as you know, the output of CONTAINS is always
true or false, and I would like
to have values in my filter called Samsung,
Apple, and Microsoft. In order to do that,
we're going to use the logical
IF/ELSE statement. Don't worry about it. We will have a dedicated tutorial for that later, but we
have to use it now. For now, just follow along.
We're going to use it to evaluate those conditions. It starts with IF for the first one: CONTAINS,
the product name, apple. What happens THEN? I would like to see
the value Apple. If it's not true, we go to the next one with ELSEIF. Then we're going to
evaluate this condition. If it's true, then it's
going to be Samsung. If it's false, of course we're
going to use another ELSEIF. We're going to
evaluate this one. And then the output,
if it's true, is going to be Microsoft. If the product name doesn't fulfill any
of those conditions, we're going to have the ELSE, let's say Unknown. That's it. We're going to close
it with END. Don't worry again about this logic,
we're going to talk about it later. With that, I'm going
to get
those three values instead of true and false while we are evaluating those conditions. Let's
go and hit OK. So as you can see, now
we have a new dimension. The data type is not
Boolean, not true and false, and that's because the output of the calculation is now going
to be string values. Let's go and show
it as a filter. And now we have those
values, as you can see: Apple, Microsoft,
Samsung, and Unknown. I'm going to add it as well to the view to see the results. Let's go and grab it over here. Now the users can go and start filtering the data
based on the companies. Let's remove everything
and start with Apple. With that, we're going
to get all the products with the word Apple inside them, or we have Microsoft.
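The Companies calculation described above is a chain of conditions that returns a label instead of true or false. As a rough sketch of that logic in Python (not Tableau syntax; the function name and the sample product names are assumptions for illustration):

```python
def company(product_name: str) -> str:
    """Return a company label based on words found in the product name."""
    name = product_name.lower()  # force one case before searching
    if "apple" in name:          # IF ... THEN 'Apple'
        return "Apple"
    elif "samsung" in name:      # ELSEIF ... THEN 'Samsung'
        return "Samsung"
    elif "microsoft" in name:    # ELSEIF ... THEN 'Microsoft'
        return "Microsoft"
    else:                        # ELSE 'Unknown'
        return "Unknown"


# Hypothetical product names, only for demonstration.
for product in ["APPLE iPad", "Samsung Galaxy", "Canon EOS"]:
    print(product, "->", company(product))
```

The conditions are evaluated from top to bottom and the first one that matches wins, so a name that mentioned two companies would get the label of the earlier condition.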
So now we can see that those products are
from Microsoft. The same goes for Samsung. With that, we are
filtering based on the companies, and we use the product name as
the basis for that. For Unknown, I think there are going to be a lot of values. You can go step by step, adding more companies
to the filter. But for now I'm just showing
you an example. This is exactly the power of the calculated
fields in Tableau: we introduce new information
based on the functions. That is all for this use case, how to create filters based
on those three functions. All right, so now we're
going to focus on the second group of search
functions in Tableau. We have the two
functions, FIND and FINDNTH. Here we are
answering the question: where do I find my search term? We are searching for the
position number of the search term. This time we are
not getting true and false, we are getting
the position number. Let's understand why
we need this. All right, now let's
quickly understand the difference between
FIND and FINDNTH. With FIND, we are returning the position number of the first occurrence.
With FINDNTH, we are returning
the position number of a specific occurrence. For example, let's say
that we want to search for the position number of the
dash inside this string. The result is going
to be six, because the first occurrence is
at this position. On the other hand,
we can use the function FINDNTH for the same
text and the same substring. We are searching
for the dash, but we are asking now for the position
of the second occurrence. So the first occurrence
is going to be ignored. We're going to get
the position of the second occurrence, and
that's going to be ten. This is the main difference
between those two functions. In FIND, we are always searching for the first occurrence, but in FINDNTH, we can specify which occurrence
we are searching for. Let's go into more detail
about the function FIND. All right, so now we
have this example. As you know,
each character in the string has a position. C has the position number one, and the character five
has the position number 14. The syntax for FIND in Tableau
is as well very simple. It starts with the keyword FIND, and here we have
three arguments. The last one is optional. The string is the text we
search inside. The substring is what we
are searching for. The start position of the search, as we said, is optional. The output is going
to be a number. For example, let's say
that we want to know the position of the
dash inside this text. How this works is
really easy. It always starts from the left side. Since we didn't specify anything for
the starting position, it's going to start from
the first character. Tableau starts searching. In the first
character, we don't find it. The dash, we find it at
the position number six, so the output is the
position number six. All right, now let's take
another example where we specify the start position
of the search for Tableau. We're going to have
the same thing again, but we're going
to say this time, start from the position
number seven. So what happens? We're going to start searching from here. Tableau is going to
search from left to right, so we're going to find it over here at the position number ten. The result in
the output is going to be ten instead of six, because we started
searching from this position. All right, so that's all
for the function FIND. Let's move to the next
one, FINDNTH. We're going to work with
the same example. The syntax is going to be a little
bit different. It starts with the keyword
FINDNTH, then the string value where we're going to
search, then we're going to specify
what we are searching for. But this time we're going
to specify the occurrence. Here, we have to tell Tableau which occurrence we
are interested in. Let's take an example. We
have the following question: find the position number of
the dash inside the string, but we are interested in the second occurrence. How
is this going to work? We're going to start
searching from left to right, as usual. Here, we cannot specify the start
position of the search. We don't have this
option over here. It always starts
from the first character. As we are searching
from left to right, we have the first occurrence
of this character at the
position number six. The output will not be
the position number six, because we told Tableau we are interested in the second occurrence,
not the first one. Tableau is going to keep searching for the
dash in the string, so we're going to find it
at the position number ten. Here is the second occurrence of the dash inside our text. This is exactly what
we are looking for. The output is going to be
the position number ten. That's it, this is how
this function works: we can search for a specific occurrence. In the function FIND, we always get
the first occurrence, but there we can specify
where to start the search. Now let's go in Tableau
and start practicing. All right, so now we're going to have the following example. We're going to start with
the small data source. Let's go to the customers. And I would like to
get their first name and as well the phones. So now the task is to extract
the country code from the phone and to put it in an extra field, so we are interested
in this information, the plus 33, plus one, plus 49, and so on. So as we learned before, we can use
the function LEFT in order to extract the information from the left side of the text. Let's go and create
that. We're going to go and create a new
calculated field, let's call it Phone
Country Code. And we're going to use
the function LEFT. We have to specify the string, so it's going to be the phone. And now the next one, we have to specify the number
of characters that we want to extract, and here is exactly where
the problem comes. Sometimes it's going to be three characters and sometimes it's going to be two characters. Let's go, for
example, with three. Let's hit OK. We have it
over here as a new dimension. Let's just bring it
to the view. Here, we can find exactly
the issue, right? The first one is fine, the
third one as well is fine. But for those countries
it's not working. We have the dash inside it, which is not really correct. Now, in order to fix this, we're going to use the
magic of the function FIND. If you check over
here, we always want the numbers before
the dash, right? We can search for the
position number of the dash. And then we can include it in the LEFT function. Let
me show you what I mean. We're going to go and create
a new calculated field. We're going to call
it Phone Find Dash. So now we're going
to go and find the position number of the dash. As we learned, we start with FIND. We have to specify where
we're going to search, so we are searching in the phone. Then what we are searching
for: we're going to have the
dash here, and that's it. We are not interested
in the start position, so we can start from
the first character. That's it. As you can see,
the calculation is valid. Let's hit OK. Since the
output is going to be a number, we're going to get it as
a continuous measure. Let's drag and drop it over
here and see the results. The position number of the dash inside the first phone is four. For the second one it's three, then four, four, three.
Everything is fine. Now the next step, what
we're going to do, we're going to bring those
two calculations, the LEFT and the FIND, into one calculation. I'm going to go and copy the
syntax from Phone Find Dash. Let's just copy
it from here and go back to the first calculation
about the country code. Let's go over here and edit it. Now, instead of having the
three as a static value, we're going to have
it as a variable using the FIND function. Let's just add it over here. Now how is Tableau going to
execute this calculation? It's going to start with
the inner function, FIND. It's going to first find the position number of the
dash inside the phone. And then afterwards
we're going to go to the function LEFT outside. We're going to keep
everything up to this position number. All right. Now let's go and check the results as a string. As you can see, we
are almost there. We have the plus 49 dash, plus one dash, plus 33 dash. The dashes are everywhere, and that's because we are keeping everything up to and including
the dash position. That means we are
always one step further than needed. In
order to fix it, it's really easy.
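The fix, stepping one position back from the dash, can be sketched in Python as follows. This only imitates the LEFT and FIND functions described in the video, it is not Tableau code: the helper names are my own, and the phone numbers are hypothetical values written in the same shape as the ones in the demo.

```python
def find(text: str, sub: str, start: int = 1) -> int:
    """1-based position of the first match at or after `start`, 0 if none."""
    return text.find(sub, start - 1) + 1


def left(text: str, count: int) -> str:
    """First `count` characters of the text (nothing if count is below 1)."""
    return text[:max(count, 0)]


# Hypothetical phone numbers in the format country-area-number.
phones = ["+33-555-123-4567", "+1-555-123-4567", "+49-555-123-4567"]

# Using the dash position directly keeps the dash in the result.
with_dash = [left(p, find(p, "-")) for p in phones]

# Subtracting one stops right before the dash.
codes = [left(p, find(p, "-") - 1) for p in phones]

print(with_dash)
print(codes)
```

The first list still ends every code with a dash, the second one contains only the country codes, whatever their length.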
We're going to go back to our calculation. Yeah, we are getting here
the position number, which is correct, but we
want to go one step back. In order to do it,
we're going to do minus one to go one step
back. Let's hit OK. All right, so with this we get exactly what we want, right? Plus 33, plus one, plus 49. And with that,
we get more dynamic in
the function LEFT, because we are using the FIND function. With that, we can see how we can bring those functions
together in one calculation in order to
achieve such great goals. All right, now let's try out the second function that
we have, FINDNTH. Now, let's say that we want to get the position number of the dash, but of the second occurrence. Let's go and create a
new calculated field. We're going to start with
the keyword FINDNTH. It needs three arguments. The first one is going to
be the text where we search. It's
going to be the phone. Then we are searching
for the dash. And then in the third
one we're going to specify which occurrence
we are interested in. We are interested
in the second one. That's it, the calculation
is valid. Let's click OK. Since the output is a number, we're going to get a
new continuous measure. Let's bring it to
the view over here. Now let's check the results
for the first phone. The second occurrence
of the dash is going to be at the position number
eight, which is correct. And as you can see, the
FIND result is four, because the first occurrence is at the position number four.
For the second phone, it's going to be seven, which is as well correct. Now, let's go and start
changing the occurrence. Let's go and edit it again. I would like to get now
the third occurrence. As you can see, we have
a third dash over here. Let's change it to
three and just apply. You can see now we are getting
the position number 12 for the last dash in the phone number.
We are getting the third occurrence of the
dash inside our text. But now if we go and switch
it to one, what happens? We're going to get exactly
the same result as FIND, because FIND always
brings the first occurrence. So here we are saying I'm interested in the first
occurrence. Okay, so that's it for those two functions, FIND and FINDNTH. They are really useful to
get the position number of a specific substring, and I usually use them inside
another calculation, so they are like support for
another function. All right, so with that we have
learned how to search for specific patterns in our text in Tableau using Tableau
calculations. Next we're going to start talking
about another group, on how to combine and
split the data in Tableau.
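To recap the two search functions, here is a small Python sketch that imitates their behaviour as it is described in this lesson. It is an approximation and not Tableau code: the function names are my own, the sample strings are hypothetical values built to match the positions mentioned in the video (dashes at six and ten, and at four, eight and twelve), and returning 0 when nothing is found is an assumption.

```python
def find(text: str, sub: str, start: int = 1) -> int:
    """1-based position of the first match at or after `start`, 0 if none."""
    return text.find(sub, start - 1) + 1


def find_nth(text: str, sub: str, occurrence: int) -> int:
    """1-based position of the n-th match, 0 if there are fewer matches."""
    position = 0
    for _ in range(occurrence):
        # Continue searching right after the previous match.
        position = find(text, sub, position + 1)
        if position == 0:
            return 0
    return position


sample = "Canon-123-XY45"           # hypothetical 14-character product string
phone = "+33-555-123-4567"          # hypothetical phone number

print(find(sample, "-"))            # first dash
print(find(sample, "-", 7))         # first dash when starting at position 7
print(find_nth(sample, "-", 2))     # second dash
print(find_nth(phone, "-", 3))      # third dash in the phone number
```

Asking `find_nth` for the first occurrence gives the same answer as `find` without a start position, which matches what we saw when we switched the occurrence to one.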
137. Tableau | CONCAT & SPLIT: Now we're going to learn
how to combine and split the text in Tableau using
the concatenation operator, the plus, and the SPLIT function. But as usual, let's understand
the concept behind them, then we can practice
in Tableau. Let's go. All right, so now we're
going to talk about concatenation in
Tableau. It's very simple. We use for that the
plus operator in order to combine multiple
texts into one text. For example, in our database we could have the
following scenario, where we have the first
name and the last name separated from each other
in different fields, and we would like to
have only one field called the full
name, for example. In order to do that, we can
use the plus operator in order to combine the first name Michael with the
last name Scott. And in the end result,
the two names are joined directly into one value,
MichaelScott. But now if you check
the full name, we would like to
always have a separation between the first name and
the last name in the output. Inside the full name, we usually use a space between them.
We can do the same. We're just going to
add one more plus operator. We have Michael, space, Scott. Between Michael and space, we're going to have
the plus operator. And between space and last name, we're going to have as well
another plus operator. The output is going to
be Michael space Scott. As you can see, with
the plus operator we can structure
anything we want by combining multiple string
values together using the plus. That's it. This is really easy. Let's go back to Tableau
and start practicing. All right, so now
we're going to go to the small data source over here and we go to our customers. We would like to
have the first name and the last name in the view. And as you can see,
this information is separated in two
different fields. The task is now to create only one field for
the customer name, the full name, instead
of having two. In order to do that, as usual, we're going to go and create
a new calculated field. We're going to
call it Full Name. Now we need the first
part, the first name. And then after that we're going to have the plus operator. Then we want to have a separator between them as an empty space, so we're going to
have it like this. And then a plus operator, and the last part is going
to be the last name. Let's take the last name and
put it over here. That's it. It's important that the
calculation is valid, so everything is
fine. Let's hit OK. Now, as you can see
in the data pane, we have a new calculated field, a new dimension called Full
Name. Let's check the values. We're going to drag it
over here on the Rows. And as you can see now we
have a very nice full name, George Pips, John
Steel and so on. It's really simple, right? Now,
if you change your mind and you would like to have
a dash between those names, what we're going to do is
go and edit it. Then, instead of having the white space
over here in the middle, we're going to have the dash.
That's it. Let's hit Apply. And now we can see in
the full name that the first name and the last
name are separated with a dash. So it's really simple. Let's
take now a quick task. The task is to combine the category and the product
using the following rule. As usual, you can pause the video in order to
complete the task, and once you are done,
you can resume it. All right, so now let's
check the solution. It's very simple. We're
going to go to the products. Let's first see the raw data. So we have the category
and the product name. And now we're going
to go and create a new calculated field. We're going to call
it Full Product Name. The rule starts with the category, then we have our plus operator. After that, the separator
is the colon. But after the colon
we have a white space. I'm just going to add it over here, and then we're going to
have the product name. Let's check the results. The
calculation is valid, OK. And here we have
our new dimension. Let's just drag and drop it over here and
check the results. I'm just going to make it
a little bit bigger so we can see the results from
here and here as well. So as you can see,
our product name now starts with the
category, a colon, then the product
name, and that's it. This is how we can work with
concatenation in Tableau. It's very simple, right? Now we're going to learn
the exact opposite. So we're going to
learn now how to split one field to multiple
fields using split. All right, so now we're
going to talk about the SPLIT function in Tableau. It's a very important function and a lot of people get
confused about it, but I think it's simple. So let's check this example. We have here one field with
a lot of information. So we have here
the product name, the product ID, and the product
code, all in one field. In many situations, in the
analysis and visualizations, I would like to split this information into three fields. So instead of having one field, I would like to have
it in three fields. In order to do that, we can
use the SPLIT function. As we learned before,
we can do that with LEFT,
RIGHT, and MID, but the SPLIT
function is easier. In such a situation,
we want to split this field into
the product name, the product ID, and
the product code.
Now let's go and check
the syntax in Tableau. It starts with
the keyword SPLIT and it needs three arguments. The first one is going
to be the string or the field that
we want to split. The second one is going
to be the delimiter. Then the last one is
the token number. The output is going to
be a string value. Now let's take an example. I would like to split this text, and the delimiter
is going to be the dash. I would like to have the
token number one. Here, Tableau needs from
you two pieces of information, the delimiter and
the token number. The delimiter is the
separator between words. For example, we have
a separator between Canon and the ID using the dash. And we have another separator between the ID and the code. Those dashes are the delimiters
that split my text. Tableau wants to understand from you how the words
are separated. Now let's move to the next
piece of information that is needed, the token number. Here as well, Tableau wants to understand
which part of the information you are interested
in. Is it the first part, the second part,
or the last part? Here we have like an ID or token for each piece
of information. So the first one is going to
have the token number one. The second one has the token number two, and the last one
is the token number three. In this example we said I'm interested in the
token number one, that means I'm interested
in the product name. The output is going to be the product name, Canon. Of course, if you're interested in the
product ID in the middle, we could say, okay, I'm interested in the
token number two. If you specify it like this, you will get the product ID. And if you're interested,
of course, in the last one, in the product code,
you can specify the token number three in
order to get the product code. So as you can see, once you understand
it, it's really easy. We just need two pieces of information: what is the separator
between words and which token number
you are interested in? Now let's go back to
Tableau and start practicing. All right everyone. So there are three
ways to split your data inside Tableau. The first one is by creating
a new calculated field. The second one is
the automatic split. The third is the custom split. So we're going to start with the first one, how to split your data using a new
calculated field. We're going to take
the following example. We're going to stay with
the small data source. Let's go to the customers and
grab the phones over here. The phone numbers
have a structure, so we have a country code, an area code, and the
phone number itself. So now we would like to split those three pieces of information
into three new fields. Okay, so let's see
how we can do that. We're going to go
as usual and create a new calculated field for the first part, the
Phone Country Code. So we're going to start
with the SPLIT keyword, and it needs three arguments. The first one is going
to be the string that we want to manipulate, so it's going to be
the phone number. I'm going to add it like
this. Then the delimiter. The delimiter here is the dash. As you can see, those parts are split with the dash. So let's just add it over here. Then Tableau needs from
me a token number. So the first one is going
to be the token number one, then two, three, four. So we have four sections and we are interested in the
first token number. So for the first one, let's
add one, and that's it. As you can see, the
calculation is valid. Let's go and hit OK.
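The idea of a delimiter plus a token number can be sketched in Python like this. It is only an imitation of the SPLIT function as described above, not Tableau code: the helper name is my own, the phone number is a hypothetical value, and returning an empty string for a token that does not exist is an assumption of this sketch.

```python
def split_token(text: str, delimiter: str, token_number: int) -> str:
    """Return the token at a 1-based position after splitting on the delimiter."""
    tokens = text.split(delimiter)
    if 1 <= token_number <= len(tokens):
        return tokens[token_number - 1]
    return ""  # assumption: no such token gives an empty string


phone = "+33-555-123-4567"  # hypothetical phone number with four sections

country_code = split_token(phone, "-", 1)  # first section
area_code = split_token(phone, "-", 2)     # second section

print(country_code, area_code)
```

Only the token number changes between the two calls, the string and the delimiter stay the same.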
So now we can see that on our data pane
in the data source, we have our new field,
the country code. Let's go and grab it to the
view and check the result. With that, we are
extracting the first token, the first part of the phone. And with that, we have our country code.
Everything is perfect. Now, as the next step we
would like to go and extract the area code,
the token number two. So now we're going
to go and create a new calculated field. But first, I would
like to take the old code, because we want only to adjust the token number and everything else
can stay the same. Let's go and create a new one. We're going to call
it Phone Area Code. And then we're going to
put our code over here. The same parts are going
to stay, the phone and as well the
dash as separator. Then we want to change
only the token number to two. So we are speaking
about the second part. So let's go and hit OK, and check the results. We have
here again our new field, so drag and drop
it on the view, and as you can see
now we are getting
the second part. So we have here 555
and as well over here. So with that, we got the
second part from our phone. We have now the country code
and as well the area code. And now next we have the
following task for you. Create a new field
in the data source to extract the phone number part without the country
and the area codes. Now you can pause the
video in order to complete the task and once
you are done, resume it. All right, so now
we're going to go and create a new calculated field. We're going to call
it Phone Number. We can have the same script, we have SPLIT on the phone, but this time we are interested in both token three
and token four. How can we do that in Tableau? We can add only
one token at a time. In order to do that,
we're going to go and change this to three. Since we need both pieces of
information in one field, we can use the plus operator. So what we are going
to do over here is add a plus, then we can add
the same code over here, but this time for the
token number four. We are getting both of
the tokens in one field. The calculation is
valid, let's hit OK, and as usual, we got a new field in our data source. Let's check the
result over here. We can see that now we
have the phone numbers. Now, as you can see,
the first one is 1234567, and we have it as well over here, the same phone number.
But you might say, you know what, we are
missing the dashes, right? So we can go and add them
in our calculated field. So let's go and edit it. We can just add a new plus operator, and between the tokens we're going to
have the dash. As you can see, the calculation is valid. Let's go and hit OK. And with that, we got exactly the same structure
as in the phone. That's it for the first
method, how to split your data using a
new calculated field. You can see that from
one field we have extracted three new fields. Now let's go to the second
method, where we can split the data using the
automatic split. All right, so now,
yeah, we can do that. We're going to stay with
the small data source, and this time we need the URL. So let's take the
product image from here, drag and drop it in the view. We know that in the URL there is a
lot of information. And as well, we can use the
SPLIT function to split the data. Now instead of creating
those calculated fields manually, there is a really nice
feature in Tableau where we can split the
data automatically. In order to do that, we're
going to go to our field, the product image,
and right-click on it. And here we have the
option Transform. We are manipulating the data. And here we have two options, the Split and the Custom Split. The Split is the automatic way. Wow. We got now a lot of new
fields in our data source, and that's because Tableau
automatically split the data and as well understood the content of the data.
So you can see here the product image domain, then fragment, path, query, scheme. All this information is part of the structure of a URL. Now let's go and check
this information. We're going to take, for
example, the domain. Drag it on the view, and as you can see, Tableau did
it correctly, right? We got now only the
domain information from the whole URL,
which is really nice. We can take as well
the scheme over here, and we have the protocols
from the start. As you can see, Tableau
got it really correctly. Some of those fields
are going to be empty, I think because we
don't have them as a part of our URL. With that, Tableau did the automatic
split, and if we would like to learn how
Tableau did split it, you can find it as well inside this field because
it is a calculated field. Let's see how Tableau did split the domain. Right-click
on it. And as we can see here,
Tableau is using two splits in order to get
the domain information. The first split is this one. Tableau is splitting the
protocol from the whole URL. The separator is going to be the colon and the
two forward slashes. And we are taking
the token two. So we are getting
the second part. Once we get the second
part, it's really easy. The separator, as you can
see, is the forward slash. We want to split now
with the forward slash. And we would like to get
only the first part. It's really easy. You can
go and try it yourself. That's it. Let's click
OK. With that, Tableau, in some cases, not in
all cases, is smart enough to split your data into
new fields automatically. That's it for this method,
the automatic split. Next we're going to
see the custom split. Okay, so we're going to stay
with the small data source and we're going to
go to the customers again. Here we want to split the phones using
the custom split. Let's bring it to the view. And then in order to
customize the split, we're going to go to the
data pane, on the field that we want to manipulate,
and right-click on it. And then here we have Transform. Before, we used the
automatic split. This time we are interested
in the custom split, so let's go inside, and
then we're going to get a new window in order
to customize the split. It's like the syntax in the calculations: Tableau needs
from us two pieces of information. First the separator, second, what exactly you want
to get, the token numbers. The first one, the
separator or the delimiter, in this example is
going to be the dash. All this information is
split with the dashes. Let's go and enter a dash. For the second piece of information, we
have the following options under Split off, and here
we have three options. Do you want the first part, the last part, or everything? And here, it depends
on what you want. If you want to split
everything, so you want each piece of information
in a new field, you're going to go
with the option All. Now let's say that
you are interested only in two pieces of information, the country code
and the area code. The rest you are not interested to have in the data source. In order to get the
first two parts, we're going to go over
here and select First. And here you can specify two. So we are interested in
the first two columns, in the first two pieces of information
from the left side. But now let's say that
you are interested in the last two parts,
so you would like to get fields for the last
two pieces of information. So what you're going to
do, you're going to go over here and select Last, and as well select two, so that you're
specifying for Tableau what exactly you want
to get as a result: how many fields from the start, from the end, or everything. In this example I'm
interested to get everything. So we're going to go
with the option All. And that's it. Let's
go and hit OK. So once we do that,
Tableau going to go and create a
lot of new fields. So Tableau did manage to split the phone number
into four parts. So let's go and check
those informations. Drag and drop it over here
on the rows as you can see. The first part going to
be the country code, the second one going
to be the Area code. And then Tableau split those two informations into two fields. Here, it's not like the
second misthode where we are blindly automatically
splitting everything. Here we are specifying
for Tableau, few rules, and then Tableau can go
and as well automatically split the data to get better
quality in the fields. And of course, if
you are interested on how Tableau did the split, we can always go
to the database. All those informations
are calculated fields and we can go inside
them and check the code. So we can go over here and do it it and as you can
see the dilimeter is the dash and Tableau get it as a first token in order
to get the country code. All right, so that says
those are the three methods on how to split the data
inside your data source. They are really
useful in order to generate new
informations and split those complex structures inside the original data source into new structure for the
analysis and visualizations. All right, so that's it. This is how you
combine and split text in Tableau. Next we're going to
start talking about the last string function
in Tableau, the replace.
138. Tableau | REPLACE: Now we're going to
learn about the last use case for the
string functions: how to replace a specific
substring with another substring using the
REPLACE function. As usual, let's understand
the concept behind it, then we're going to
practice in Tableau. Let's go. Okay, the REPLACE
function in Tableau. It's very simple.
It's going to replace one substring with another one. For example, we're going to
have the following address, and as you can see, in the middle we have the abbreviation of street, "St.". I would like to have the
normal wording of this, instead of having
the abbreviation. I would like to have the
complete word, street. We can do that using the
REPLACE function in Tableau. Let's check out the
syntax in Tableau. It starts with the REPLACE keyword and it needs three arguments. The first one is
going to be the string, the original text that
you want to manipulate. The second one is the substring, the one that you
want to replace. The third one is
the replacement. It's really clear: this is
going to be the new substring, the new word. Here, the output is going
to be a string value as well. In order
to solve this task in this example, what
we're going to do, we're going to use
REPLACE, then our text. Then the old one is going to
be "St.", the abbreviation. This is the old substring,
and the new one is going to be the word street.
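To make the three arguments concrete, here is a minimal sketch in Python rather than in Tableau itself. In a calculated field the same idea would be written as REPLACE([Address], "St.", "Street"). The field name [Address], the helper name and the sample strings are assumptions made only for this illustration, and Python's str.replace is just standing in for Tableau's function.

```python
# Python stand-in for Tableau's REPLACE(string, substring, replacement).
# The helper name and the sample addresses are illustrative assumptions.
def replace(string: str, substring: str, replacement: str) -> str:
    # Every occurrence of `substring` is swapped for `replacement`.
    # If `substring` is not found, the original text comes back unchanged.
    return string.replace(substring, replacement)


with_abbreviation = replace("Louis St., Paris", "St.", "Street")
without_match = replace("Paris", "St.", "Street")

print(with_abbreviation)  # Louis Street, Paris
print(without_match)      # Paris
```

The second call shows what to expect when the substring is missing: the input is returned as it is.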
How does this work? Tableau first has to search for the substring that
we want to replace. It's going to search
the whole text in order to find the substring. In this example, of
course we're going to find it over
here in the middle. The next step is that Tableau
is going to go and start replacing this word with
the replacement. Tableau is going to take the "St."
and replace it with the complete word
street. At the end, we're going to get
Louis Street, Paris. As you can see,
it's really simple. We are replacing the old value with a new value. At the end, the string is going
to look like this: we're going to have the complete word street instead of "St.". Now of course, the question is, what happens
in the output and the results if we
don't find anything? For example, we have
this address, Paris. We are searching
for "St.", but we don't have it
inside the text. Here, Tableau returns
the original text without changing anything. Nothing happens. That's it. It's really simple, right?
We're going to go back to Tableau in order to practice
the REPLACE function. Okay, now we're going to go and practice with the
small data source. Let's go to the
customers and we can manipulate the phone number
again for the customers. Now as you can see, the
structure in the phone number starts always with the
plus for the prefix, for the international call. So now we have the
requirement to replace the plus with 00 as a prefix. Now, in order to do
that, we're going to use the REPLACE
function in Tableau. In order to do the switch, the replacement, let's go and create a new
calculated field. We're going to call
it phone replace. Let's start with the
keyword replace. We need now the field that
we want to manipulate. It's going to be
the phone number, so we have it over here. And now we need to specify for Tableau the substring
the old value. The old value is the plus sign. And now we have to specify
for Tableau the replacement, the new value, the new
value is going to be 00. That's it. Tableau shows the
calculation as valid. Let's go and hit OK.
With that, as usual, we created a new calculated
field in our data pane. Let's go and check the results. So drag and drop it on the rows, and
now we can see the result. Instead of having the plus sign, we have 00 everywhere. And with that, we have
fulfilled the requirement. And now we might get another requirement where they
say, you know what, I don't want those minuses
inside the phone number, so it would be nice
to remove them. Now, in order to do that, we're going to do
the same thing. We're going to use the
REPLACE function. The old value is going
to be the dash and the new value
is going to be nothing. Let's see how we
can do that. So now let's go and edit our
calculated field. We just want to add a
new REPLACE function. So let's go and add it over here,
and it doesn't matter whether we want to replace
the plus or the dash first. So now in order to do that,
I usually do it like this if I'm doing a nested REPLACE: we're
replacing in the phone number. Instead of having the dash, we're going to have nothing. We are replacing the old
value, the dash, with nothing. Now, in order to have it nested, I would like to take this part, the first one, and put
it in place of the phone. With that, we have
nested calculations. First, we're going to
replace the plus sign. Second, we're going to
replace the dash. Let's take it to the first row, and Tableau is saying the
calculation is valid, so let's go and hit OK. And as you can see
now in the results, we don't have any
dashes or plus signs, so we have a whole
number without any special characters. With that, we resolved the
second requirement. It's easy, right?
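The nested calculation just built can be sketched outside Tableau like this. In the calculated field it would read roughly REPLACE(REPLACE([Phone], "+", "00"), "-", ""). The field name [Phone] and the sample phone number are assumptions for the illustration, and Python's str.replace is only a stand-in for Tableau's REPLACE.

```python
# Python stand-in for the nested Tableau calculation
#   REPLACE(REPLACE([Phone], "+", "00"), "-", "")
# The field name [Phone] and the sample number are assumptions.
phone = "+49-30-1234567"

# Inner call: swap the international "+" prefix for "00".
step_one = phone.replace("+", "00")

# Outer call: the result of the inner call is the input here,
# and every dash is replaced with nothing (an empty string).
phone_clean = step_one.replace("-", "")

# Doing the two replacements in the other order gives the same result.
other_order = phone.replace("-", "").replace("+", "00")

print(step_one)                    # 0049-30-1234567
print(phone_clean)                 # 0049301234567
print(phone_clean == other_order)  # True
```

Nesting simply means the output of the first replacement is passed in as the string argument of the second one.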
It's not that hard. And we can do a lot of things
with the REPLACE function. It's a great function for the
string values in Tableau. Now for you, we have
the following task: in the big data source,
in the product name, we would like to
replace the hash symbol with the abbreviation
for number. And now you can pause the video in order to complete the task. And once you are done,
you can resume it. All right, so we're
going to go to the big data source
at this time. And we're going to
go to the products. And we need the product name. Let's drag and drop it on the
view and check all values. So now we're going to make
it a little bit bigger in order to see more
values inside the data. We have some hashes,
like for example at the start, and we want to
replace them with "No.". In order to do that,
we're going to go and create a new calculated field. Let's go on the arrow over here and create a new calculated field. We can call it products replace. So we're going to start
with the REPLACE keyword. And then we need the string
that we want to manipulate. It's going to be
the product name. Next we want the old
value, which is the hash. And then the replacement
is going to be the abbreviation for number,
"No.". So that's it. As you can see, the calculation is valid. Let's go and hit OK. So we have a new dimension, a new calculated field
in our data pane. Let's drag and drop it in the
view and check the values. And we see over here
instead of the hash, we have the abbreviation
of the number. So with that, we
have learned that the replace function is very simple and as well very
important in many use cases. I use it a lot once I want
to clean up the data. So sometimes we get bad quality
from the sources and there will be a lot of special
characters. I can always use REPLACE to clean
up the data and to replace those special characters with something more meaningful
in the visualization. Like we did in this example, we replace those
special characters with something more meaningful, or I use it a lot as well, to change the format
of something. So for example, we here
have the phone numbers. And we change the
format from having the dashes to something
else, like without dashes. And as well, instead of
the plus, we have the 00. So with that, we are
not cleaning up here. The phone, we are changing
the format and how we are presenting the phones
in the visualizations. On the left side we
have the plus and dash. On the right side,
we don't have them. We usually use the
REPLACE function in order to change
the structure, the format of one field. It is just an amazing and very
important tool in Tableau. All right everyone, so that's all for the REPLACE function. And with that, we have covered all the use cases in
the string functions. We have learned around 16
string functions to manipulate, transform, and clean up
the text values in Tableau. Next, we're going to
jump to another group of functions in Tableau,
the date functions.
139. Tableau | Extract Dateparts: DATENAME, DATEPART, DATETRUNC, DAY: Now we're going to talk
about the third group of functions under the category row level
calculations, the date functions. There are three use cases for the date
functions in Tableau. The first one is to extract a specific date part
from our date, like day, year and month. For that, we have six different
functions in Tableau: DATEPART, DATENAME, DATETRUNC, DAY, MONTH, and YEAR. The second use
case is to add and subtract date values
in our data source. So here we have two functions, DATEADD and DATEDIFF. The last use case is to find and fetch the current
date and time. And here we have two functions, TODAY and NOW. Those date
functions are going to give us the tools to manipulate and transform the date
values in Tableau. We're going to start now
with the first use case, how to extract specific parts from the dates using
those functions. As usual, it's
really important to understand the
concept behind them, then we can practice in
Tableau. So let's go. All right everyone.
So in Tableau there are two ways on
how to manipulate, transform the fields
with the data type date. The first one is to
do it globally in the data source for all
worksheets, all workbooks. The other way is to do it
locally, only in one worksheet, only in one view.
For the first one, if you are manipulating the
date and you want to reuse it in different worksheets,
then in order to do that, we can go and create a
new calculated field using the date functions. But now on the other hand, if that transformation
is not that important, you don't want to reuse it, you don't want to use it
in any other worksheets. You need it only
once in one view. Then, instead of creating new calculated field in the data source and using
the date functions, we could just simply go and change the date format
directly in the view, which is easier and quicker than creating
new calculated fields. As you can see, there is
like two methods on how to manipulate and transform
the dates in Tableau, either using the date functions or changing the date format. Now, if you ask me which
method should I use, you have always to ask
the following question. Is the transformation
going to be needed in different worksheets? Then yes, go and create a new calculated field
using the date function. But if the transformation is
only needed for one view, then you have to
change the date format directly in the visualization. Now we're going to
go and focus on the date functions since
we're talking about the calculations and at the end we're going to talk
about the date formats. So in Tableau we've got a bunch of date functions that all have the same goal: to extract date
parts from specific fields. And we can use them to
generate such a view. So as we can see over here, we have the years,
we have the months, the quarters, and all
that information comes from only one
field, the order date. And we can build from all the new information
that we extracted a lot of analyses
and insights about our data, like the one that we
are seeing here, the heat map. So now let's go first understand those functions and then
we come back to Tableau. All right, Okay, so now
we're going to talk about the first date function in
Tableau. The date part. We can use it in
order to extract a piece of information
from our date fields. So for example, we have
the following date structured from year,
month and a day. We can use date part to extract
one piece of information, like for example, the year. If you are extracting the year, the output is going to be 2025. But if you're
extracting the month, we're going to get 8, for August. If you're extracting the day, we're going to get 20. Here, it's very important
to understand that if you are using DATEPART, the output is going
to be a number. The year is going to be a number. The month will not be August, it's
going to be eight. Same thing for the day, so
you will get 20 as a number. Let's see the syntax in
Tableau, it's very simple. It starts with DATEPART. Then Tableau needs two pieces
of information from you. The first is the date part: here,
Tableau asks you which piece of information
you are interested in. Would you like to have the year, month, day, and so on? The second part, the
second argument, is going to be the date field that
we want to manipulate. The output, the result of this
function, is a number. Now let's take an example. We're going to take DATEPART. Now we are interested in
the information of the day. We would like to extract
the day information. Then our date is going to
look like this, and the output is going to be 20. If we want the month, then we have to specify
month as the date part. And if we do it on this date, we will get the month, eight. The same thing if you
want to get the year. So here we specify the
year at the start, then our date, and the
output is 2025. So that's it for DATEPART. This is one method on how to extract a date part
from a specific date. Let's move to the next one. We have DATENAME. Let's see the syntax in
Tableau, it's exactly the same. Let's start with the
DATENAME as a keyword. Then Tableau needs two pieces
of information from you: which part of the date
you are interested in, and the field that
you want to manipulate. But this time the output is a string value. Let's
take an example. Let's say that we
are interested in the year part from our date. So the output is,
again, 2025, but the value is going to be
of the data type string. But now, if
you say, you know what, I'm interested
in the month, you specify month as
the date part. This time, Tableau answers with August instead of eight, because
the output here is a string, so you will get the name
of the month as an output. And now the next one: if you say I'm interested in the day, and you specify day in the date part instead of month, you will get 20 as well,
but as a string value. So that's it for the date name. It's very similar to
the date part, right? But the only difference is that there you are
getting a number, but with the date name, you
are getting a string value. This is another method on how to extract the date
parts from a date. Let's move now to another set of functions that can be used as well to achieve the same
goal, extracting date parts from a date. This time we have three
quick functions in order to quickly extract the date part from a date. They
are my favorite. I always tend to use
them compared to the other two because they
are really easy to write. The syntax in Tableau is going
to look like this. The first function, DAY, accepts
only one argument, a date. Same thing for
MONTH and for YEAR. The output is going
to be a number. It's like the DATEPART
function. For example, if I'm interested in the day, I can
do it like this. I use the function DAY, then the date that we
want to manipulate, and the output is going to be 20. As you can see,
compared to the others, it's really quick to create, right? Here, we don't have to specify for Tableau
the date part in the syntax, because the function name is already DAY. The same thing
for the month. If I'm interested
only in the month, I can just use the
function MONTH in order to extract August as the number
eight. For the last one, if I'm interested in the year, I can use the function YEAR. As you can see, they are really
easy and quick to create. If you compare it
to the other two, as you can see, they
are really easy. Let's move on to the next one. This going to be slightly
different than all the others. We have DATETRUNC. Okay, some facts
about this function. It is a little bit complicated. A lot of people
don't know about it, but I tend to use it a lot. It's a very useful function, but it is not that famous. Think about DATETRUNC
as a rounding function, like for numbers: if you have a lot of
details in one date, you can round the date down
to a specific level. What does this mean? If we
have the following date and time, we have here like a
hierarchy, right? We have a year, month, day, hour, minute and second. We are seeing in this data
a lot of information. Sometimes you are not
interested in a lot of details like seeing the
seconds, minutes and hours. You would like to see it
only at the month level. What we can do, we can use DATETRUNC in order
to round those values. Let's check first the
syntax in Tableau. It's very similar to the others. It looks like this: DATETRUNC, then you specify the date part and then the date
that you want to manipulate. The output this time will not be a
number or a string, it's going to be
date and time, okay? The best way to
understand this function is to have some examples. So let's say that we
specify day as the date part, and then we
have our date and time over here. Then
what happens? What you are telling
Tableau is that the time information is
too detailed for me and I'm interested only in seeing this piece of information
at the day level. So I'm interested only
in the day information. I'm not interested in the time. What happens in the output is that Tableau is going to return
the same information, but this time it's going to
reset everything in the time. So you can see we
are maintaining all the information
about the year, month and day, but
anything below the day is going to be
reset to zero. As I said, it's like
rounding numbers, right? You are rounding the
information to specific level. Now, let's move to the
next level where you say, you know what I'm interested
at the month level, you specify at the
date part a month, then we're going to have the
same information over here. What you are saying
to Tableau is that I'm not interested in
the details in the day. I would like to see
my information at the month level. With that, we're
going to get 1 August 2025. Now we're going to go one more
step where we're going to say we are interested
only at the year level. So, if you go and specify at the date part the
year, what happens? You tell Tableau I'm not
interested in anything else, I'm just interested in the year. I think you already got
it. What happens? Everything is reset. Anything below the year is reset: the month and the day go back to one, and
the time goes to zero. And we keep only
the value 2025. So that's it for this function. It is very useful in
many calculations to use the date trunk. Now let's go and
compare all those functions side by side. We have here as rows the date parts, so we have year, quarter, month, day, and so on. And then we have
here on the columns those different functions. I don't include here the DAY, MONTH and YEAR functions because they are very similar
to the date part. So the first thing to
understand is that the DATEPART output is going
to be a number, the DATENAME output is going to be a string, and the DATETRUNC output is going
to be a date and time. And we can work with
the same example. So we have the
following information about the date and time. Now let's go and
see the output of those functions and those different levels
in the date part. Now let's start with the
first level, the year. If you say I would like to have the DATEPART of this
information, you will get 2025. The same thing
for DATENAME. But this time, for
DATETRUNC, you're going to reset
everything below the year, so you will get 1 January 2025. So let's move to the next level. We have the quarter. The date
part quarter of this date. It's going to be three. The same for the date name, it's
going to be three. But this time it's
interesting, right? Because in date time we don't have usually the
quarter informations. So this time it's going to reset to the first month
of the quarter. It's going to be the
month number seven. So let's move to the next one. We are at the month level, so if you use the date
part, you will get eight. If you use the date
name, you will get the full name of
the month, August. And if you use the date trunk, you're going to reset
everything below the month and you will get the
first day of August. Moving on to the day, if
you use the date part, you will get a number 20, the date name, you will
get a string value 20. And this time at the date trunk you are resetting
the whole time. Moving on to the next one, we have alternative for the day and here we're going
to get the weekday, the number of day inside a week. Here we're going to get
the number four from the date part because
it is Wednesday. So if you're using
the date name, you will get the full name
of the day Wednesday. And for the date trunk,
nothing going to change. We just going to reset
the time as well. Now, if you are
moving into the details, if you extract the hour with DATEPART and DATENAME,
you will get nine. And here, as you can see, with DATETRUNC we are resetting
now only the minute and the second, because you
are not interested in them. Moving on to the next one, the minute: we'll get 45 from DATEPART and DATENAME, and here we are resetting
only the seconds. As you can see, only
seconds are zeros. Now let's move to the lowest
level in the hierarchy. We have the second, so
we're going to get 21, 21. And the output going to be exactly the same
value in the input. So that you can see
the big picture using those three
functions and what are the main differences
between them and what you're going to
expect if you are using them. Now let's go back to Tableau and start practicing
those functions. Okay, so now we're going
to go to our source. Let's go to the orders. And we will be manipulating
the order date. Let's take it to the view. Tableau is going to convert it
immediately to a year. We are not seeing
the original data, we are seeing only the year
part of the order date, because Tableau also wants
to make visualizations. And of course it makes
sense to have years instead of all dates
inside our data source. But in order now to show all the data like
in our data source, we're going to go
over here and switch it back to the exact date. Let's click on it and table going to convert
it to continuous, but I would like
to see all values. We're going to switch
it to discrete. Now as you can see, we get all the values exactly
like the source system. We have around five
years of data. So now we're going
to go and practice by extracting the date part. We're going to start
with the year, so let's go and
extract those years. We're going to go and create
a new calculated field. Let's call it order date, year. So here we have a lot
of ways in order to get this information we
can use the date part, the date name, the date trunk, or even the year function. All right, so now
we're going to start with DATEPART. And as you can see, it
accepts two arguments, and a third one is optional: there you can define what
the start of the week is, but I usually leave it empty. The date part that we want
to extract now is the year. Then the date that we
want to manipulate is the order date. That's it, and as you can see
the calculation is valid, so let's go and hit OK. As we learned, the output of DATEPART is going
to be a number. That's why Tableau is going to create a new
continuous measure. But I would like
to see the distinct values of the years
in the visualizations. I'm going to go
and convert it to a dimension. Now as you can see, it jumps to the dimensions and we have it now as a
discrete dimension. Let's bring it into the view
and check the results. As we can see now, we have
all the years extracted from the order dates. Now let's go and try
the other methods. Let's replace DATEPART
with DATENAME. Here, it's very important
to understand that the data type is
going to change. Here we have it as a number. If we switch it to DATENAME, we get it as a string. Let's go and change
our calculation. Instead of date parts, I'm going date name.
Let's hit Apply. And as you can see, immediately the data type going to
switch to string value. But in the view, we're going to get exactly
the same result, right? Nothing going to change,
only the data type. Now we're going to move
to the easiest one. The quickest one is to use the YEAR function instead
of the whole thing. Over here we can write YEAR, and we don't have to
specify the date part. That's why we're
getting an error. We need only our date, the one that we want to modify. Let's hit Apply as well. Nothing is going to
change in the view, but the data type is going
to switch to number, because the output of these
functions is a number. Now you might ask me, okay,
which one should I use? I recommend you always to
use the quick one of course. But what is more important
is the data type. The data type number is always faster than
the data type string. The data type string
is the worst. It is the slowest data
type from all others. We always try to avoid
the data type string in the visualizations not to have bad performance
in our views. If you are thinking about
those three functions, I would always avoid
that date name. Now we are left
with two functions, date part and the
quick function. I would always go with
the quick one, right? Because it's easier to write. I would prefer this situation to have year or the date like
I'm showing it in the view. But of course, in a
lot of situations you want to show for example, the day name or the month name. It depends really
on the requirement, but if you can avoid it, don't use DATENAME. So this is
my recommendation to you and what I usually do. So now let's close this and extract another
part from the date. We're going to have the quarter. So here again we have
the three options and all three deliver
the same information. So I would go and create
a new calculated field, let's call it order
date quarter. And this time I'm
going to use the quick one as well, QUARTER,
with the order date. So that's it, it's really
simple, right? Let's hit OK, and now we have again
a new continuous measure. I would like really
Tableau here to create immediately a dimension. So I'm going to go
and convert it again to dimension because I use
it in the view as dimension. Let's check the results
and we can see we have now the quarter number
which is correct. All right, so now let's go and extract another
information from our date. We're going to get the month. Let's go and create again
a new calculated field. We're going to call
it order dates. Now this time we can use a month function and
our field order date. It's very simple, right? So let's go and hit, okay. And we're going to
convert it again to dimension and bring
it to the view. With that, we are extracting
the month information from the order date.
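The three extractions practiced so far (year, quarter, month) can be sketched in Python as follows. This is only an illustration of what the quick functions return, not Tableau code. The field name [Order Date] in the comments and the sample date are assumptions, and the month-name lookup mimics what DATENAME with the month part gives back.

```python
from datetime import date

# Hypothetical order date; in Tableau this would come from the [Order Date] field.
order_date = date(2025, 8, 20)

MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)

order_year = order_date.year                     # like YEAR([Order Date]), a number
order_quarter = (order_date.month - 1) // 3 + 1  # like QUARTER([Order Date]), 1 to 4
order_month = order_date.month                   # like MONTH([Order Date]), 1 to 12

# DATENAME-style result: the month as a string instead of a number.
order_month_name = MONTH_NAMES[order_date.month - 1]

print(order_year, order_quarter, order_month)  # 2025 3 8
print(order_month_name)                        # August
```

The number results sort naturally, while the name is easier to read in a view, which is the trade-off discussed in this lesson.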
Everything looks fine. Here we have September,
August, and that's it. And here we are usually in this situation where the users would like to see the months
as a full name. So instead of having
the month number, we would like to
have the month name, which I really agree,
because it's easier to read the month
name than the number. In order now to change it, we can use the date
name function. So let's go and change
our calculation. So let's go and edit it. Now, instead of MONTH, I
can just remove it. Let's have DATENAME, then the part is going to be month. And then we have
our order dates. So let's hit okay. And now of course what happened. We changed the data type and as well the values
inside this field. So we are now getting the
complete name of the month. So we have January, February, and so on. So that's it. This is how we can extract the different dates parts from our original
field, the date. The question is how to use those new informations
in our views. All right, so now we're going
to go and create a view from three informations,
category, order, date, and sales, using a heat
map or highlighted table. Now the first thing
that I would like to do is to remove the order date. This is a lot of details, we
don't need it in the view. Then we're going to
have the rows the year. I'm going to leave
it, but I will take the quarter to the columns
and as well the month. And of course, what
is missing now is to fill those gaps
using a measure. Our measure going
to be the sales. Let's drag and drop it over here. Now, in order to convert
it to a heat map, we have to add it to the colors. Let's take the sales again
and put it in the colors, or you can hold control
and drag it to the colors. We're going to get
the same results. Now we are almost there. I would like to have,
instead of text, I would like to have squares in order to get the heat map. With that, we got a heat map. We can change the
colors if you want. So let's go to
colors, Edit colors. And I would like to have
it as blue. Hit OK. So with that, we have created our heat map using only
one field, the order date. So we have the years
from the order date, we have the months
from the order date, and as with the quarter. So as you can see, those
parts that we extract from the dates are really
useful to make visualizations. So now we can go and add the
final touch in this view, and that is by making
abbreviations from the month name. As you can see here,
February is really big for the
cell over here, so we can make it shorter. In order to do that, we
can use the LEFT function. So let's go to our calculated
field and edit it. And now before we're
going to add left. And then at the end we're
going to add three. So I would like to get
only three characters from each month.
Let's go and hit. Okay, perfect. Now we
have abbreviations for each month and the view
look more professional. There is nothing
that we have to add, I promise with the last one. It is the category,
we forgot about it. So let's go to the categories and just drag it
before the year. So with that, we got really
nicely those categories, and we can see inside it how those categories are
developing over the time. So with that, we got a
really nice heat map, all those informations
from the date. Now we have in our data source
a lot of new information about the order date where we can use it like
almost everywhere. Now we have another very
common use case for those new informations where we can use those date
parts as a filter. Let me show you what
I mean. Let's go again to our orders. And we're going to go
to the month, right-click on it and show it as a filter. The same thing we're
going to do for the year: right-click on it and as
well show it as a filter. Now we can see those
informations on the left side, and the logical order
is very important. First a year, then a month. Since the month has
a lot of values, let's go and switch it to a dropdown with multiple values. Now using those filters, the users can go and specify scope for this view by changing
the values of the year. And as well for the month. This is a very common use case for the date parts in Tableau. That's it for those functions. Now let's move to the last one. We have DATETRUNC. Okay, now in order to see the
effect of DATETRUNC, let's go to the big data source and get the order date to the view. I would like
to see the exact date. Let's switch it to exact date, and again to discrete
to see the values. All right, so next
we're going to take the sales to the view as well. With that, you can see
we are seeing all the, all the information that
we have in the side. And we have a lot
of details now. Let's say that I'm not
interested in the days. I would like to see one
date for each month. We would like to have this
date at the month level. In order to do that,
we're going to go and create a new
calculated field and we're going to use the date
trunk. Let's go and do that. We're going to call
it order date. Then the syntax can be like this date trunk and it
accepts two arguments. The first one going
to be the date part. Which level we want to see in the view we want
to have the month. Let's specify here month then the date that we
want to manipulate, which is the order date. That's it, and the
calculation is valid. Let's go and hit okay. And on the left side we've got a new dimension with the
data type Date & Time. What we're going to do
now is replace the order date
with this new field. Just drop it on top of it. Again, we have
to do the same thing here: right-click on it, switch it to Exact Date, and then to Discrete. Now we have a new date
field where everything is at the month level, and we
always get the first of the month: 1 January, 1 February, and so on. As you can see, the
list is now shorter, right? Because we now have one
row for each month, whereas before we had one
row for each day. Now, I'm not interested in
those zeros for the time in the view, and I would like to get rid of them. In order to do that, we
can change the data type: let's go to our DATETRUNC field and
switch it from Date & Time to Date.
Let's go and do that. As you can see, now
we have a date field and the time is gone. Now, let's say that
I would like to have a date only at the year level. I don't care about the
days and the months; I would like to have
one row for each year. In order to do that,
we're going to edit our calculated field and simply
change the value from month to year. That's it, let's
hit Apply. You're going to see
over here that we now have one row for each year. So now we have a field
that is always at the year level, and we get
around five years. As you can see, with
DATETRUNC we can control the level
of the date field. So let's say that we
want to switch it to day. We're going to
switch the year to day, and with that we're going
to get all the details. We have one row for each date, and with that
we have a lot of details. We are back at the level of the
original order date field. So this is how we work with
DATETRUNC in Tableau. Okay, so there's
another way to visualize the effect
of DATETRUNC, so let me show you how to do it. Let's first close
this dialog here. Then we're going to switch the Order Date Trunc field
to a continuous field. So let's go and do that. Now let's
flip everything, so we have
the order date on the columns and the sum
of sales on the rows. And instead of
bars, let's have a line. Now in the visualization we have a lot of marks. If you hover over
those points, you can see we have
one mark for each day. That's because
we have defined in the Order Date Trunc field that
we are at the day level. And you can see here
in the details that we have around 1,800
marks in this one view. Now, if you say this
is too much detail, let's switch to month. Let's go to our calculated
field, edit it, and just move it over here
on top. Instead of day, we're going to have month. Let's hit Apply. Let me just close this and check
the view. Now we have one mark for each month: we are at the month
level, and the marks are reduced to only 60 instead of
around 1,800. With this, we don't see a
lot of details in the view; we have one mark
for each month. This is the power
of DATETRUNC. Let's say that we want
to go to the years; I think you already know how many marks
we're going to get. We're going to get only
five marks, where each mark represents a year. This is the power of
DATETRUNC: it controls your view and the level of
detail we are talking about. All right, so that's it
for those functions. They are really
great for extracting specific
parts from a date, and as you can see,
they are really useful for the visualizations. Now, we've used a lot
of calculated fields. As you can see on the left side, we have a lot of new date fields
in our data source, available globally. That means if I go to any
other worksheet, or even to any other workbook connected
to my data source, I'm going to see the
exact fields that I created using
calculated fields, and I can immediately start reusing them
in my visualizations, which is going to
save a lot of time on formatting and so on. So that's how to extract the date parts using calculated
fields so that they are available globally. Next we're going
to talk about how to do it quickly and locally, for only one view,
by formatting the field. Okay, so now we're going
to start from scratch. We're going to go to
our big data source. Let's go to the orders and bring the original
order date field to the columns. And again, let's take
the sales to the rows. Now as you can see, Tableau
always brings it in as a year. That's because it wants to visualize only a small amount
of data at the start, and then you decide
what you need. Here
we can manipulate the order date
directly in the view by changing the
format, instead of
creating calculated fields. Now in order to format the date, we're going to click
on the dimension itself, so right-click on it. Now we have here
two important sections. The first section is the
discrete section, where Tableau is going to use the
DATEPART function, and the other section is the
continuous section, where it's going to use DATETRUNC. On the right side of each option, as you can see, we have those
gray examples that
show you which format is going to be presented in
the visualization. For example, there's
no difference between this year and this year, but here we have
only the quarter, Q2, while here we have the
quarter plus the year. So you can see the
formats that Tableau is going to use in the
view. Now let's go and
check the differences between this month and this one. Let's start with the first
one. Let's click on Month. Now, as you can see, our
field is blue, which means it's discrete, and
we have those values: January, February,
March, and so on. We have it as
text. If you would like to know how Tableau
created this, you can go over
here to the month, double-click on it,
and you can see the formula Tableau is using: DATEPART, 'month',
then the order date. So you can see the
syntax that Tableau is using to quickly
format your view. Now let's go to the next one. We can have the month
as a continuous field: right-click on it again, and now we can pick the
month plus the year. Let's click on it. Now
you see that our field is continuous, and if you
double-click on it, you can see that Tableau
is using DATETRUNC. Now we see the years on
the axis, and each mark, each point on the
line, is a month. As you can see, it's very easy. We are just clicking around and changing
the whole format of our dates. What I usually do is select different
formats until I'm convinced I have the format
that best represents my data. There are a lot of other formats as well.
So let me show you. Let's go to the order date. As you can see, we
have year, quarter, and month, but here we
also have the More option. You can see we have
week number and weekday, and you get more options if you
go to Custom. There you're going
to get a list of all possible formats
that we can use to change the
structure of our dates. The same is true, of course, for the continuous field.
If you open the menu again, you can see we have
More here as well, so you can click Custom and choose among the
different formats. Of course, any decision that you make now in the view is going to stay
only in this view. If you switch to any
other worksheet, you will not find what you
have already formatted. This is the only disadvantage
of making a lot of decisions in one sheet: you will not have them
in the next sheets. There are also more options for formatting the fields. For example, let's go
to the order date, right-click on it, and choose this month,
shown as a full name. Then I'm just going to swap the columns with the rows. Now we can see that in the header we have the
full name of the month. But we can
change the format of those headers: just
right-click on one, then go to Format. Then,
on the left side, we can change the display
format of the header. For example, on this
one, for the dates, if you click on it, you will get different options, such as
Abbreviation. Once you click on it,
you can see that we now have an abbreviation of
the month name. Or we can get the first letter
of each month if we really
want to make it
small: we can go over here and change
it to First Letter. With that, we're going
to get the first character of each month. Of course, those formats
are not only for the month. Let's take, for example, the weekday. We're
going to go over here and switch it to Weekday. We have here the full text of the day. In order to
abbreviate it, we're going to go
to the left side again and switch it
to Abbreviation. With that,
we're going to get a short form of the weekday. So as you can see, by
just clicking around, we can change and
manipulate the values of the dates from our data source without writing any syntax or creating new
calculated fields. We can just do it
quickly in one view. But if you find
that you are repeating the same format over and
over in different sheets, I recommend that you create a new calculated
field for it, store it in the data source, and use it whenever you need it. All right, so that's it for those functions and how
to format the dates. Okay, so what
have we learned? How to extract a specific date
part from our date field. Next we're going to talk
about two functions, DATEADD and DATEDIFF.
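The difference between DATEPART and DATETRUNC described in this lesson can be sketched outside Tableau. The following is a minimal Python illustration, not Tableau's implementation: the helper names `date_part` and `date_trunc` and the sample order dates are assumptions made up for the example. It shows that DATEPART returns a single number, while DATETRUNC returns a full date snapped back to the start of the chosen level, which is why the month level always shows the first of the month.

```python
from datetime import date


def date_part(part: str, d: date) -> int:
    """Rough stand-in for DATEPART: return one component of the date as a number."""
    if part == "year":
        return d.year
    if part == "month":
        return d.month
    if part == "day":
        return d.day
    raise ValueError(f"unsupported date part: {part}")


def date_trunc(part: str, d: date) -> date:
    """Rough stand-in for DATETRUNC: snap the date to the start of the given level."""
    if part == "year":
        return d.replace(month=1, day=1)
    if part == "month":
        return d.replace(day=1)
    if part == "day":
        return d
    raise ValueError(f"unsupported date part: {part}")


# Hypothetical order dates (not taken from the course data source).
order_dates = [date(2022, 1, 5), date(2022, 1, 19), date(2022, 2, 3), date(2023, 2, 14)]

# One distinct value per level, like one row (or one mark) per month or per year.
months = sorted({date_trunc("month", d) for d in order_dates})
years = sorted({date_trunc("year", d) for d in order_dates})

print(months)  # three distinct month-level dates, each on the 1st
print(years)   # two distinct year-level dates, each on 1 January
```

Switching the first argument from "month" to "year" is the same move as editing the calculated field in the lesson: the fewer distinct truncated dates there are, the fewer rows or marks the view shows.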
140. Tableau | Add & Subtract Dates: DATEDIFF, DATEADD: Now we're going to
learn how to add and subtract dates in Tableau
using the two functions, DATEADD and
DATEDIFF. But as usual, let's understand the concept first,
then we can practice. All right, so now
we're going to talk about the DATEADD function. We can use it to do arithmetic
on our date field. For example, we can add
three days to our date, or we can subtract two months from it. We can manipulate our
date by adding or subtracting specific
intervals. Now let's see the
syntax in Tableau and take some examples in
order to understand it. It starts with DATEADD as the keyword, and it needs
three arguments. First, the date part that we are
interested in manipulating. Then the interval, which is
how many days or how many months you want to add. Then we have the
date field itself that we want to
change. The result is going to
be a date field. So for example, let's
say that we want to add three years to our date. We specify 'year' as the
date part, then the interval is
going to be three, and then our date.
What's going to happen? Tableau is going to
add three years to our date field. We are adding three years only to
this piece of information, the year, and the rest, the month and the day,
stays as it is. Let's move on. Let's
say that we want to add three months
instead of three years. So
we specify 'month'
as the date part, then three as the interval, then our date as well. So
what's going to happen? We're going to change only
this piece of information: instead of August, we're going to have November. We are changing
only the month, and the rest is going to
stay as it is. Now
we can move to the
last one, the day. We would like to add three days. I think you already got
it. So what happens? We add three days,
so we're going to have
the 23rd instead of the 20th. It changes only
at the day level, and the rest stays the same. With this, you can
see that we can add different intervals to different date parts
in our date field. In our examples we were working with positive numbers, but in Tableau we
can use negative numbers as well,
which means we subtract intervals
from the date. So let's take an example.
Let's say that we want to subtract three
years from our date. So we're going to have
the interval here as negative three, minus three. As the output, instead of the year 2025, we will get 2022. Of course, we can do the same thing
on the day. Say we would like to subtract
three days from our date: instead of the day
20, we're going to have 17. So as you can see,
we can use DATEADD to
add intervals
as well as to
subtract them. It's a very important function in Tableau for
comparing things with each other.
For example, we can compare this
year with the next year: we
add one year to our field, so that we
get two fields, the field with the current year and the field with
the next year. We will see that
in the next examples. So that's it for DATEADD. Let's move on to DATEDIFF. The DATEDIFF function in
Tableau has a very simple task, and that is to subtract
two different dates. So for example, let's say
that we have two dates, the order date and the shipping
date, in our data source. Let's say that you ordered
something on this date, in November 2025, and you received your order
the next year, in February. Now if I ask you how long it took to ship your
products to your house, you're going to subtract
those two dates in order to give me the number. This is exactly what
DATEDIFF does in Tableau. The syntax looks like this: DATEDIFF, then three pieces of information. First, which date part you
would like the difference in. Then we have the start
date, in this example the order date, and then the
end date, the shipping date. The output is
always a number. As usual, we're going to take examples in order
to understand it. So here we're going
to ask Tableau how many years it
took to ship this product. We are interested in the
year part, then the
start date is going to be the order date and the end date is going to be the shipping date. If you do that in Tableau,
you're going to get one. So it took one year
to ship the product. So here, working at the year level,
you will get one. Now let's go to the next level. Let's ask how many months it took to do the shipment. So here we specify
'month' as the date part. We have
the same information for the start and the end date. This time you're going
to get three months. So the answer is
that it took three months to ship the
product to the customer. All right. The next
question is going to be how many days it took to ship
the product to the customer. This time it's
going to be 68. So now we are working
at the day level, and the result is
that it took 68 days to ship the product, from the order
date to the shipping date. In this situation,
it makes sense to use the day, because
we always want to understand exactly how many
days it took to send the product
to the customer. If you only have the year, you're going to
think it took a whole year to send the shipment. That's it. This is how
this function works. It's very simple and very
useful in the visualizations. Now let's go back to Tableau and start practicing
those two functions. All right, now let's see how we can create
that in Tableau. We can stay with the
big data source. Let's go to the orders, and we can manipulate
the order date. Let's bring it to the view over here, and we're going to
show the exact date. So we're going to
switch it to Exact Date to
see all the details. And I would like to have
it as discrete to see all the values inside our data source. Now
it's really simple. Let's say that I
would like to add one year to my order date. In order to do that,
we're going to create a new
calculated field, and we're going to call it
Order Date Plus One Year. We're going to use
the DATEADD function, which needs three arguments. Since we
are adding one year, the date part is going to be 'year' and the interval is going to be one. And the date that
should be manipulated is the order date.
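The three DATEADD arguments described above (date part, interval, date) can be sketched in Python to make the arithmetic concrete. This is only an approximation of what a calculated field such as DATEADD('year', 1, [Order Date]) does, not Tableau's own code: the function name `date_add` is made up for the illustration, and clamping to the last day of a shorter month (for example 31 January plus one month) is an assumption of this sketch. The sample date, 20 August 2025, is the one used in the concept examples.

```python
import calendar
from datetime import date, timedelta


def date_add(part: str, interval: int, d: date) -> date:
    """Rough stand-in for DATEADD: shift one part of the date by a signed interval."""
    if part == "day":
        return d + timedelta(days=interval)
    if part == "month":
        # Count months since year 0, shift, then split back into year and month.
        total = d.year * 12 + (d.month - 1) + interval
        year, month_index = divmod(total, 12)
        month = month_index + 1
        # Assumption: clamp the day when the target month is shorter.
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, min(d.day, last_day))
    if part == "year":
        return date_add("month", 12 * interval, d)
    raise ValueError(f"unsupported date part: {part}")


d = date(2025, 8, 20)

print(date_add("year", 3, d))    # 2028-08-20
print(date_add("month", 3, d))   # 2025-11-20
print(date_add("day", 3, d))     # 2025-08-23
print(date_add("year", -3, d))   # 2022-08-20, a negative interval subtracts
print(date_add("day", -3, d))    # 2025-08-17
```

Only the part named in the first argument moves in these examples; the other parts stay as they are, which matches the walkthrough in the lesson.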
It's very simple. As you can see,
the calculation is valid. Let's hit OK and
check the results. As you can see, we've
got a new field in our data source with the
data type Date & Time. Let's check the results. We're going to drag
it to the view, but I would like to see
the details as well. I would like to see
the exact date. Again, we have to
switch it to discrete in order to see the results. Let's switch it to Discrete. As you can see, we
have a date and time. If you want to get
rid of the time, we can cast the field to a date. In order to do that, let's go to our Data pane;
this is our field. Click on the icon of
the data type and switch it from Date & Time
to Date. Let's do that. As you can see, now
the time has disappeared. In the results, we see that
everything is plus one year. Where we have 2018 here, the result is 2019. We can check other dates. If we sort this descending, we can see that
the value is 2022 and here we have 2023. That's it. This is
how we can create a new field with plus one year. Let's add one
month now. Let's edit our new
calculated field: right-click, Edit, and let's also change the name
from year to month. Now instead of the date part
'year', we can have 'month'. It's very easy to switch. If you select Apply, we can see that we are now adding
one month to the date. If I sort it again
like before, you can see here
we have January, and now we have February. We can do the same
if you switch to day. If you want to add only one day, let's apply and check the results. You can see that we are adding
plus one day everywhere. Of course, we can also use
negative numbers for the interval. Let's say we would like
to have minus one day. Let's apply and
check the results. As we can see in the results,
the new calculated field is always one day behind the original
order date field. This is how we can work
with DATEADD. It's very simple. All right, so now we're going to go
and create a new view to analyze the average days
to ship per subcategory. It's really important for
inventory management, optimizing operations, allocation
of resources, and so on. We can create that
using DATEDIFF in Tableau. But first let's bring
a lot of data to the view in order to
understand how this works. We're going to stay
with the big data source. Let's go to the orders. Here we need our two dates. The first one is going
to be the order date and the second is going to
be the shipping date. Let's also add the
order ID at the front. Now we have everything to
see the results. As usual, Tableau shows it as a year. We would like to see
all the details. That's why we're going
to convert it to an exact date. For the first one, we're
going to choose Exact Date. It might take a little while because
we have a lot of data, and we have it
now as continuous. I would like to see
all distinct values. Let's convert it to discrete and do the same thing for
the shipping date. We're going to convert it
to Exact Date as well, and then
switch it to Discrete. All right, so now
we have all the information that we need. We have one row for each order. Now we're going to
create our new calculated field
in order to find the difference
between the order date and the shipping date.
Let's go and do that. We're going to
create a new calculated field called Days to Ship. We're going to
use the DATEDIFF function, and it needs
three arguments. The first one is
the date part. Of course, since we are
saying days to ship, we are interested in the days: how many days it took to get
the shipment to the users. So we can enter 'day' here. The start date is going to be, of course, the order date. And the end date is going to
be the shipping date. We have it like this, and
let's check the validation. The calculation is valid,
everything is fine. Let's hit OK. Since the output
is going to be a number, Tableau created it as a
continuous measure. Let's take it, put it on our view, and check the results. Let's take, for
example, this order. The customer
ordered on December 7, and after four days the customer
received the shipment. With that, you can
see the difference between those two
dates is four days, everything looks good.
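The year, month, and day results from the concept part (1, 3, and 68) can be reproduced with a small Python sketch. It is an illustration under stated assumptions, not Tableau's implementation: the function name `date_diff` is hypothetical, and the exact order and shipping dates are not given in the lesson, so 25 November 2025 and 1 February 2026 were chosen here only because they reproduce the quoted numbers. The sketch treats the year and month differences as counts of calendar boundaries crossed, which is how DATEDIFF can report one year for a delivery that took 68 days.

```python
from datetime import date


def date_diff(part: str, start: date, end: date) -> int:
    """Rough stand-in for DATEDIFF: count boundaries of `part` between two dates."""
    if part == "year":
        return end.year - start.year
    if part == "month":
        return (end.year - start.year) * 12 + (end.month - start.month)
    if part == "day":
        return (end - start).days
    raise ValueError(f"unsupported date part: {part}")


# Assumed dates, picked to match the results quoted in the lesson.
order_date = date(2025, 11, 25)
shipping_date = date(2026, 2, 1)

print(date_diff("year", order_date, shipping_date))   # 1
print(date_diff("month", order_date, shipping_date))  # 3
print(date_diff("day", order_date, shipping_date))    # 68

# The practice example: ordered on December 7, received four days later.
print(date_diff("day", date(2022, 12, 7), date(2022, 12, 11)))  # 4
```

This is also why the day level is the sensible choice for a Days to Ship measure: the coarser levels only say that a year or month boundary was crossed, not how much time actually passed.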
Let's take another value. Maybe some recent orders,
so I'm going to sort it descending by the order
date. As you can see here, the customer placed an
order on the last day of 2022, and after 24 days the customer
received the shipment. We can see here that
days to ship is 24. This is how DATEDIFF works. Now we're going to
create our visual. We want to show the average
days to ship per subcategory. Now we want to get rid
of all those details. We don't need them, we
just need our measure. Now we need the
subcategory, so let's go to the products and bring the
subcategory over here. Then we're going
to take our measure and put it on the columns. But now we have it as a sum, and we would like to have
it as an average. Click on the measure, then
go to Measure (Sum), and here we have the average.
Let's switch it to that. Now we're going to add
some more information. Let's add a label. And let's
change the colors as well. Let's take the
average days to ship, hold Control, and put
it on the colors. Since a long shipping time is a bad thing, we're going to switch
the colors to red. Let's go to the
colors over here, Edit Colors, and instead
of Automatic, we're going to switch it to red. All right, let's click OK. Then we're going to
sort the list like this. Now let's check the data. As you can see, the worst subcategory we have in our data is copiers: it takes a longer
time to be delivered to the customers compared to
the other subcategories. So now the question
is: we have five years of data inside our data source. Was it always like
this, that the copiers were the worst, or did something
change over time? Now, in order to
compare the years, we can add the years to the
view. We already have the year
prepared from last time: we have the
order date year. Let's just bring it to
the view, to the columns. Now if you check the data,
it's very interesting. If you focus on
the copiers again, you can see that in 2018 and 2019 the performance
was really good. It was even one of the
best performers: in 2019 it gets this light red. But something changed in 2020. From 2020 onward, you can see it's
always dark red. There was a change, maybe in the resources or in
the inventory management, and we can see it is one of the worst performers compared to the other subcategories. With that, you can compare
the years with each other as well, to understand whether it was always like this or something changed. As you can see, using the
visualizations, the coloring, and also those
functions that we have in Tableau to
manipulate the dates, we can uncover those
trends inside our data. Maybe it's really hard to find them in the raw data, right? But if you bring everything with colors
into the visualizations, it's going to be
really easy to detect. So this is exactly the power of visualizations and those
functions. All right everyone. With that we have
learned how to add and subtract dates in Tableau. Next we're going
to talk about two functions, TODAY and NOW.
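The view built in this lesson, average days to ship per subcategory, boils down to a day-level difference per order followed by an average per group. A minimal Python sketch of that aggregation is shown here. The order rows are invented for illustration and are not the course data; only the two copier intervals (4 and 24 days) echo the examples checked in the lesson, and attributing them to copiers is an assumption of this sketch.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (subcategory, order date, shipping date).
orders = [
    ("Copiers", date(2022, 12, 7), date(2022, 12, 11)),   # 4 days
    ("Copiers", date(2022, 12, 31), date(2023, 1, 24)),   # 24 days
    ("Paper", date(2022, 3, 1), date(2022, 3, 4)),        # 3 days
    ("Paper", date(2022, 5, 10), date(2022, 5, 15)),      # 5 days
]

# Day-level difference per order, collected per subcategory.
days_by_subcategory = defaultdict(list)
for subcategory, order_date, shipping_date in orders:
    days_by_subcategory[subcategory].append((shipping_date - order_date).days)

# Average per subcategory, like switching the measure from SUM to AVG.
average_days = {
    subcategory: sum(days) / len(days)
    for subcategory, days in days_by_subcategory.items()
}

# Sort descending so the slowest subcategory comes first.
ranked = sorted(average_days.items(), key=lambda item: item[1], reverse=True)
print(ranked)  # [('Copiers', 14.0), ('Paper', 4.0)]
```

Adding the order year to the grouping key would give the per-year comparison from the lesson, where one subcategory turns from light to dark red over time.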
141. Tableau | TODAY & NOW: Now we're going to learn
about two cool functions in Tableau, TODAY and NOW, which give us the current date or the current date
and time. Let's go. All right guys, one
very well-known use case of the TODAY function in Tableau is to make something like this. You can highlight
the current
date in the view of your visualizations. We can see here
something like a separator in the visualization at
the current date, today. With that you can
draw the attention of the users by highlighting
one of those parts. Now let's quickly understand what the TODAY function is. All right, so we have those
two functions, TODAY and NOW. They are the easiest and
simplest functions in Tableau, and they will not manipulate
or transform anything. There is no concept behind them. They just deliver the current date and
time information at the moment you execute them. So for example, we have
the first one, TODAY. It does not
need any argument. As you can see,
it's very simple. The output is a date, so you will get the
current date. As I'm recording, we are
at the end of May 2023. But if you are
interested in having the time information as well, you have to execute NOW,
again with no argument inside it. You will get the date and time. As I'm recording, it is 6:10 p.m. and 40 seconds. So that's it for
the two functions. Let's go back to Tableau
and start practicing, to see when you would use them. All right, so now we're going to
see how we can use the TODAY function in
our visualization. The first thing is to
create the calculated field. So let's
create a new one. We call it Today, and then we need the function,
which is called TODAY as well. As you can see, it's very easy. We don't need to
add anything else. And by the way, this is always the first calculation
that I create in each new data source, without knowing the
requirements or anything. I just go and create
this one because I'm sure that I will end up
using this function. So it's really one of
the first things that I usually do for each
new data source. Let's hit OK.
Everything is fine. As you can see, we got
it on the left side as a new dimension with
the data type Date. Let's check the current
value. We can bring it into the view, and Tableau
converts it to a year, so I always have to
switch it to Exact Date and then to Discrete
in order to see the value. As you can see, we are
at the end of May 2023. It would be very interesting
to know in which year you are now watching the video and
following me through those steps. Okay, so this is
how you can create the TODAY function in Tableau. Now we're going to use
it in a reference line in one view, in order to show you how powerful
this function is. We can create a view
of the number of orders over the shipping date. Let's go and create
it. I'm going to remove Today from here. Then we can add
the shipping date from the orders to the columns. Then let's take the number of
orders, the orders count, and bring it to the rows. Now, instead of having the years, I would like to have months. I'm going to do
a quick format. Let's go to the
field, and then we're going to pick
this one, Month. Let's click on it, and the visualization
type looks good as well. Now let's
create a new reference line. In order to do that,
we're going to go to the axis over here and
right-click on it. Then we have here the option to add a reference line. The most important
thing to customize is the value of the
reference line. I would like to have
the value of today as the reference line, to indicate the current
date. But if we go to the
values over here, you will see that I
can either create a new parameter or
use only the shipping date. That's because our new field Today is not yet in the visual, so we have to add it to the
visual. In order to do that, we can close this first. Then we take Today and drag and drop it on Detail. But we are not there yet, because Tableau converted
it to a year, and I would like to have the
exact date of today in the reference line. In order to do that,
we're going to convert it to an exact date: right-click on it, and we have here the option Exact Date. This is the requirement for
adding it to the reference line. Let's add
the reference line again and go to the values. Let's check. Yes, we
got the Today value, so let's select it and then hit OK. So now here on
the right side we got a very nice reference line indicating
today's date. But there's still
a problem, right? All of the
data is behind the reference line, because
the data is a little bit old. Now, in order to make
it more interesting, I'm going to add two years to the shipping date to make
the visual look better. In order to do that,
as we learned before, we're going to create
a new calculated field. Let's call it Shipping Date Plus Two Years. Here
we can use DATEADD. First, we need
the date part. We are saying plus two years, so we are talking about years. The interval is going to be two, and the date is going to
be the shipping date. All right, with
that we are done, and the calculation is valid. Let's click OK. So we have
it now on the left side. What we're
going to do is replace the old value with it. Let's just remove the shipping date and put the new
one in its place. We're going to do
the same steps, so we're going to
convert it to month again. Let's do that now. As you can see, we
have values for 2024 and 2025. Let's add
the reference line again. Right-click on the axis, Add Reference Line.
Let's go to the values and select Today. Now we've got a very nice cut in our visual in between our
data to show the past, today, and the future. Now we can add a
little bit of customization just to make it look better. For example, as you can see, we have a label over here
for the reference line. It says Minimum Today; I would like to show the value of the current date directly. In order to do that, right-click on the line and then go to Edit. Then change the label over here: instead
of the computation, let's change it to the value. With that, as you can
see on the right side, we immediately get the
current value of today. As the next step, I
would like to add some coloring to
the reference line. Right-click on the reference
line and let's go to Format. Then we have here three
things to customize. The first one is
the line itself. Then Fill Above, which means
everything on the right side, and Fill Below, which is going to be everything
on the left side. For example, let's
start with the line. I would like to have a dotted line,
in red as well. The opacity I'm just going to
set to 100. Now the next value is going
to be the Fill Above. I would like to
highlight it with green. Let's pick
the color green over here. Then the next one
is the Fill Below. You can leave it white,
or you can make it gray in order to show
that this is history. With that, as you
can see, the visual looks more professional. We are highlighting
the future, and the history is
grayed out. So that's it. With a small
function in Tableau, like the TODAY function, you can create amazing dashboards and visuals for your users. This is one of the
most common use cases of the TODAY function in Tableau: highlighting the
data. Okay everyone. So that's it for the TODAY
and NOW functions. With that, we have
learned all the use cases for the date
functions in Tableau. We have covered around
ten functions in Tableau. Next we're going to
jump to the next group, where we can learn about
the null functions.
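As a rough analogue of TODAY and NOW outside Tableau, Python's standard library has `date.today()` (date only) and `datetime.now()` (date and time), both called without arguments. The sketch below also mimics what the reference line does in the view: it splits the shipping dates into history and future around one reference date. The helper name `split_by_reference` and the sample dates are hypothetical, and a fixed reference date is used for the split so the result does not depend on when the code is run.

```python
from datetime import date, datetime

# Analogues of TODAY() and NOW(): evaluated at the moment they are executed.
today = date.today()    # current date, no time component
now = datetime.now()    # current date and time


def split_by_reference(dates, reference):
    """Split dates into (history, future) around a reference date,
    like the areas on either side of a reference line placed at today."""
    history = [d for d in dates if d <= reference]
    future = [d for d in dates if d > reference]
    return history, future


# Hypothetical shipping dates, already shifted by two years as in the lesson.
shipping_plus_two_years = [
    date(2023, 3, 15),
    date(2023, 5, 28),
    date(2024, 1, 10),
    date(2025, 7, 2),
]

# Fixed reference date for a reproducible result; pass `today` for a live split.
history, future = split_by_reference(shipping_plus_two_years, date(2023, 5, 31))
print(len(history), len(future))  # 2 2
```

Because both values are recomputed on every run, the split moves forward on its own over time, which is the reason a Today field is handy to have in every data source.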
142. Tableau | NULL Functions: ZN, IFNULL, ISNULL: Now we're going to
focus on another group of functions under the category of row-level
calculations, the null functions. The main purpose of the null
functions in Tableau is to handle and manipulate the
missing values in our data, the nulls. We can have
missing values everywhere: in text,
dates, numbers. Any field in our data source can have missing values. Why handle the missing values? Handling the nulls is a very important step
in the analysis, and that's because
of two things. First, calculation accuracy. Null values can affect the calculations and the
aggregations in the results. If we have null values in our data and we ignore them, if we don't
do anything about them, what can happen? We can have incorrect calculations
and corrupt results. The second reason is to improve the data quality and to
achieve completeness. Identifying the data
gaps that come from mistakes in the data entry
and from issues in the data collection can help
the overall data quality and can also improve the completeness of
the data visualizations. That's why the null functions in Tableau are very
important for having accurate and correct analysis in the data
visualizations. As usual, let's understand the concept first,
then we can practice. Let's go and understand
those three functions, ZN, IFNULL, and ISNULL, which handle our
missing values. As usual, we're going to go with
an example, because it is the best way to
understand those functions. All right, so now
we're going to have four customers and their sales. As you can see, only Maria has a missing value in the
sales. We have a null here. In order to handle this null, we have the first function in Tableau, ZN, which stands for zero null. It replaces the
null values with zero. It's very simple. If you now use the ZN function
on the sales, for the first value we will
not change anything, right? We will get exactly the same
value. But for the next one,
since it's a null, it's going to be replaced automatically
with zero. For the next two
customers, we will get the exact values because
they are not nulls. So as you can see, it's very simple: we are just replacing the
null values with a zero. This is a very quick
way to replace the nulls. But here the problem is that we have no control over what
we are replacing them with. So here we cannot
specify something else. We will always get a zero. In order not to
specify our value, we can use the second function
that we have in Tableau. If, if null, it can replace the null value with a
specific value from us. If you use this
function on the sales, it can has the following syntax. It needs two arguments. The value that we want to manipulate and the
value that we specify. This example, I'm going
to specify it as zero. It doesn't make
sense because we can use but just to
show you that we're going to get the
same results so you can go over here and
put anything you want. So for the first
customer, we're going to get exactly the same results. For the second customer, we're going to get
again zero because we specify that we have
the control on that. And then for the
last two customers, we're going to get
exact results. And here the output
is a number because the field that we want to
manipulate is a number. But let's say that we take another field which is a string. The output going
to be as well as string here is exactly
the difference between z in and if nal z
in accepts only numbers, but the iphnal accepts any
field from your data source. For example, let's
say that we have the countries John has
no value in the country. Same for Martin. We have
only for Maria and George. Informations inside
the field country. Here. We cannot go
and use the z in function because it's
not number, it's string. In order to manipulate those values or to
replace the null values, we're going to go
and use the Ip Nal. The syntax going
to look like this. If null country, then we have the abbreviation
of not applicable. The output here going to be a string value for
the first customers. We're going to
replace the null with the next one is
going to stay the same because there is
nothing to replace. The third one we're going to
get as well, not applicable, and for the last one
we will get France, so nothing to be changed. This is exactly the
differences between the null function and the
z in function in Tableau. Now we're going to go to
the last function is null. Sometimes we might be in a
situation where we want to check whether the field
has null values or not. So we don't want to
do any actions yet, we are just checking, right, the null in Tableau
going to return true if the value is null
and falls otherwise. That means if there is no value, if we have missing value, we can get true, there is a
value, we will get false. So the output of this
function is going to be with the data type bullion
with only two values, either true or false. So let's check the example
or the syntax in Tableau. It's going to accept
only one argument, the country, and that's it. So the question for the first
customer, is it a null? Yes, it's null, so
that's why we're going to get true for
the next customer. Is it a null in the country? We'll know, so we're
going to get false. The same for the third one,
we're going to get true. And the last one we're
going to get false because we have a
value in the country. So that's it for the is null. So we have three functions, three tools to manipulate or to check the null values
inside our fields. And they are really useful
to improve the quality and the completeness of your visualizations.
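The behaviour of the three functions on the four-customer example can be sketched in Python. This is only a model of the semantics described above, not Tableau code: `None` stands in for a null, the helper names `zn`, `ifnull` and `isnull` are hypothetical, and the sales figures and Maria's country are made-up sample values.

```python
# Minimal Python model of Tableau's ZN, IFNULL and ISNULL (None plays the role of NULL).

def zn(value):
    """Model of ZN([Sales]): a null becomes 0, anything else is returned unchanged."""
    return 0 if value is None else value


def ifnull(value, replacement):
    """Model of IFNULL([Field], replacement): a null becomes the value we choose."""
    return replacement if value is None else value


def isnull(value):
    """Model of ISNULL([Field]): True for a null, False otherwise."""
    return value is None


# Hypothetical sample rows in the order used above: only Maria's sales are missing,
# and John and Martin have no country.
customers = ["John", "Maria", "Martin", "George"]
sales = [350, None, 500, 200]
country = [None, "Germany", None, "France"]

sales_zn = [zn(s) for s in sales]
sales_ifnull = [ifnull(s, 0) for s in sales]
country_ifnull = [ifnull(c, "N/A") for c in country]
country_isnull = [isnull(c) for c in country]

for row in zip(customers, sales_zn, sales_ifnull, country_ifnull, country_isnull):
    print(row)
```

In this sketch `ifnull(s, 0)` and `zn(s)` give the same result, which mirrors the point that IFNULL with zero is just a longer way of writing ZN.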
So now let's go to Tableau and start practicing them. This time we're going to go
to the small data source. Let's check the
order information. We're going to
take the order ID, and this time we're going to take
the profit. Drag and drop the profit onto the ABC to
see the values. Now if you check our data, you can see that order seven doesn't have any
profit information, and
order ten doesn't have
anything either. We have missing data here, we have nulls. Now let's do something
about it and fix it. Instead of a null, we want to have zero. We have two
functions to do that. Let's start with the first
one, the zero nulls. We're going to
create a new calculated field and call it
Profit ZN. The syntax
starts with the function ZN, and
it needs only one argument: the field that we need to fix, which is going to be the profit. With that, we are changing
all the null values to zero. Again, with this
function we don't have control to change the
value to something else; it's always going to be zero. The calculation is valid,
everything is nice. Let's click OK. As usual, we get a new
measure, since the output is going to be
the profit information as well. Drag and drop this new
field to the view,
and now we can see
in the results that all those values
stay the same. We are only
manipulating the nulls: here we are replacing the null
with zero, and for order number ten, where we had a
null, we now have a zero as well. It's a quick fix. All right, so now you
might say, you know what,
why are we making
all this effort to
replace those missing
values with zero? What is the big deal? I could just leave
it as a null and the users might accept it.
Why are we doing this? Well, it's not only that the
visual is going to look better;
missing values can also lead to
wrong and
inaccurate aggregations. Let me show you what
I mean. Let's just remove the order ID. Now you can say, okay, we
got the same numbers, right? We got the same aggregation, so everything is
accurate and fine. Well, not exactly. This
is only true for the sum. Now let's switch
them both to the average. We're going to go over here
and switch it to average, and we're going to do the
same for the corrected one. Now I'm going to just
make the headers wider to see the values. As you can see,
we are now getting different values. With
the ZN function,
we get a different average
from the original data. And that's because in
the original average we are not counting the orders with the missing values,
while with ZN we are counting them.
That means by replacing the
missing values with zeros, the average counts every order, so we get more accurate results in the aggregations
compared to the old one. That's exactly why we replace the missing
values with zeros, especially for aggregations
and calculations. All right, that's why we do it. Now let's go and try
another function. We can use IFNULL in order to replace the null
values with zeros. I'm going to
bring the order ID back to the view to see all the orders. Let's create a
new calculated field,
and we're going to call
it Profit IFNULL. The syntax
starts with IFNULL,
and it needs two pieces of information. The first one is
the field that we want to manipulate, so it's going to be the profit again. For the next one, we have to specify which
value should replace the null. In this example, we're going
to stay with zero. The calculation is valid. Let's hit OK, and we get our new calculated field. Let's bring it to the view
and check the results. As you can see, it
is identical to the ZN. For
order number seven, instead of null, we get zero. The same for order ten,
we get zero as well. In this situation, if we want
to replace the nulls with zeros, I would go with ZN since it's just
faster to write. Now let's move to
the next scenario. We want to replace the
nulls with the value one. This time we cannot use ZN because it automatically
converts them to zero. We're going to stick
with IFNULL. Let's edit our
calculation; instead of zero, here we can specify one. Let's hit OK. Now we can see that instead
of having zero, we have the value one. Instead of null we have one. This is the advantage
of IFNULL: we can control which value is going to be the
replacement for the null. All right, the next advantage of IFNULL is that we can replace not only number values but any
other data type as well. Let's take an example.
We're going to go to the customers, and let's bring the customer
email to the view. As you can see,
we have some nulls here. We don't have the
emails of all customers. Now the task is to
replace those nulls with Unknown. Let's create a
new calculated field in order to replace
those values. Let's call it Customer Email IFNULL. The syntax is again IFNULL, and it accepts
two arguments. The field that we
want to manipulate is going to be the customer
email, this one over here. Which value are we going to use in order to replace the nulls? It's going to be
Unknown. That's it. The calculation is valid, so we can replace all the
nulls with this value. Let's hit OK. We have again a new
dimension in our data source. Let's drag it to the view
and check the values. Now if you compare
those two columns, you can see that instead of null we are getting Unknown, the same here, and the
third one over here. The others are
not affected because they have a value
inside the field. As you can see, it's a really
nice and quick way to replace those bad
nulls in the view. That's all for IFNULL. Now let's check the last
one we have, ISNULL. ISNULL will not replace
the values with anything. It just checks whether
there is a null or not. Let's say that we want
to check whether we have any nulls
in the field profit. In order to do that, we're
going to create, again, a new calculated field. Let's call it Profit ISNULL.
The syntax for
that is very easy: ISNULL, and it accepts
only one argument,
the field
that we want to check. So we are checking
the field profit. The calculation is valid, and that's it. It's
really simple. We are checking whether this
field has any nulls inside it. The output can be
either true or false, so it's going to be a
Boolean. Let's hit OK. As you can see, on the
left side we have a new field with the data type Boolean because we have only
true and false. Let's drag it and put it
on the view over here. Here we can quickly see
that all those orders are false because we have a
value inside the profit, but here we have a null, that's
why we are getting true. And here again we have a
true. So we can check
immediately whether we have
nulls inside our data or not. Let's show
it as a filter. This is what I usually do:
if I see there is a true, I'm interested in seeing those values. So I can see, all right, we have two orders where we have nulls inside
the field profit. This is a really quick
way to check whether we
have any problems, any nulls, inside our fields, in order to plan
what we can do about it. Here in the
small data source, it's really easy
to see all the orders individually; we
have only ten orders. But imagine you have
thousands or millions of orders inside your
data. Individually, it can be really hard to see. Let's take an example in the big data source. We're
going to go over here and take the order ID again.
This time, let's check the
sales; drag and drop it in the view. As you can see, it's really hard to check in the view whether
we have nulls or not. Instead, we
can do a check. We're going to create
a new calculated field. Let's call it Sales ISNULL. We can use the function ISNULL. This time the field
is going to be sales; we are checking the sales. Let's hit OK, and
now we're going to show this field as a filter. In the filter, we can see immediately that we have
only one value, false.
We don't have
true, which means we don't have any nulls
inside our data. So this is a very quick check to see
whether there are nulls inside our data, instead of scrolling down and checking
all the orders. That's why we need
the ISNULL function. So with that, we've covered
all three functions
that deal with and handle
the nulls. This is very important to
improve the quality of your visualizations and to bring accurate data into
the aggregations. All right, so with that,
we have covered everything about how to handle
the missing values, the nulls, in Tableau. Next we're going to move
to another group of functions, the
logical functions.
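The practice steps of this lesson (same sum but different average after ZN, a custom replacement with IFNULL, and the ISNULL check) can be modelled with a short Python sketch. It is an illustration under stated assumptions, not Tableau code: `None` plays the role of a null, the profit values are made up, and the helpers `total` and `average` are hypothetical stand-ins for aggregations that skip nulls.

```python
# Model of how nulls affect SUM and AVG, plus an ISNULL-style check.

def total(values):
    """SUM-style aggregation: nulls are skipped."""
    return sum(v for v in values if v is not None)


def average(values):
    """AVG-style aggregation: nulls are skipped, so those rows are not counted."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


# Hypothetical profits for ten orders; orders 7 and 10 are missing.
profit = [10, 20, 30, 40, 50, 60, None, 80, 90, None]
profit_zn = [0 if p is None else p for p in profit]    # like ZN([Profit])
profit_one = [1 if p is None else p for p in profit]   # like IFNULL([Profit], 1)

print(total(profit), total(profit_zn))       # the sums match
print(average(profit), average(profit_zn))   # the averages differ

# ISNULL-style check: which orders have no profit, and is any null left after ZN?
null_orders = [order_id for order_id, p in enumerate(profit, start=1) if p is None]
flags_after_zn = {p is None for p in profit_zn}
print(null_orders, flags_after_zn)
```

Looking at the set of distinct flags corresponds to showing the ISNULL field as a filter: if only false appears, there are no nulls to worry about.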
143. Tableau | Logical Functions: IF, ELSE, ELSEIF, IIF, CASEWHEN: Now we're going to talk about
the last group of functions under the category of row-level
calculations in Tableau: the logical functions. The main purpose of the
logical functions in Tableau is to make logical decisions
based on conditions. Here we have two use cases. The first group is the
conditional operations. Here we have IF,
CASE WHEN, and so on. The main focus here is to
create conditional logic and make decisions based on those conditions in order
to manipulate the data. The second group is
the logical operators. Here we have three
operators, AND, OR, and NOT, and the main purpose of
this group is to evaluate and to combine multiple
conditions in Tableau. Now let's focus on the first group, the
conditional operations. As usual, first we have to understand the
concept behind them, then we can practice in Tableau. Let's go. All right everyone. So now we're going to do a deep dive into those logical functions in order to understand how they work and how they
are executed. We're going
to start with the simplest form
of the IF statement,
where we have only
one condition. In this example, the
condition is: if the sales is higher than 1,000, then we want
the value High; otherwise we end and nothing happens. Now let's see the flow chart for how this is
executed. We start by
checking the condition. Here we always have two ways, either false or true. If the condition is fulfilled, if the sales is higher
than 1,000, then we go down this path where we
get the value High. If it's true, we
get the value High,
and then everything
ends. The other path:
if the sales is not higher
than 1,000, then it's false,
and we
skip everything. That means nothing happens. Let's take the
following example. Let's say that the
sales has the value 1,200. First
we check the condition: is
the sales higher than 1,000? Well, yes, it's true. What happens? We
execute the High and it ends.
If you look
at the chart over here, first we ask
the question: is the sales higher than 1,000? The answer is
true, so we take the green path, this one, where we
execute the High. Let's take another example
where the sales equals 700. We start over here again. We ask the question:
is the sales higher than 1,000? This
time it's not true,
so it does not fulfill
the condition, and we go with
the path on the right side. What happens?
Nothing happens. The High value will
not be executed, and in the output we
get the value null because there is
nothing to execute. It's really simple, right?
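The simplest form described above, one condition and no ELSE, can be sketched in Python as follows. This is a model rather than Tableau code: the function name `sales_label` is hypothetical, `None` stands in for a null, and the Tableau formula in the comment is the calculation being modelled.

```python
# Model of the Tableau calculation: IF [Sales] > 1000 THEN "High" END

def sales_label(sales):
    if sales > 1000:
        return "High"   # true path: the value High is returned
    return None         # false path: nothing is executed, the result is a null


print(sales_label(1200))
print(sales_label(700))
```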
You are always asking a question that
can be answered with yes or no, true or false. You always have two
paths for each condition. This is the simplest
form of the IF statement. Let's move to the next
level, where we're going to have IF ELSE statements. We're going to stay
with the same condition. If it is fulfilled, then we get
the value High. But let's say this time, if it is not fulfilled,
if it is false, I would like to get a
value instead of null. Here we can add the keyword
ELSE. What we're going to do is add an ELSE statement between IF and END to say, okay, if it is not fulfilled, give me the value Low. Let's check the flow chart
to see what it looks like. We start by
checking the condition. If it is true, on the first path, we get the value High. But if it is not true, this time, instead of just jumping
immediately to the end, I would like to get the value Low using the ELSE. That means the output
of the IF ELSE statement
is always going to be a value, either High or Low. We will never get a null.
Let's take an example. Let's say that the sales is 1,200. It's going to
fulfill our condition, so we get
the value High and the program ends. On
the right side it's the same thing. What happens? We check the
condition, and since it's true,
we get the value
High and the program ends.
The output is
the value High. Here, it's like the last one. But if the sales
equals 700, the condition is not fulfilled, and now, instead of jumping
immediately to the end, it's going to jump to
the ELSE statement. So let's check
that value, where the sales equals 700. The condition will
not be fulfilled. It fails because
the sales is not higher than 1,000. So what
happens this time? We execute
the ELSE statement. We do not jump
immediately to the end;
we go
to the ELSE and then execute it.
In the chart, we checked the condition and took the right path,
where it is false. Now, once we are
at the ELSE statement, it's not like the IF. Here
we do not have any condition.
We have only one path, so we execute the Low
and the program exits. What happens? We just get the value Low and we end. So the output is the Low value instead
of a null. ELSE is always executed if the conditions
are not fulfilled. That's it for the ELSE
statement; it's very simple. Now we're going to go to the
next level, where we want to add multiple conditions
to our statements. All right, so now
we're going to talk about the ELSEIF statement. We can use it in order to add multiple conditions
to our statements. So far, in the previous examples, we worked with only
one condition. We are checking whether
the sales is higher than 1,000, and if we are
using the IF ELSE statement,
we get
either High or Low. Let's say that we want to
introduce another condition in our statement to get
the value Medium. So now we would like to add
a new condition between IF and ELSE, exactly after
the IF statement. But we cannot
use IF again as a keyword. Instead, to add
anything after the IF,
we can start using
the ELSEIF statement. It adds more conditions.
For example, we can add the following
condition in between, using ELSEIF: if the sales is higher than 500, then we get
the value Medium. That means in the
whole statement
we can have one
and only one ELSE, but we can have multiple ELSEIFs in between if we want to
add multiple conditions. Now let's see what the workflow
looks like. We start as usual with the first condition
in the statement. If it is true, what happens? We get the value High
and everything ends. Now if that first condition is
not fulfilled,
we jump to
the other condition in the ELSEIF. Here
we check whether the sales
is higher than 500,
and again we have
two ways out of this. Either it's true,
it is fulfilled,
so what happens? We get the
value Medium and then it ends. Or,
if this condition is not fulfilled either, we go and
execute the ELSE statement. As usual, the ELSE statement
does not have any condition. It just returns
its value and ends. Let's see a few examples in order to understand
how this works. The first one is
the sales equal to 1,200. We are checking
the IF condition. As you can see, it's
going to be fulfilled. We get the value High and that's it. So
what's going to happen? We just
skip everything to the end. If we
check the workflow, we check
the first condition and take this path. Everything else is
ignored and will not be executed.
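The ELSEIF chain from this walkthrough can be modelled in Python as below. It is a sketch under assumptions, not Tableau code: `sales_label` is a hypothetical name, and the Tableau formula in the comment is the statement being modelled. Conditions are tested from top to bottom and the first one that is true ends the evaluation.

```python
# Model of the Tableau calculation:
#   IF [Sales] > 1000 THEN "High"
#   ELSEIF [Sales] > 500 THEN "Medium"
#   ELSE "Low"
#   END

def sales_label(sales):
    if sales > 1000:      # first condition
        return "High"
    elif sales > 500:     # only checked when the first condition fails
        return "Medium"
    else:                 # default when every condition fails
        return "Low"


for value in (1200, 700, 350):
    print(value, sales_label(value))
```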
value high at the output. All right, now let's
take another value, the sales equals to 700. So we are at the
first condition. It will fail, so we will
not get the high value. Instead of that,
we're going to jump to the next LF statement. So we are now at the right path. The true path can
be deactivated. So we have here
again another check. So we are checking, is the
sales higher than 500? Well, this time it's going to be fulfilled. So what can happen? We're going to get
the value medium and then the program going skip. So with that, we
are at this path where we get the value
medium as an output. So this means again that the L statement will
not be executed. All right, moving on
to the next example where the sales equal to 350. Again, we are at
the first check, 350 is not higher than 1,000 that's why
this going to fail. Then we're going to
jump to the next one to check whether it's going
to fulfill this condition. And the sales as well here, not higher than 500. So this can fail as well. So since now both of them are
failing, what can happen? We're going to go
to the default. The default value is the Ls, so this going to jump to
the Ls and we will get the low value from our statements and this
is going to be executed. Let's check the right
side on the workflow. As you can see, we are the
first condition it failed. We go to the second
one, it failed as well. Then we go to the last option that we have to
the L statements. We will get the value of low. That's all about
the LSF statement. If you have a third condition, you just can add it after
the LSF or before it. With that, you can add multiple conditions to your statements. And understanding the
logical workflow behind those statements is very important to understand
those functions. All what you are
doing here is we are evaluating
different conditions. And based on the evaluations we will get in the output
different values. In this example, we have
three possible values, high, medium, and low. All right, the case
win statement, very similar to the
statement here. We're going to evaluate as well, multiple
logical conditions. And based on our evaluation, we will get an output value. Let's take an example in order
to understand the syntax. It starts always with case, then the field that
we want to evaluate. Now we're going to
go and evaluate the values inside the country. The first condition
is going to be like this. We can write win. Then if the value is
Germany inside the country, then the output going to be
the E. Here we are trying to make like in the output
abbreviations from the countries. Now we're going to
go and make another condition for another value. Inside this dimension, we can evaluate the
value of France. If it is equal to France, then can be R. Then moving
on to the next condition, we can evaluate the US value
inside this dimension. If it is equal to this value, then the output should be US. As you can see, using the
case when we are evaluating the members or the
values of a dimension. Here we are here. In those conditions, we
are evaluating a scenario. What can happen if the value of the country is
Germany and so on. So far we have three conditions. If you are done and
you would like to have a default value if none of those conditions
are fulfilled. If the value of the
country does not fulfill those three
conditions, what can happen? We're going to go and execute the L statements and at the end we're going to
have as well and end. As you can see, it's really easy to read and as well
easy to write. All right, now let's go and
have an example in order to understand how the
execution can be done. So let's say that we have the Germany value
inside the country. Now as the code can be executed, we can start from top to bottom. So that means we can first
evaluate the first one, it's going to be in Germany. Then DE, as the
values are matching, we will get the value
DE at the output. And the code going to
skip everything else, so we will not check
France, USA, and so on. So the code is going
to go to the end and as output we're
going to get DE. It is very similar to
the FL statement, right? So let's take another example where we have France
in the country. Here we start moving from
the top to down again. The first condition
can be checked. In Germany. Then DE, this time we don't have a match. Here we have France
and here, Germany. It's going to fail.
We get false. That means what? We jump
to the next condition to check and evaluate
the next value. Here we check again:
WHEN the value is France
THEN FR. This time
we have a match, so we get true. With that, the
application skips the other
conditions and goes to the end. That means in the result
we see FR. Now let's move to the
last example, where we
evaluate the value
Spain in the country. What's going to happen?
Again, top down. This time none of those
conditions is fulfilled. From the first one, we jump to
the second because it is
false, and likewise from the
second to the third. That is false too, which means we execute the ELSE. ELSE is executed if none of the conditions is fulfilled, so in the output we get N/A,
not applicable. It's very similar to
the IF ELSE statement. Now we're going
to compare
all of this side by side. We're going to compare three functions:
the IF statement, IIF, and CASE WHEN. I know that we didn't
talk about the IIF yet, but now we're going to check
the syntax in order to understand the
differences between it and the IF statement. Let's start with the first
one here, the syntax. We have multiple conditions.
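The top-down CASE WHEN evaluation walked through above (Germany, France, then Spain falling through to the default) can be modelled in Python as an ordered list of WHEN/THEN pairs. This is an illustrative sketch only: the names `WHEN_THEN` and `country_code` are hypothetical, and the Tableau formula in the comment is the statement being modelled.

```python
# Model of the Tableau calculation:
#   CASE [Country]
#   WHEN "Germany" THEN "DE"
#   WHEN "France" THEN "FR"
#   WHEN "USA" THEN "US"
#   ELSE "N/A"
#   END

WHEN_THEN = [("Germany", "DE"), ("France", "FR"), ("USA", "US")]


def country_code(country, default="N/A"):
    # Check the WHEN values from top to bottom; the first match ends the evaluation.
    for when_value, then_value in WHEN_THEN:
        if country == when_value:
            return then_value
    # No WHEN matched, so the ELSE value is returned.
    return default


for name in ("Germany", "France", "Spain"):
    print(name, country_code(name))
```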
We have two conditions: IF sales is higher than 1,000 THEN High, ELSEIF sales
is higher than 500 THEN Medium, ELSE Low, END. With that, we are evaluating multiple
conditions in one statement. Now let's move to the next
one, the IIF. IIF is very similar
to the IF ELSE statement. We get the same output, but we write it in a different
and easier syntax. Let's see the syntax. As you
can see, it's very short. It starts with IIF, then the condition itself, so the sales higher than 1,000.
Then we have two outputs, depending on whether it's true or false. The first one is about the true: if the condition is fulfilled, we get the High value. If the condition
is not fulfilled, we get the Low value. So the last value is what
happens if it is false,
and the one before it is
what happens if it is true.
Compared to
the IF ELSE statement, it is easier to write and shorter as well. Here
we don't have keywords like ELSE, and at the end we don't
have the keyword END. It's really short
and quick to create. But of course, we can
evaluate only one condition. Now we can move to the CASE
WHEN. As we learned before, it evaluates the values, the members, of a dimension. Here we're going to
evaluate the country. Then we have
multiple conditions. If none of them is fulfilled, we go to the ELSE statement, and
then we have an END. Now let's learn the main
differences between them. The first one is
about whether it
supports
multiple conditions. As you can see, in
the IF ELSE statement
we can add as many
conditions as we want; it supports multiple conditions. The IIF supports
only one condition. The CASE WHEN supports multiple conditions as well. Now let's move to the next one. We're going to talk
about whether it
supports
multiple fields. The IF ELSE statement
supports multiple fields, so in the conditions we can have not only the sales but something else as well, like
the country.
The same goes for the IIF; it supports
multiple fields as well. But the CASE WHEN
supports only one dimension. We cannot evaluate multiple dimensions in the
same CASE WHEN statement. Here we are only talking
about the country; we cannot add any other fields
inside this statement. So here we have a limitation in the CASE WHEN statement
compared to the other two. Now let's talk about
the supported data types. The IF ELSE statement and the IIF both support
any data type. That's why I said they can evaluate multiple fields. We could have a
dimension, a measure, any field that you
have in your data source;
it can be evaluated
inside those conditions. But with the CASE WHEN
we have another limitation. We can evaluate only string
values, only dimensions. We cannot
evaluate, for example,
the sales or profit or
quantity, any measure. We cannot use it inside
the CASE WHEN statement;
it should be exactly a string. We cannot even use,
for example, a date like the order date. The field should
be a string value. Now let's check the
main advantage of each method. For the first one, the IF ELSE,
as you can see,
we don't have any limitation. For the IIF, the advantage is that it is easy and quick to
write. For the CASE WHEN, we have again the advantage that it is easy to write and to read. If you look at the CASE
WHEN statement and at the IF ELSE statement, you
can see that the CASE WHEN
is organized and
easy to read. It has a clear flow
compared to the IF ELSE,
where we have a lot of
different keywords and it's not as easy
to read as the CASE WHEN. My recommendation for you:
if you are evaluating
only one condition with
an output of two values, then always use IIF. It's very quick to create. But if you have
multiple conditions that you want to evaluate, then think about the CASE WHEN. Is the data type a string? Are you evaluating
only one field? If that's the case,
then use CASE WHEN. It's easier to read
and to write as well. But if you are
talking about multiple fields and not only string values, then you have to go
to the IF ELSE statement. Always start with the
IIF, then the CASE WHEN,
and then, if you don't
have any other option, go to the IF ELSE statement. All right, so that's
all about those methods. We're going to go now and
practice in Tableau. All right. Let's go to the
small data source. We're going to go
to our customers. Let's bring the first name to the view and the
country information as well. Now the task is to create
country abbreviations, shortcuts for the
original values that we have inside the country. In order to do that, we can use the IF ELSE statement, and we're going to do
that step by step. Let's first create
a new calculated field. Let's call it Country IF. Now we're going to
use the keyword IF. After that we have to
specify our condition. The first condition is going to be: if the country
equals Germany,
then the abbreviation
is going to be DE. Let's create that.
IF the field country equals the value Germany. Make sure to write
it exactly like ours, capitalized, because Tableau
is case sensitive here. Now what happens if the
country equals Germany? We would like to see
the value DE in the output. If it is true, we get
DE. If it's not true,
then, as in the first form,
we just exit. We don't have any ELSE statement
or any other condition;
this is the simplest
form of the IF statement. Let's hit OK. As usual, we get a discrete dimension in the data pane with
the data type string. Because the output is a string, we have the abbreviations. Let's drag and drop it on our
view to see the values. All right, so now
let's check the values. For the
first customer, you can see that the value
is not equal to Germany. It is not fulfilling
the requirement, so we get null. The same
thing for John:
USA is not fulfilling
the requirement, so we get null as well. For the next two customers, you see they fulfill the
condition, that's why we get the
value DE for both of them. For the last customer, Peter, you can see the value is not
fulfilling the condition. We get null.
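The Country IF field from this step can be modelled in Python to see why only some rows get a value. This is a sketch with made-up rows, not the course data: the walkthrough names only John and Peter, so the other names and the countries of the first and last customer are assumptions, and `country_if` is a hypothetical helper. The exact-match comparison mirrors the advice above to write Germany with the same capitalization as in the data.

```python
# Model of the calculated field: IF [Country] = "Germany" THEN "DE" END

def country_if(country):
    # Exact, case-sensitive match; anything else yields None (a null).
    return "DE" if country == "Germany" else None


# Hypothetical rows: (first name, country).
customers = [
    ("Customer 1", "UK"),
    ("John", "USA"),
    ("Customer 3", "Germany"),
    ("Customer 4", "Germany"),
    ("Peter", "USA"),
]

abbreviations = [country_if(country) for _, country in customers]
for (name, country), code in zip(customers, abbreviations):
    print(name, country, code)

# A differently capitalized value does not match in this model.
print(country_if("germany"))
```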
As you can see, we are getting only one value; otherwise it's
going to be null. All right guys, now let's
go to the next step. I would like to get
rid of those nulls. I want to see a real value in the visualization. If the
condition is not fulfilled, I want to see the value
not applicable, N/A. Now in order to do
that, we have to use the ELSE statement in
our calculation. Now let's go to our field, and instead of changing the calculation
inside this field, I would like to duplicate
it and make a new one. Let's duplicate it and
then edit the new one. I'm just going to call it IF ELSE. Now we have
the same condition again: if the country equals Germany, we get DE. Otherwise,
this time we will not skip: we add
the ELSE statement. It always goes
right before the END. After that, we don't
add any condition; we just have to add the value to return if the
condition is not fulfilled, which is going to be
N/A. That's it. That means if it's true
we get DE; else, we get
N/A. Let's go and click OK, and let's check the values in the view as well. I'll just make it a little bit bigger to see the information. Now as you can see,
instead of having nulls, we now have a value, which is really better
for the visualization and as well for the
user experience. Nulls are always
ugly in the views. And with that, we
control which value is presented to the end users if the conditions
are not fulfilled. So now, as I recommended before, if you have only one condition where the output is
only two values, then the best way is to use IIF. Let's go and create
it. We're going to create a new
calculated field and call it Country
IIF. Let's see the syntax. It starts with the
keyword IIF. As you can see, it
needs three arguments. The first is the test, which is
the condition. The second argument specifies what happens if the
condition is fulfilled, and the third one what happens if the
condition is not fulfilled. The condition is:
country equals Germany. What
happens if this is true? We get DE.
Then the next step is to define what happens if the condition
is not fulfilled, that is, the country is not Germany: it's going to be N/A.
As you can see, it's very quick
to create such a condition compared to
IF, ELSE and so on. So this is the quickest way to create
such a condition. Let's go and hit OK
and check the results. With that, again, we
get a new dimension. Let's drag and drop it over here on the view to
check the results. I'm just going to make
it a little bit bigger.
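For comparison, the two single-condition calculations from this part might be written as follows. This is a minimal sketch: the field name [Country] and the literal 'N/A' for "not applicable" are assumptions based on the narration, and each expression would be its own calculated field.

```
// IF ELSE version (sketch)
IF [Country] = 'Germany' THEN 'DE'
ELSE 'N/A'
END

// IIF version (sketch): IIF(test, value if true, value if false)
IIF([Country] = 'Germany', 'DE', 'N/A')
```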
We get exactly the same result
as with the IF/ELSE statement. The first two customers are not fulfilling
the condition, so we get the N/A text. The next two customers
are from Germany, so we get DE, and the last customer is not
from Germany, so we get N/A. This is the magic of IIF. Not a lot of people
use it, actually. It's not that commonly used, but it is a very nice way to quickly create
conditions in Tableau. I totally recommend
you to use it. All right guys, so now
we're going to move one more step,
where we add another condition. So we don't have only one; we can have multiple conditions. That's why we
cannot use IIF. We have to go back to
the IF/ELSE statements. So let's see how
we can create it. I'm going to duplicate
one of those fields again. So let's go and do that, and then let's go and edit it. I'll call it ELSEIF statements. We're going to stay with the
same logic: first we are
checking Germany, so this is the first condition,
and the ELSE is going to be N/A. Now we're going to add a new line between
the IF and the ELSE, and we add
a new condition using the keyword ELSEIF. Just like with the IF statement, we
can write our condition: if the country this
time equals, let's say, France,
then what happens? We get the abbreviation FR. That's it, we have added
our second condition. As usual, the
execution goes from top to bottom. The first condition
to be checked is the IF: whether the country
equals Germany. If it is not true,
then it jumps to the ELSEIF. Let's go and hit OK to
check the results. So let's go and grab it from the data pane and
drop it on the view. Now we can see that there is one customer with a new value. As you can see,
for George from France we got the abbreviation FR, and that's because the
country equals France. With that, we are fulfilling
the second condition. John and Peter from the USA still don't fulfill
any of those conditions, so they are always handled by the ELSE. Maria
and Martin are handled by the
first condition, where the answer is
DE. So that's it. Now we're going to go and
add the final step, where we add the third condition
for the country USA, because we are still
getting N/A for
those two customers. I'm going to go to the
same field. This time I will not duplicate it,
so let's go and edit it. We just have to add
one more condition, right? So I'm just going to
copy this part, and the next condition
is going to be as well ELSEIF: country equals,
this time, USA. Then what happens if
this condition is fulfilled? We get
the abbreviation US. So you can see it's
very simple to add one more condition
with ELSEIF. Let's hit OK. So now we can see
in the results that all those customers
that come from USA now have the
US abbreviation. And with that, we have covered everything with conditions, and none of those customers
is handled by the ELSE. So we don't have N/A anywhere in the output,
which is really nice. And now we can see in
the view very nicely how we started with the
simplest form of the statement and ended up with the complete
form of the IF statement. Next, we're going to
solve the same task, but this time using
the CASE WHEN statement. All right, so now
let's go and create a new calculated field. We're going to call it
Country CASE WHEN. Then the syntax: it starts with CASE,
then we have to specify the field that
we want to evaluate, which is going to be the country. Once we do that, we start
defining our conditions. The first condition is going
to be the Germany value: WHEN the value
equals Germany, THEN what happens? We get the
abbreviation DE. That's it. The next condition is going to be: when the country equals France, then the abbreviation
is going to be FR. And we go
to the last condition: when the country equals USA, then the value is going to be US. That's it. You see
how quickly we defined three conditions
using CASE WHEN. It is very logical and also very easy to
create. Now, if none of those
conditions is fulfilled, let's return N/A with ELSE,
and we have to close it with END. That's it. As you can see, the calculation is valid and it's really easy to
read and to write, because everything
is structured. I like using CASE WHEN statements a lot
compared to IF/ELSE. So that's it. Let's go now and hit OK to check the results. And now we've got a new
dimension, as usual, from the calculated field. Let's put it in the view
to check the results. So as you can see, we
get the same results. But in this situation,
for this task, I'm going to recommend
you to use CASE WHEN, since as you can see, it's very easy to write
and also to adjust later, or to add more
conditions if needed. So with that, we have
learned how to use all those logical functions in order to create
new logical conditions. All right everyone, so
I'm going to show you a very common use case
that you might find in many projects, where you
create the colors of the KPIs using logical conditions. Let's go to the big
data source. We need the subcategory
from the products, as usual, on the rows. And then we need the
sales from the orders. Let's put it on the columns. Then we sort it and add the labels. And now we need a
color for this KPI. Let's go and create our
new calculated field. We can call it KPI Colors. And the logic is
the following. If the sum of sales is
higher than 200 K, I would like to see
the green color. Anything between 200 K and 100 K is going to be
the orange color. And anything below 100
K is going to be red. So now we have to decide on the method that we want to
use in our calculation. As I recommend, you always
start with IIF. But in this logic, we have
multiple conditions, so we cannot use it. IIF is only suitable if we
have only one condition. So IIF is out. The next one to consider
is CASE WHEN. But the
conditions are based on the sum of sales, which is a number, and they compare it against ranges. We cannot use
CASE WHEN here, because CASE only matches
a field against exact values. So this one is out as well, and we are left only with the
IF/ELSE statements. That's why in this
calculation we're going to build it
based on IF/ELSE. Let's go and do that. We can start the syntax
over here with the IF, and then we have to specify
our first condition. Anything higher than 200
K should be green. So now we are talking
about the field sales, but as a sum, because in the visualization we
have the sum of sales. So if the sum of sales is higher than 200 K,
then what happens? We get the color green. So that's it for the
first condition. Now we have to specify the
condition for the orange. Anything between
200 K and 100 K should be orange. So let's go and
specify that with ELSEIF. Again, we have
the same field: sum of sales higher than 100 K, then it's going to be orange. So now you might
say, you know what, the condition that you just described has two
boundaries, right? Higher than 100 K
and lower than 200 K. Well, the first boundary we have already covered with the
first condition. If it is higher than 200 K,
it's going to get green. So anything that reaches
the second check is, in this case,
not higher than 200 K. That's why I specified here
only the lower boundary. That's it for the orange. The last one is going to be: if the sum of sales is lower
than 100 K, what happens? We get
red. Let's go and specify that: we
have another ELSEIF, sum of sales lower than
or equal to 100 K, then it's going to be red. With that, we have covered the third condition,
the third color. And we covered
everything. We covered all possible values
that could happen. That's why it doesn't
make any sense to add an ELSE statement. We can just go and end it. Now let's check
whether everything is fine. Now we've got an error. I think I forgot
to close something here. Now let's check it again. The calculation is valid. That's it. We have three
conditions for three colors. Let's go and hit OK. All right, now we have our
dimension over here. We're going to use it
for the coloring, right? Let's drag and drop it
on Color over here. Now, as you can
see, the colors are splitting our view. Tableau got it almost correct.
We have orange and red, but this one is blue, not green. Let's go and change it.
We're going to go to Color, then Edit Colors. Now instead of showing green as blue, we're going to have
it as a real green. Let's go and hit OK. With that, we've got the
colors of our KPI. As you can see, all those
subcategories with sales higher than 200 K
are green. Anything between
200 K and 100 K, you can see all of
them are orange, and anything below is red. So as we can see, we can do a lot using those
logical conditions. We can use them to
create the coloring in Tableau. We can use them to create new information, like
the country abbreviations, which are really
necessary to understand the data. All right, so far
we have learned how to create conditional
logic in Tableau and how it is evaluated
in order to manipulate our data
based on decisions. Next we're going to
start talking about the logical operators
AND, OR, NOT.
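As a reference for this lesson, the multi-condition calculations built above could be sketched as follows. The field names [Country] and [Sales], the literal 'N/A', and the thresholds written as 200000 and 100000 are assumptions drawn from the narration, not taken from the actual workbook.

```
// Country ELSEIF (sketch): conditions are checked from top to bottom
IF [Country] = 'Germany' THEN 'DE'
ELSEIF [Country] = 'France' THEN 'FR'
ELSEIF [Country] = 'USA' THEN 'US'
ELSE 'N/A'
END

// Country CASE WHEN (sketch): same result, one field matched against exact values
CASE [Country]
    WHEN 'Germany' THEN 'DE'
    WHEN 'France' THEN 'FR'
    WHEN 'USA' THEN 'US'
    ELSE 'N/A'
END

// KPI Colors (sketch): an aggregate compared against ranges, so IF/ELSEIF is used
IF SUM([Sales]) > 200000 THEN 'Green'
ELSEIF SUM([Sales]) > 100000 THEN 'Orange'
ELSEIF SUM([Sales]) <= 100000 THEN 'Red'
END
```

Each block is a separate calculated field. The KPI Colors field only returns labels; the actual colors are assigned after dropping the field on Color, as in the lesson.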
144. Tableau | Logical Operators: AND, OR, NOT: Now we're going to
learn how to combine and evaluate
multiple conditions in Tableau using the logical
operators AND and OR. Then we will learn
about the NOT operator. Let's first understand
the concept, then we can practice. Let's go. Let's start with the
AND and OR operators. Let's have the
following scenario. Let's say that we have
one condition where we are checking whether the
sales is higher than 1,000, and a second condition
where we are checking whether the country is Germany. Now if you want to
evaluate both of them, you want to combine
those two conditions so that they work together. We can use the AND or the OR
operator in between. We can use those
two operators to combine condition A
with condition B. And the output is, as usual, a Boolean: true or false. So those two operators are logical operators that are used to combine
multiple conditions. Now let's say that
we're going to use them in IF/ELSE statements. Let's see what the
syntax looks like. Let's start with
the AND operator. As you can see, we have
here the IF statement. Then we have our two conditions, and in between them we
have the AND operator, which combines both
of them in one statement. If the sales is higher than 1,000 and the country
equals Germany, then we
get the value High, if it is true. Otherwise it's going to end and
we will get null. The same thing for
the OR operator. We are saying here: if
the sales is higher than 1,000 or the country
equals Germany, then we
get the value High. So as you can see,
it's really simple. Let's check an example
in order to understand the differences
between AND and OR. So now we have in our table four customers with their sales information
and the countries. The first condition
is going to check whether the sales is
higher than 1,000. So let's check: for the
first two customers we get true,
because the sales is higher than 1,000, and the last two are going to
be false, because it is below 1,000. So this is the information
from the first condition. Then the second
condition that we have checks whether the country equals Germany. The first customer is from Germany, that's
why it's true. The second one is not,
so we have false. Then the next one is from Germany, so true, and the last one is false. So now, as you can see, we are
evaluating the table first in order to get the result
for each single condition. But now what we can do
is combine those two conditions to
generate new results. If you
use the AND operator, it returns true only if both conditions are true,
and false otherwise. So now let's combine those two conditions
using the AND operator. Let's check the first
customer. Condition A is true, and condition B is true as well. So we are fulfilling the requirement
to get true: for the first customer, the output is true. For the next
customer, Maria, condition A is true, but condition B is false, so it does not fulfill
the requirement. Both of them should be
true to get true; that's why it's
going to be false. For the next one, Martin,
it's going to be the same. Condition A is false and B is true, but both of
them should be true. That's why we get
false. For the last one, anyway, both of them are false, so
we get false. As you can see, the AND
operator is very restrictive. Both of the conditions should be true in order to get true. Otherwise, immediately
you will get false. This is how the AND operator works. Let's go to the next one, the OR operator. The OR operator returns true if at least one
condition is true. Otherwise, it's
going to be false. That means we need at least one true to get true
in the output. Let's go and check
the example again. For the first customer, we are fulfilling the requirement.
We have more than one: both of them are
true. That's why in the output we get true. For the next one, we have
true at condition A and false at condition B. We have at least one, so we are fulfilling
the requirement. It's going to be true as well. The third one is the same: we have at least one
true, at condition B. That's why for Martin
we get true. But for the last customer, George, both of them are false. We need at least one
true to get true; that's why the output
is going to be false. As you can see, the OR operator is less restrictive
than AND. We need at least one true
to get true at the output. This is how the AND and
OR operators work in Tableau in order to combine
multiple conditions. One more thing to
notice here is that when you are using AND and OR, we are evaluating the end
results of the conditions. We are not evaluating
the table itself. We are evaluating the
results that we got from the conditions. Now we're going to talk
about the third operator, the NOT operator. So let's take an example. We have
the following table, and we have our condition where the sales is higher
than 1,000. We will not use the NOT operator to combine two
conditions, like with the AND or OR operator. This time we're
going to reverse the result of the condition. The NOT operator is a
reverse logical operator. It returns true if the result of the
condition is false, and it returns false
if the condition is true. If you tell it to go right,
it's going to go left. If you tell it to go
left, it's going to go right. So it's going to do
exactly the opposite. So let's see what's
going to happen if we say NOT this condition. If you use the NOT operator,
for the first customer you will get false because
the value is true. The same for the second
customer: you will get false. But for the next two customers, you will get true
because the output of this condition is false. As
you can see in the result, we flip the truth. We get exactly
the opposite. It's going to look like this in the calculation in Tableau. Here again we have
our IF statement and our condition, but just
before the condition, we put NOT. And with that, you are
reversing everything. What you are saying
here in this condition is: if the sales is not
higher than 1,000, then we
get the value Low. So that means anything equal to 1,000 or smaller than 1,000
is going to be Low. We are reversing the results. That's it, this is how
the NOT operator works. Now let's go back to Tableau and practice those three operators. All right, so now we're going to go to our big data source. Let's grab the information of
the customers to the view. So we're going to
get the customer ID, the first name, the country,
and the scores as well. But I would like to
show the discrete values of the scores, so let's switch it to discrete. And then we need a measure. Let's go to the orders
and get the sales and put it on the columns,
as you can see. Now we have for each customer the total sales
that they ordered. Now the task is to not show all the sales
of all customers. We want to focus on a specific
group of customers. We want to show the
sales only for customers that come from Germany and
whose score is higher than 50. With that, we have two
conditions, and we can use the AND operator
in order to combine them. As usual, we're going
to create our new calculated field, and we're going
to call it Sales AND. We're going to start
with the IF statement. Now we need to write
our conditions. The first
condition: the country should be equal to Germany. The country field, we
have it over here, must be equal to Germany. Now, since we are saying "and" in the task, it is going
to be AND here as well, in order to connect
the second condition: the score should
be higher than 50, so the field score should
be higher than 50. Now we have our two conditions. Both of them are connected
with the AND operator. Now, if both of them are
true, what happens? We show the value of the sales. So next, we're going
to say THEN sales, and otherwise it's going
to be null. That's it. We're going to end
the statement, and we can see that the calculation is
valid; everything is fine. So let's go and see
what happens. Let's go and click
OK. Now we have our new field in the
data pane on the left side. It's going to be a
continuous measure, because the output
is going to be sales. Now we're going to go
and check the values. But first I would like to get
rid of those bar charts. I'm just going to move the
sales to Detail and then move it again to the
view, over here on the Abc. So now we have those values. Let's get our new sales field with the AND operator and put
it on the view as well. Let's just make it a little bit bigger to see the headers. All right, so now let's go
and check our customers. Let's take
customer number two. You can see the country
equals Germany, so we have the first true,
and the score is higher than 50 as well, so we have another true. With that, we
get true at the output. That's why we are
seeing the value of the sales at the output. Let's move to the next one. We have customer
number three. You can see the country is not Germany;
we have France here. So the first condition
is going to be false. Immediately, the
output is going to be false, because both of
them should be true. But we can check
the second value: you can see the score as well is not higher than 50.
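The calculated field being checked here might be written as below. This is a minimal sketch, assuming the fields are named [Country], [Score], and [Sales]; the real names in the data source may differ.

```
// Sales with AND (sketch): show sales only when both conditions are true
IF [Country] = 'Germany' AND [Score] > 50 THEN [Sales]
END
// No ELSE branch, so every other row returns NULL.
```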
Both of them fail, and the output
fails as well. That's why we are getting an empty value; we are not getting the sales. All right, now let's move to
another customer, number 23. You can see the customer
comes from Germany. The first condition
is fulfilled. We have our first true, but the score is
not higher than 50. The second condition failed. That's why we didn't
get any result. As you can see, the AND
operator is very restrictive. Everything should
be true in order to get the result. That's it. This is how the AND operator works. Let's move to the next task. We want to show
the sales only for the customers who
come from Germany, or whose score is higher than 50. The logic is very simple, right? But here we have to change
the operator that combines those two conditions. We're going to have
the same conditions. That's why I'm going
to go to the sales field from before, duplicate it, and then edit it. We're going to change
the name to OR, and we have the same conditions: if the country
equals Germany, but this time OR the
score is higher than 50. That's why I'm going
to go over here and change it to the OR operator. Now I would like to
mention that those logical functions are very close to the
English language. If you just read this code, it's like you are saying
a sentence in English. What you are saying here is: if the country is
equal to Germany, or the score is higher than 50, then show the sales. That's it. You see, it's like translating the English sentence into code. And it's really easy to
write and to read as well, so it's really logical. Now let's check our calculation. You can see it is valid. Let's go and hit OK. And immediately we can see
in the view that with OR we are getting more values than with AND, because AND
is very restrictive. Now let's go and
check some customers. You can see for the
first one, the country is not equal to
Germany; they come from France. The first condition failed, so let's hope
for the next one. The score is higher than 50, which means this customer fulfills
the requirement. It's enough to have
only one true. That's why we have the sales at the output. The next
customer fulfills both of the conditions: they come
from Germany, and the score is higher than 50. That's why we have the sales
just like with the AND operator. But for the third customer,
as you can see, the first condition failed
because it's France, and the second failed as well because the score is not higher than 50. That's why both of them failed and we don't
have any result. We have to have at least one true to
get something at the output. So that's it, this is
how the OR operator works. All right, now we have the
following task for you: show the sales only for customers who
come from either Germany or France. You can pause the video now in order to
complete the task, and once you're done,
you can resume it. Okay, so let's see
how we can do that. We can go and create a
new calculated field. We can call it Sales Country. And we're going to start
with the IF statement. Then we have the two conditions. The customer should be either
from Germany or France. The first one is going
to be the country equal to Germany, and the operator is going to be OR, since the customer could be either
from Germany or France; then country equal to France. What happens if one of those
conditions is fulfilled? We're going to have
the sales: THEN sales, and that's it. Let's end it. As you can see, very simple. Let's go and hit OK. As usual, we're going to
go and check the values. Let's drag and drop it over here in the view; we have
it here in the middle. Let's just make it a little bit bigger and see the customers. Now we are checking
only one field, but in two conditions: the country is either
France or Germany. The first customer, we can
see, comes from France, so we get the value. The second one as well, we get the sales value. Then, for the USA, we will not get any value because it's not part
of the condition. As you can see, now we
are getting the sales of all customers that come either
from France or Germany. Okay, now I'm going to show
you something quickly. Let's go back to our
calculated field, Sales Country, and
go and edit it. Now instead of having OR, we're going to use the AND operator. Now, what we are saying
is that the customer should come from Germany and at the same
time from France. It sounds weird, right? So let's go and
try it. Let's hit OK and check the results. You can see that the Sales
Country field is completely empty. We don't see any values, because in our data, a customer can
come from only one country. We cannot have this
condition logically; from the data perspective, this is not possible. All right guys, that's what we
have learned about AND. Let's move next to
the NOT operator. Okay, so now we have
the following task: show the sales of all customers who don't come from Germany. If the customer comes from
any other country, we're going to see the
sales in the view. But if the customer is from
Germany, it should be null. All right, so now let's go and create a new calculated field. We're going to call
it Sales Germany. And we're going to have
the IF statement as well. Now we have two
ways to do it. The first option
is the long one, where we
create a condition for each value
inside the country besides Germany. We're
going to do something like this: country equal to USA. And then we're going to say OR country equals, for
example, Italy. And then for the next one, OR country equal to France. As you can see, I'm
creating a condition for each value of
the dimension country. Of course, if you have a
long list of countries, you're going to end
up making a lot of conditions as well. And what happens if a new country enters your data
source? You always have to go
to the calculation and add it as a condition. In this option, we are including all the values that we
want to see in the view. But there is a better
way to do that, where we
exclude only Germany. Let's go and remove
everything from here. We're going to say IF the
country is equal to Germany, and this time, before
the condition, we're going to add
the NOT operator. With that, we
reverse everything: if the customer doesn't come from Germany, what happens? We show the sales: THEN sales, and that's it. As you can see, it's
very short and simple. We are just excluding
one value. We don't have to
add all the values. We don't have to
worry about whether there is a new country value
inside the data source. For anything that is not Germany, we're
going to show the sales. Let's go and check the values. I'm going to go and hit OK. Now, as usual, we get a new calculated field
in our data source. Let's drag and drop it on the
view to check the values. I'll just make the header a little
bit bigger to read it. Then let's scroll up. The first
customers come from France, so we get the
sales information. The next one is from
Germany, so we have null here. We have as
well customer five from Germany and six
as well from Germany; we don't have any
sales information. So we can see that all the
customers that don't come from Germany have the sales
in this field. We can check that by sorting
the countries. It's sorted like this, and for all
those values from France, we always get
sales information. And if we go to Germany, you see all the customers
from Germany don't have any sales information
in this field. For the USA, we
get the values again. As you can see, it's
really easy to use and really useful to make
filters and so on, and also to focus on a
specific group of customers in our views. That's it
about the three operators. They are really nice to use. All right everyone, that's all
for the logical operators. And with that, we have covered all eight logical
functions in Tableau. They are really important
functions, since they help us make data-driven
decisions in the analysis. And with that, we have
covered the last group of functions under the category
row-level calculations. We learned around 40
Tableau functions. And next we're going
to learn about the aggregate
calculations in Tableau.
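For reference, the OR and NOT calculations from this lesson might be written as follows. This is a minimal sketch: [Country], [Score], and [Sales] are assumed field names, and each block would be its own calculated field.

```
// Sales with OR (sketch): at least one condition must be true
IF [Country] = 'Germany' OR [Score] > 50 THEN [Sales]
END

// Sales Country (sketch): one field checked in two conditions
IF [Country] = 'Germany' OR [Country] = 'France' THEN [Sales]
END

// Sales Germany (sketch): NOT reverses the condition, so every country
// except Germany keeps its sales and Germany returns NULL
IF NOT [Country] = 'Germany' THEN [Sales]
END
```

Replacing OR with AND in the Sales Country sketch gives the always-empty field shown in the lesson, since a single row cannot have both country values at once.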
145. Tableau | Aggregate Functions: SUM, AVG; COUNT, COUNTD, MAX, MIN: All right, so now
we're going to talk about the second type of calculations that
we have in Tableau, the aggregate calculations. I split the functions
into two groups. The first group
aggregates the measures in our data source, so we have the sum,
average, count, and so on. And the second
group is where we can aggregate the dimensions
of our data source. Here we have
only one function, the attribute function ATTR. So now we're going to
focus on the first group, how to aggregate the
measures in Tableau. All right, so the
first question is, what are aggregate
calculations in Tableau? If you use those calculations, you aggregate
the rows of the data source and put the result at the visualization
level of detail. That means the dimension
that you are using in the view is going to control the
granularity of the measure. Let's have a quick example.
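As a point of reference for this example, an aggregate calculation in Tableau's syntax is a single function call around the measure. A minimal sketch, assuming the measure is a field named [Sales]:

```
// Total sales (sketch): aggregated to whatever dimensions are in the view,
// for example one value per product when the product is on Rows
SUM([Sales])
```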
In order to understand it, let's say that we
have the order table inside our data source. We would like to find the
total sales by the products. In this example, the sales is a measure and the product
is the dimension. In order to find
the total sales, we can use the function Sum
in Tableau. Look like this. We can use the sum of
sales in the view. We can have one
dimension, the products. It is the one going to control the level of details
in the view. And then we have the result
of the function sum. We're going to put here the
results of the aggregations. Now with this table
going to go and group up the rows of the orders
by the products. As you can see, the first group is based on the
product number one. Then we have the
second group for the product number
23.4 As you can see, the orders now is
divided into groups. At the visualization levels, we're going to have exactly
only one row for each group. That means for the product
one we can have only one row. And then table going to go and summarize all the sales
inside this group. At the end of the result, we can have the value of 40. As you can see, the
aggregate calculations is grouping up the rows
from the data source and presented as one row at the output in the visualizations going to move to the next group. For the two, we can
have only one row and the summarization of
the sales going to be 50. And the same thing going to
happen for the product three, we're going to have
here two rows and the summarization of
that is going to be 45. And as well for the P four, we have as well one row in visualizations with only
15 as a total sales. As you can see, the
aggregate calculation is going to go and
group up the rows of the data source and present it as one value in the
visualizations. And the level of
detail is going to depend on the dimension
that is used in the view. That's why we say that
aggregate calculations going to bring the data at the
visualization level of details. And it's not like
the functions in the row level calculations
where we have computed each value
on the same row, so there the number of rows is going to stay
exactly like before. So this is how the aggregate
calculations work. And we don't have
only one function. We have here multiple functions. So the first one we have the
sum that we just learned. It can return the total sum
of all values within a field. And then we have another
one, the average. It's going to return
the average of all values. Then
we have the count. It's going to count the number
of values within a field. Then we have another
very similar function called COUNTD. This time we're going to
count the number of unique values within a field. Then we have MAX and MIN. They return the maximum value or the minimum value
within a field. Now if you check the syntax
of those aggregate functions, it's going to be the easiest. If you compare it to
any other functions, they all follow
the same pattern, so they always start with
the name of the functions. For example, the sum,
average, count, and so on. And they all accept
only one field. So as you can see, we
have the sum of sales, average of sales, and so on. So we have only one argument,
and it's very simple. So now let's go in
Tableau and start practicing those
aggregate functions. Okay, so back to our
small data source. Let's go to the products, and as usual we're going to get the category and as
well the product name. Now those two dimensions are
going to define the level of details and the product name going to be the one
that is controlling. So here we have the five
products inside our data source. Now, in order to create aggregated calculations
in Tableau, there are two ways.
You can do it locally, directly
only for this view, or globally by creating
a new calculated field, and it's going to be available
for all other worksheets. So now let's go and check the first methods where
we're going to go and create a quick
aggregated calculation. We're going to go to the orders and we're going to
take the sales. Just drag and drop
it here on the view. Now, as you might have already
noticed, Tableau always tries to aggregate the
data in the visualization, and for that, Tableau is going to use the aggregate functions. So as you can see,
we have the sales, but before it we have
the sum of sales. That means Tableau is using the function sum in order to
aggregate data in the view. And this is Tableau's default method to
aggregate the data. That means in Tableau,
the default type of calculation used on a measure is the
aggregate calculation. And the default function
that is always going to be used is the sum. Now in order to
change the function that is used in
the aggregations, we can go to the measure over
here, right click on it. And here we see that
our field is a measure. And using the sum function
in order to change that, let's go to the measure and
we can find here a list of all different aggregate functions that we
have in Tableau. We have the sum, the
average, the count, count, distinct, minimum,
maximum, and so on. Now, for example, we can go over here and change it
to the average. Now instead of sum of sales, we have average of sales. And at the output we
can get the averages. As you can see,
it's very simple. With just one click, we change
the aggregation function. And as well, it
doesn't need a lot of configurations like we're going to see later in the table calculations, for example,
or the LOD expressions. So this one is really easy. If you want to
change the function, just go to the measure
and right-click on it. And then here you have a list of all functions that
you can configure. And of course, anything that I'm choosing now from
those functions will not affect any other sheets and will not affect
our data source. Here we still have the sales. We don't have any field
called the average sales, so it can be only locally available for this
visualization. That brings us to
the second method where we can create an
aggregated function that is globally available for all other worksheets or workbook connected
to the data source. All right, so now
let's say that I would like to have an
extra field inside my data source to find
the total of sales. In order to do that,
we're going to go and create a new calculated fields. It's really simple. We're
going to call it Total Sales. Then in order to see the
aggregate functions in Tableau, we can check the
documentations over here. Let's go to All. And then let's choose Aggregate. And with that, you can find all the aggregate
functions in Tableau. Inside it, you can find as well the LOD expressions;
we have here FIXED, INCLUDE, and so on. To find the total sales, we're going to have
the function SUM, and as you can see, it
needs one expression. It's going to be the sales. It's going to be only one field. We're going to have the sales. And that's it. As you can see, the calculation is valid. Let's go and hit OK. And with that we got a
new continuous measure inside our data source. But here is the difference between aggregate calculations and
the row level calculations: the aggregate calculations are going
to happen on the fly, whereas the row level
calculation is going to store the data
inside the data source. That means if you go and check the data source data or if
you view the data from here, you can see that we don't have any information about
the total sales. Now if you browse the data, we don't have any extra
field called total sales, because that
information will not be precalculated by Tableau and stored inside the data source. It happens on the fly as you bring the field to
the visualization. That means Tableau will not
go immediately and execute the aggregate calculations
as you are creating them and then put the
result in the data source. Tableau will do it on the fly. That's because
Tableau doesn't know the level of details that you
need at the visualizations. As you know, the data source
has the level of details. That's why only one
type of calculation, the row level calculations, can be pre executed and stored
inside the data source, and the rest can
stay on the fly. That means our new
calculated field using the aggregate functions will not store any data inside the data
source. The data is going to be calculated once you drag and drop
it inside the view; it's going to stay empty as
long as you don't use it. Let's go and close
this over here. And let's drag and drop it to the view to check the results. Now in this view, we
got the total sales by the products because
the product name going to control the
level of details. Let's say that you
would like to have the total sales by the category. In this view, you have to
remove the product name. In order to do that,
we're going to go and remove the product
name from the view. And with that we got the total
sales for each category. That means the aggregate
calculations, or the granularity of the measures, is
going to depend on the level of details
of the visualizations. The dimension
controls everything: it controls the level of details that we see in the view. So now let's go
and understand how Tableau brought those
numbers to the view. Okay, so in the data
source we have 15 orders. And in the visualizations
we said, okay, we would like to have the
category. Tableau is going to go and get the category
to the visualizations. And inside it there
are like two values. So we're going to get the
accessories and the monitors. So we're going to have
with that only two rows. Then we can have the
sales, the total sales. Tableau going to
go and aggregate the sales for each category. So as you can see,
Tableau going to go and split the orders
into two groups. One with the
category accessories and the other one
with the monitor. Now in order to find
the total sales of the accessories, Tableau is simply going to go and
aggregate all those values of the sales and put the
result at the output. The first one is going
to have around 2,377. For the next group,
Tableau does the same: it's going to go through all those
orders underneath the category Monitor and aggregate all those values,
so that we get around 4,129. As you can see, Tableau is going to
split the rows by the dimension that is used in the visualization.
In this example, it's going to be
by the category, so it's going to split
it into two groups. And then you can go and apply
the aggregate functions. Let's move to the next
one. We would like to find the average sales
for each category. In order to do that,
we're going to go and create a new calculated fields, and we're going to
call it Average Sales. The function is very simple. It is AVG, the average. Then we have our field Sales, and that's it; it's
pretty simple. Let's go and hit
OK. And as usual, we're going to get
a new empty field inside the data source, but once we drag and
drop it on the view, the calculation is
going to happen. Let's do that. We can find the average sales
for each category. How Tableau did the
calculations is very simple. Tableau is going to split again the rows inside the
orders into two groups. The first group is for
the accessories, so it's going to go and add up all those values
inside the sales. And then it's going
to be divided by the total number of orders
inside this category. Here we have around
eight orders. The final value going
to be around 297. The same thing
going to happen for the second group: Tableau is going to go and add
up all those values, then divide by seven because we have only
seven orders for the monitor and we will
get 590 as a result. We can see again
that the dimension category is deciding how the calculation happens and as well how the data
going to be split up. That's all for the
average function. Let's move to the next one. We have the count. Let's say that we would like to find
the number of orders for each category. In order to do
that, we can go and create again new
calculated field, and we're going to call
it number of orders. The function is really simple, so we're going to
use the counts, and inside it we
need only one field. This time we're going to go
and count the order IDs. In order to do that, we use
the order ID and that's it. We are counting how
many orders IDs we have inside our data source. The calculation is
valid, let's go and hit OK. As usual,
we're going to get a continuous measure
in our data source. Let's go and drop it to the
view and check the results. We can see that in the
accessories we got eight orders, and in the monitor
we got seven orders. Now let's see how Tableau is
doing that. It's very simple. Again, our data is split into two groups, and Tableau is going to start
simply counting the rows. So how many rows do we have
inside the accessories? It's going to be eight rows. We have here eight orders. And if you count the
rows of the monitor, you will get as
well seven orders. With the count function, we are just simply counting the rows. So that means in the
accessories we got eight rows, and on the monitor
we got seven orders. There is one more special
thing about the count, Let's say that's inside
our data, we got nulls. Let's say that we don't
have any order ID. It's empty, it's null.
So what can happen here? Tableau will not count it. So in this example, Tableau
going to go and count only six instead of seven,
we're going to get six. And this as well going to
affect the previous function, the average as we learn before. It's going to go and add up
all those values and then it can be divided by
the number of orders. So let's say that we have
here a null this time. Tableau will not
divide it by seven. Tableau going to go
and divide it by six. And here again, a reminder that we have to handle
the nulls inside our data as we
learned before, using ZN, IFNULL,
and so on. So if we divide it by six, it can be different
than dividing it by seven, which is more correct, since we have seven orders, not six. That means pay attention to the field that you are doing the aggregates
on top of, whether it has nulls or not. Because having a null here, we're going to get
inaccurate results. We don't have six orders, we have seven orders
inside the monitor. All right, so that's all for
this function, the count. All right, so now
we're going to move to a very similar function in
Tableau called COUNTD. It's going to return
the number of unique or distinct
values within a field. It sounds very similar
to the counts, but here we have a
difference between them, where we are counting
only the distinct values. Let's have an example in order to understand the difference. We would like now
to show the number of products for each category. Let's go and create a
new calculated field. Let's call it
number of products. This time I'm going to
start first with the function counts to show you
the differences between them. And we're going to use
the field product ID. Let's go and select that, and then hit OK. Again, we got a new calculated field. Let's show it in the results. And we can see that
the results is very similar to the number
of orders here. Again, we have
eight products for the accessories and seven
products for the monitor. Now what happened here? Well, if you check the
data inside the orders, we got only two products with the accessories and as well only two products
for the monitor. So why did we get eight and seven? That's because
Tableau is going to go and count the number of rows, whether they are duplicates
or not, it doesn't matter. So Tableau going
to go and count. Okay, here we have eight rows, that means we have
eight products. So that's why we cannot use the count function
for this task. We have to use another
thing where we're going to use COUNTD. Let's
go and change it. I'm going to go to the
calculated field and just add a D after the COUNT
to use the next function. So we have COUNTD of product ID. Let's go and hit OK. And as you can see
in the result, now we got two for the accessories and
two for the monitor. So let's see how Tableau
is going to work here. Tableau counts the distinct or unique values
within the field. This time Tableau is going to pay attention to the
content of the field, so it's going to start counting. Okay, here we have
the USB mouse. This is one. Then the next one, we have the same information. Tableau will not
count it at all. The same for the third. Then for the fourth order, we have a new product. So here we have a new value,
the Logitech keyboard. So here we have two. Then we
move on, and it's the same stuff. So here we have the same values. Tableau will not count
them. At the end, Tableau did count here
two unique values. Here we have two products
for the accessories, that's why Tableau is going to go
to the output and put two. For the next category,
we start the same way. We have the LG Full HD monitor. This is one product. The
second one is the same value, so Tableau will not count it. Then we move to the third one. As you can see,
it's a new product, a new value. So it's
going to count two. And for the rest, it will not
count anything because they are duplicates as well.
Tableau is going to go and count the number of unique
values within the field. That's why we're
going to have as well here two which is more accurate. We got only two products for the accessories and only two
products for the monitor. This is the difference
between COUNT and COUNTD. COUNT will just
blindly go and count how many rows we have
inside each category. But COUNTD is going to go
and check the content, and it's going to count only the unique and the distinct values. All right, so now we're going
to move to the last two. We have the max and min. They are very simple
functions in Tableau. The max can find the
highest value within a field and the MIN can find the lowest value within a field. Let's go and check
how it can work. So let's say that we
would like to show the highest sales
for each category. In order to do that,
we're going to go and create a new
calculated field. Let's call it Highest Sales. And then we can use the max function and we have the sales. It's very simple, it always
needs one field. That's it. Let's hit OK and let's
check the results. Let's put it on the
view so we can see the highest sales
inside the accessories is the 525 and the highest sales for
the monitor is 1691. So let's see how this works. As usual, our data is
split into two groups. We start with the first group, so Tableau is going to go and
check all those values. What is the highest value
inside those sales? It's going to be 525, and Tableau is going to present it as the result. Then we're going to move
to the second group. So Tableau is going to take all
those values and compare them to each other in order
to find the highest value. And it's going to be
this order number two as the highest sales inside
our data for the category Monitor. This is how the MAX function
works in Tableau. Let's go to the next one to find the lowest sales
for each category. We're going to do
the same stuff. We're going to have a new
calculated field, Lowest Sales. This time we can use the MIN
function and then our field Sales. That's it, click OK. Let's present it as a
result as well to compare it. So we can find the lowest sales
in the accessories is 56. And the lowest as well
for the monitor is 40. The same thing: Tableau is
going to go and check all those values
for the first group. What is the lowest sales? As you can see, it's
going to be this order; order number ten is going
to be the lowest value. And then Tableau is going to
go and check the second group of values in order to
find the lowest value; it's going to be this one. Tableau is just
rounding the numbers, that's why we have 40 here, but in reality it is
39.97. So that's it. This is how MAX and
MIN work in Tableau. As you can see, the
aggregate functions in Tableau are very simple. With those functions,
I think this is my easiest tutorial that I
made in the Tableau series. All right guys, so that's
all for these six functions in order to aggregate the
measures of our data source. Next we're going to talk
about how to aggregate the dimensions using the very confusing
function, the attribute.
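To recap the six functions and the note about nulls, here is a minimal Python sketch. It is not Tableau code; the rows, the sales amounts, the second monitor name ("Monitor B"), and the helper `aggregate_by_category` are all assumptions made up for illustration. The `zero_nulls` flag imitates the idea of wrapping the measure in ZN so that a null counts as zero instead of being skipped.

```python
# Hypothetical (category, product, sales) rows; None stands for a null.
rows = [
    ("Accessories", "USB Mouse", 20.0),
    ("Accessories", "USB Mouse", 30.0),
    ("Accessories", "Logitech Keyboard", 70.0),
    ("Monitors", "LG Full HD Monitor", 200.0),
    ("Monitors", "LG Full HD Monitor", None),
    ("Monitors", "Monitor B", 400.0),
]


def aggregate_by_category(rows, zero_nulls=False):
    """Apply the six aggregates per category; nulls are skipped unless zeroed."""
    groups = {}
    for category, product, sales in rows:
        if zero_nulls and sales is None:
            sales = 0.0  # same idea as ZN: treat a null as zero
        groups.setdefault(category, []).append((product, sales))

    summary = {}
    for category, items in groups.items():
        values = [s for _, s in items if s is not None]  # nulls are ignored
        summary[category] = {
            "SUM": sum(values),
            "AVG": sum(values) / len(values),
            "COUNT": len(values),
            "COUNTD(product)": len({p for p, _ in items}),
            "MAX": max(values),
            "MIN": min(values),
        }
    return summary


skipping = aggregate_by_category(rows)
zeroed = aggregate_by_category(rows, zero_nulls=True)
print(skipping["Monitors"]["COUNT"], skipping["Monitors"]["AVG"])
print(zeroed["Monitors"]["COUNT"], zeroed["Monitors"]["AVG"])
```

With the null skipped, the Monitors average is divided by two rows; with the null zeroed, it is divided by three. Note that zeroing also changes MIN to 0 in this sketch, so whether to replace nulls depends on which aggregate you need.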
146. Tableau | ATTR Attribute Function: We're going to
talk about another aggregate function in Tableau. But this time this
function is going to be very special and
it is very confusing. A lot of people get confused about the attribute
function in Tableau. First, as usual, we can
understand the concept behind it and then we
can practice in Tableau. Previously, we have learned that the aggregate
function is going to go and aggregate the numbers, the measures inside
our data source. This makes sense, right? To have the total sales in the view. But now how about aggregating the values
of the dimensions, for example, the customers
or the products? How do we aggregate those values? We cannot go and use
the sum function in order to aggregate
the dimensions. Instead, we can go and use the
attribute function. The attribute
function in Tableau is going to go and
aggregate the values of the dimensions of
the data source and present the
result in the view. Let's say that this time I would
like to go and aggregate the values of the
customers by the products. In order to do that, we can
use the function attribute for the customers. In the view, we can have two fields. First we have the
dimension product. This one is going to define the level of details
of this view. Here we have another
field where we can have the result of aggregating
the customers, the attribute of the customer. Here we have two options. The first one: if
all values are the same, then it's going to return a
single value, the same value. Or, if we have multiple values, then it's going to return an asterisk. This might sound very confusing or complex, but don't
worry about it. Let's just follow the
example again. Here, since we are grouping up
the data by the products, Tableau is going to go and group up the orders by the products. The first group is for the
product number one, the second group
for two and so on. In the visualizations, we're
going to have only one row for each group like any
other aggregate functions. Now for the first group,
we're going to have one row, P1, and Tableau is
going to go and check the values inside the
customers for this group. As you can see, we have the same information
in those three rows. We have John, John, John. We have the same value, so we are at the first option. If all values are the same, then it returns
a single value. That's why Tableau is going to return John in the output. With that, Tableau did implement
the first option. Let's go to the next group, P2. As you can see, in the customers for P2 we
have here different values. The first one is John, the second one is Maria, and again Maria. We don't have
the same values, right? We have different values. That's why Tableau is
going to go and execute the second
option, because we have multiple values, and
Tableau is going to return an asterisk. So that's why we have here
an asterisk as the result. This is how the attribute
function works in Tableau. Let's move on to
the next product. Let's say that we have
P3, and as you can see, we have here again two different
values, John and Maria. They are not the
same. That's why the second option is
going to be activated. And Tableau is going to
have the asterisk as the result. For product four, let's check. We
have Maria and Maria; we have the same value. That's why Tableau is going to go
and execute the first option, where all the
values are the same, and then we're going to get the
same value in the output. That's why we have Maria. That's it for the
attribute function. It's really simple, right?
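The two options can be sketched in a few lines of Python. This is an illustration of the rule as described here, not Tableau's implementation; the customer values per product follow the walkthrough, while the exact number of rows per product and the helper names `attr` and `attr_by_product` are assumptions.

```python
from collections import defaultdict

# (product, customer) rows as described in the walkthrough; the exact
# number of rows per product is an assumption.
orders = [
    ("P1", "John"), ("P1", "John"), ("P1", "John"),
    ("P2", "John"), ("P2", "Maria"), ("P2", "Maria"),
    ("P3", "John"), ("P3", "Maria"),
    ("P4", "Maria"), ("P4", "Maria"),
]


def attr(values):
    """Return the single shared value, or "*" if the values differ."""
    distinct = set(values)
    return next(iter(distinct)) if len(distinct) == 1 else "*"


def attr_by_product(rows):
    """Group customers by product and apply the attribute rule per group."""
    groups = defaultdict(list)
    for product, customer in rows:
        groups[product].append(customer)
    return {product: attr(customers) for product, customers in groups.items()}


result = attr_by_product(orders)
print(result)
```

Each product yields exactly one output value, just like SUM would for a measure; the asterisk is simply the placeholder for "more than one value in this group".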
Once you have an example, then everything
going to be clear. Again, if the values
are the same, like here John, then we're
going to get the same value. And if the values are different, so you have multiple values, then Tableau is going to
have the asterisk. And now you might ask what this asterisk means in the view. Well, Tableau uses it as a highlight or warning to
tell you there are more details in this field,
inside the customers. And the asterisk can
help you as well to understand the relationship
between dimensions, for example, between the customers
and the products. As you can see, for
product two, we have multiple values, so it is like a
one-to-many relationship. But for product one, we
have one to one relationship. So we have only one customer
for only one product. With that, you can understand the relationship
between dimensions. All right, with that, we have
understood that in Tableau, we can of course, aggregate the measures like in
the sum function. But as well, we can go and
aggregate the dimensions inside the data source using the attribute
function in Tableau. So this is the main task
that we usually use the attribute function to
aggregate the dimensions. Now let's go back to Tableau in order to practice this function. All right, so I'm going to
show you a very quick example on how to create the
attributes in Tableau. Let's stick with the
small data source. Let's go this time
to the customers. We're going to
take the countries and the cities as
well to the view. Now I would like to aggregate the dimension city
inside this view. In order to do that, we can
use the function attribute. There are two ways to do it: either globally or
locally. As usual, locally only for this view, globally for all
other worksheets. Let's see the quick
one, the local one. In order to do that, we
go to the city over here, right-click on it,
and then you can find this option between the
dimensions and measures. This time we have
the attributes. Again, this is not
the third option of the meta data that we learned before, dimensions and measures. This is simply an
aggregate function that Tableau just put it
between those two options. It is not the third option, it is an aggregate function. Let's go and click on that. Now we can see from
the name of the field, we have the function attribute
applied on the field City. And the level of details in our visualizations is not
anymore the city like before, it is now the country, the city going to have
an aggregated value. For France, we have Paris, for Germany, and USA,
we have the asterisk. Let's see quickly how
Tableau did that. Okay, here is what is
very special about the attribute
function in Tableau. It's not like all other
aggregate functions where we start from
the data source. Here we start from the
visualization. Depending on the visualization level of details that we have
inside the view, it's going to do
the calculation. Here we have the visualizations, the country and the city. It's going to focus only
on those two dimensions. At the start, we
have France, Paris, and we have two
values for Germany and two values for USA. Since the country is the only
dimension that we have in the view and the city
can be an aggregation, the level of detail is
going to be the country. That means we're going
to have only three rows, only three values. Tableau going to show us as we can see here on the left side that we have France,
Germany and USA. Now as we learned, Tableu going to go and check the values. If all values are the same, we're going to get the
same value for France, we have only one value, it's going to be the same value, Tableau going to go and
put it at the output. Then the next one, Germany, we have this group of rows. We have two rows,
Berlin and Stuttgart. We have two different values. That's why Tableau
going to go and put the asterisk at the output.
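One way to express this check, sketched here in Python rather than Tableau, is to compare the smallest and largest value of each group: if they are equal, every value in the group is the same. France, Paris, Germany, Berlin, and Stuttgart come from the example; the two USA city names are placeholders, and `attr_min_max` is a hypothetical helper.

```python
# (country, city) rows; "City A" and "City B" are placeholders for the
# two different USA cities, which are not named in the walkthrough.
customers = [
    ("France", "Paris"),
    ("Germany", "Berlin"),
    ("Germany", "Stuttgart"),
    ("USA", "City A"),
    ("USA", "City B"),
]


def attr_min_max(values):
    """If the minimum equals the maximum, all values are identical."""
    lowest, highest = min(values), max(values)
    return lowest if lowest == highest else "*"


# The country is the only dimension, so it defines the groups.
cities_by_country = {}
for country, city in customers:
    cities_by_country.setdefault(country, []).append(city)

view = {country: attr_min_max(cities) for country, cities in cities_by_country.items()}
print(view)
```

The view ends up with three rows, one per country, and only France shows an actual city name.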
The same for the USA. As you can see we have
two different values, so we have multiple
values and for that Tableau can show as well
the asterisk at the output. And that's why we have
here only Paris for France and two asterisks for
the other two countries. So you can see this
is very simple. Let's go to another example to understand the use case
of the attributes. All right everyone, So now
we might ask, okay, nice. We can now aggregate
the dimensions, but where do I use
it in my dashboards. So what are the real use case for the attribute
functions in Tableau? Well, usually I tend to use the attribute functions
in two use cases. The first one inside
the tool tip, where I want to
show for the users more details about
the aggregations. Let me show you how
I usually do it. Let's go to the big data source and then we're going to
go to the customers. Let's take, for example,
the country, the city, all informations
about the location, and as well the postal code. Then as usual, we would like to show the sales informations. So let's go to the orders and take the sales to the columns. And we're going to
show the labels and as well the color of the sales. So now we can see that
the level of details of our visualization is going to be based on the postal code. Since it's going to bring us to the lowest level of details, let's say that the requirements wants us to have the level of details of the city
and not the postal code. There are two ways to do
it. Either we can go and remove the postal code
from the view over here. With that, we got the level
of details of the city. But now let's say that
I still want to bring the postal code information to this visual as a
detail for the users. I cannot just drag and drop it here; it's going
to split the data, right? You can see here, Paris,
we have two values. Instead of that, we can use the attribute functions
in Tableau if we still need to present the postal code informations
in this visualization. As we learned before,
we can go over here and quickly switch
it to attribute, or we can make it
global to reuse it in different worksheets. Let's go and choose that. We're going to go and create
a new calculated field. I'm going to call it
Attribute Postal Code. The function is very easy. It's going to be
ATTR, and it accepts only one field. It's going to be
the postal code. It should be a dimension. That's it, the calculation
is valid. Let's go and hit OK. So we've got a new calculated field,
a new dimension. Let's go and bring
it to the view. I remove the postal code. Now we can understand
quickly from the view that the postal
code and the city, they are almost at the
same level of details. As you can see, we
almost always have values, with only two cities
where we have the asterisk: Paris
and Portland. With that, we understand
the relationship between the postal
code and the city. They are almost at
the same level, but sometimes we
have more details. In Paris, we have
here different values for the postal code and
as well for the Portland. Now, in order to show those
details for the users, either we can leave it as a field over here as a
header, or, a better way in order to save some space in the visualization and not
show a lot of headers, we can show it in the tooltip. In order to do that,
we're going to drag our field and drop
it on the details. And then we have over
here this option to configure our tool tip. Let's go inside it now. As you can see, we have
four informations, City, country sales, and
our new field, the attribute postal code. But I would like to
rename it in order to make it easier for
the users to read it, so it's going to be the
postal code information. Let's go and hit
OK. And now, as the users are hovering the mouse
over those values, you can see that we have
more details about the city. We have the postal code
information inside it, and if we have multiple
values, like in Paris, we can have the asterisk. I usually explain
to the users: if you find the asterisk, it means we have
more details about the aggregations which
may raise the curiosity for the users to go on more detailed analysis about the postal codes
instead of the cities. And with that, we are presenting the postal code
information even though our level of details in the
visualization is the city. This is a very common use case
for the attribute, where you can present more details
for the visualizations even if you have very highly
aggregated data in the view, and for that we use the attribute
function in Tableau. But sometimes we end up, like
in most of the situation, that the users want to
see those informations, they want to see
those postal codes and the sales
informations for them. In order to do that,
we do the following. We go and create a new sheets, and this time we're
going to create a view where the postal code is the level of details.
All that we need is the postal code and
as well the sales. Drag and drop the
sales to the view. Let's just make it a
little bit bigger to see the header
information. So that's it. Let's call it sales
by postal codes. This view can be now embedded
in the original view. In order to do that, we're
going to go back to our view where we have the city
as the level of details. Now we want to
embed a worksheet inside this view,
inside the tooltip. Let's go to the tooltip over here. Let's have a new line. And then we're going
to go to this menu over here, Insert. With the first option,
we have Sheets. Tableau is going to show us all the sheets that we have in this workbook. It's going to be the last
one, Sales by Postal Code. Let's go and click on that. Now we have embedded
another worksheet inside the view using
the tooltip. That's it. It's very simple. Let's
go and hit OK. Now let's go and mouse
over on those cities. As you can see, we have
now a table or a view, small view inside the tool
tip. If you go to Paris, we now see the
two postal codes, and as well the sales
of those postal codes. This is how I usually do it as the next step if the users
want to see more details. But of course, this needs
more calculations and more resources in Tableau to
put one view in another one. If the users are happy
with the asterisk, then stay with the attribute. But if they need more details, then you have to
create another view and then put it
inside the tool tip. All right, so that's it
for the first use case. We use the attribute to show more details for the
users if we have a high aggregation in the view, and we usually use it
in the tool tip. All right, now let's move
on to the second use case. Where I usually use the attribute
function in my projects is to check the data quality
inside the data sources. Usually, if you are
working with the data, you have some expectations
about the data quality. And if you have any suspicions, we can use the
attribute functions in order to investigate
the situation. For example, let's say
that the expectation in our data is to have only one
country for each customer. The data should not
allow, for some reason, multiple countries
for each customer. If you are skeptical
about this information or want to check the quality
of the data that you get, you can use the attribute
functions like this. We can go, for example, and take the customer ID. We can take the first
name, last name, but now we would like to check the quality of the country. But since we have a lot of
data inside our data source, it can be really
hard now by just checking the values to
understand whether we have multiple values for each customer or whether it is a
one-to-one relationship. Instead of that, we can go and aggregate the country using
the attribute function. Let's do it this time
the quick way. Right click on the country, and let's apply the
attribute function. At the start, you might say,
okay, nothing has changed. But now, in order to quickly
validate the data, we can use it as a filter. Right click on the country
over here and show filter. Now on the right side,
Tableau is going to show us all the possible values that
could appear in this view. Here we have the asterisk. We have France, Germany,
Italy, and USA. Of course, what is
interesting is the first one, so I'm just going to
remove everything and select the asterisk. Now we can see as we
selected the asterisk, we don't get any data.
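The asterisk check just described can be sketched outside Tableau. The snippet below is only an illustration under assumptions: the `attr` helper mimics how the attribute function shows the single value when all rows of a group agree and an asterisk otherwise, and the customer rows are made up. Unlike the data in the video, the sample deliberately contains one customer with two countries so that the filter returns something.

```python
from collections import defaultdict

def attr(values):
    """Mimic the attribute function: the value if all rows agree, otherwise '*'."""
    distinct = set(values)
    return next(iter(distinct)) if len(distinct) == 1 else "*"

# Hypothetical rows: (customer_id, country). Customer 3 has two countries.
rows = [
    (1, "France"),
    (1, "France"),
    (2, "Germany"),
    (3, "USA"),
    (3, "Italy"),
]

def customers_with_multiple_countries(rows):
    """Return the customers for which ATTR(country) would show the asterisk."""
    countries_by_customer = defaultdict(list)
    for customer_id, country in rows:
        countries_by_customer[customer_id].append(country)
    return sorted(c for c, vals in countries_by_customer.items() if attr(vals) == "*")

print(customers_with_multiple_countries(rows))
```

An empty result corresponds to the empty view in the video: every customer has exactly one country.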
This is perfect. That means the data
quality inside our data is perfect and we have exactly one country for each customer. But if we start getting
data for the asterisk, it means we have
multiple values for some customers and we can
investigate this situation. So this is a one-time analysis of our data to check
the data quality. But let's say in the next
day or the next month, we get a lot of
new customers and we always want to check
this information. We can go and make a data
quality dashboard for us or for the users to check whether our expectation
is correct, selecting only the asterisk. And we can explain
that we expect this view
to always be empty. If this view is not empty, then we have a data
quality issue. And we can add this
information in the title. We can call it data
quality check. Then it's about the
multiple countries. This is expected to be empty. If it's empty, then
everything is fine. That's all for the
second use case for the attribute
function in Tableau. As you can see, it's
really handy in projects, right, to
understand your data, to do data quality
checks and so on. Or as well to show more details for the users inside
the tool tip. All right, so that's all for the attribute
function in Tableau. And with that, we have covered many important functions under the category aggregate
calculations. Next we can start talking about the LOD calculations in Tableau. They are really interesting
and important to understand.
147. Tableau | Introduction to LOD Expressions: All right everyone. So now
we're going to talk about the third type of
Tableau calculations. We have the LOD expressions
or LOD calculations. It is another way to aggregate the
data in Tableau. And here we have
only three functions: fixed, include and exclude. And as usual, first we have to understand the
concept behind them. Then we can have enough examples
in Tableau. So let's go. All right guys, so now
we can understand, when do we need
LOD expressions in Tableau using this
very simple example. So let's say we are building
a view where we have the category informations
and the product name. And now we are showing the
total sales for each products. Now by looking to
those two dimensions, you can understand that
the product name is controlling the level
of details in our view. So we have five products, and with that we got five rows. So the product name is splitting
the rows of this table. But now we come to the issue: you want to show
in the same view, with the same dimensions and setup, the total sales
for each category. Well, we cannot do that as long as we have the product
name inside this view, because the product name is splitting the view
into products. In order to show the total
sales for each category, you have to remove
the product name from the view by just dragging
and dropping it away. You can see now we got the
total sales for each category. But if you say, wait, wait, we need to have the
product information in the view, we cannot drop it. So let's go and bring
it back over here. If you need to have the product
name and you still want to have the total sales
for each category, we have to use the
LOD expressions. It is exactly in this
situation that we need the help of LOD expressions to control the level of details
of our aggregations. Now let's go further and
understand how LOD works. Okay, now we're going
to have quick facts about the LOD calculations. First, LOD calculation
is going to go and aggregate the rows of the data source at
the dimension level that we specify inside
the calculation. That means the dimension of the visualizations will not
control the level of details. This time we're going
to have the level of details of the LOD expressions. The LOD calculations, like
the aggregate calculations Tableau going to go
to the data source in order to query
the data there, and then bring the result
to the visualizations. And the calculation
can happen on the fly. That means Tableau can execute the calculation only if you bring the field to
the visualizations. Tableau will not recalculate and store the informations
inside the data source. Again, how it works, the
visualizations can send query to the data source
and the data source can answer with their results. This is how Tableau execute
the LOD calculations. All right everyone,
we talked about the level of details
many times during the tutorials but
now let's understand what do we mean exactly
with the level of details. Let's say that we use in Tableau only the measure
without any dimensions. With that, we're going to be at level one and we will get, for example, the
total sales. If you are using the measure
Sales, Tableau is going to go and summarize
all the sales inside the data source and present it as only one row, one value. Without using any dimensions, we will get the highest level of aggregation. Let's
go to the next level. Let's say that we use a
dimension like the category. In our small data source,
we have only two values. Tableau can split this one
value into two values. Here we can see more
details about our sales. It's not only one value, now we have it as two values. So that means this
dimension is going to split our view into two rows. Moving on to the third level, let's say that you use the country.
Inside the data source we have three countries. That means we are going
to have three rows. We have more details
now about the sales. So as you can see,
the sales are going to split into three rows. So that means the
level of details of the category is different
from the country. In the category,
we have two rows. In the country, we
can have three rows. Moving on to the last level. If you bring the order ID
to the visualizations, you will get the highest
level of details. It is exactly the
level of details that we have inside
the data source. We don't have in our
data model any dimension that's going to break these
rows into more details. So we are now at the bottom, at the highest level of details. And we have exactly 15 rows, because we have 15 orders. So that means each of those
dimensions is going to break the visualization into a different level of details. The category is going to break
it into two rows, the country into three, the product name into four, and the order ID is going to break it into 15 rows. That means the level of
details is the highest at the order ID and
it's going to be the lowest if you don't
use any dimensions. It's the opposite if you're talking
about the aggregations. You get the highest level
of aggregation if you don't use any dimensions. And you're going to
get the lowest level of aggregation if you use a dimension like the
order ID. With that we understood that each dimension brings us to a different level of details. This is what we mean by the level of details in Tableau. All right guys, now
we're going to go and understand the LOD
functions in Tableau. But first we can split those three functions
into two categories. The first one is going
to be the static, where we have only one
function: it is the fixed. In the second one we have
the dynamic calculations. And here we have the two
functions include and exclude. If you want to have a fixed
or static calculation, you can use fixed. But if you need something more dynamic, then you have to use include
and exclude. The dimensions inside our visualizations or in the LOD expressions define the level of details, and each dimension has
different level of details. For example, the category
has only two values. That means the level of details here is very low compared
to the order ID, where we have the highest
level of details. Let's say that our
current level of details inside the
view is the country. So we are at level three. We can use the LOD
expressions in order to bring the calculations to
a lower level of details. And we can use the exclude or the fixed function
to bring it, for example, to level
two, at the category. But now, in order to
present the calculations in the current view,
what happens? The values are going to be
duplicated or replicated, like we have seen in
the last use case, where we have the tables and we duplicated or replicated
all the values. Or we can use the LOD
expressions to bring us to a higher level of details like using the include or fixed. But now, if we want to bring back the calculations
to the current view, we have to do aggregations
like we have done the average number of
customers for each category. Since the customers has a higher level of details
than the category, you have to pay attention
to the dimensions that you are using inside
the LOD calculations. If it's going to bring
the aggregations to a higher level of details, then you have to focus on the aggregate functions
that you are using in order to bring the result to the current level of
details in the view. So that means we always have to aggregate the data in order to go back to a lower level of details or to a higher
level of aggregation. Here we always have to
use aggregate functions in order to come back to the
current level of details. But if we are
above, it's easy. The values are just going to be
duplicated or replicated. All right guys, I
hope that was clear. This is one of the most complicated concepts
that we have in Tableau, if you compare to
all other concepts. All right guys, now we're
going to go and understand the syntax of the
LOD expressions. They start with
the function name, so either it's going to be the
fixed, include or exclude. After that we have
the colon. Then we have to define
the aggregation. It's like the
aggregate calculations, something like sum of sales, average of sales,
maximum and so on. But the most usual
aggregation that we use here is the
sum of something. Let's have a few examples. We
can go with the following. Like we say fixed, then we don't specify
any dimensions, then we specify the aggregation. We have in this example
the Sum of Sales. Now think about the
LOD expressions as if you are building
a view in Tableau. You always have to specify the dimensions and measures
of the aggregations. Here we are telling
Tableau to do the sum of sales without considering
any dimensions. Now let's go and add dimensions
inside the calculation. Like for example,
the category here. Again the same analogy. It's like you are
building a view from the dimension category and
the aggregation sum of sales. Of course, you can go and add more dimensions like the
category and the product name. The same analogy, we have two dimensions in
the view category, product name, and then we
have the sum of sales. Now, of course,
we can go and add more dimensions
like the category, product name, the same analogy. We are adding two dimensions to the view category
and the product name. And the aggregation
is the sum of sales. And of course, we can go and use other functions like
the include or exclude in those examples, or
other aggregations like the average of
sales and so on. So as you can see, building an LOD expression
is very similar to building any view: you always have to define the dimensions and as well the aggregations
from the measures. So that's all about the syntax
of the LOD expressions.
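The level-of-details idea from this lesson, where each dimension splits the total into a different number of rows, can be sketched in a few lines of Python. This is a rough illustration, not what Tableau runs internally: the `orders` table and the `sum_sales_by` helper are made up, and the sample has only 5 orders rather than the 15 used in the video.

```python
from collections import defaultdict

# Hypothetical orders: (order_id, category, country, sales).
FIELDS = ("order_id", "category", "country", "sales")
orders = [
    (1, "Accessories", "France", 10),
    (2, "Accessories", "Germany", 20),
    (3, "Monitor", "France", 30),
    (4, "Monitor", "USA", 40),
    (5, "Accessories", "USA", 50),
]

def sum_sales_by(orders, dims):
    """Sum sales for each combination of the given dimensions."""
    positions = [FIELDS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in orders:
        key = tuple(row[p] for p in positions)
        totals[key] += row[FIELDS.index("sales")]
    return dict(totals)

# No dimension gives one row; more detailed dimensions give more rows.
for dims in ([], ["category"], ["country"], ["order_id"]):
    print(dims, "->", len(sum_sales_by(orders, dims)), "rows")
```

Passing an empty list of dimensions plays the role of a fixed expression without dimensions: one value for the whole data source.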
148. Tableau | FIXED LOD Expression: All right, so there are two
types of level of details, LOD. The first one
is the one that we define inside
our visualizations. We call it LOD viz, and the other one that we
define inside the calculations, we call it LOD expressions. Now let's say that inside
the visualizations, we have two dimensions, category and country.
And we have the sales. Now on the right
side, in the LOD expression, if you go and use
the fixed function, let's say that we have the
fixed category: sum of sales. What we have done
here is exactly like building
any other view. You always need a dimension and an aggregation. With that, Tableau is going to go
and, let's say, internally create a
hidden view with the dimension category and
the aggregation sum of sales. Here, since we say it
is a fixed function, Tableau will ignore
the dimensions that we have in the view, so it works completely independently from the dimensions that are presented in the view. That means the calculation
is going to be very static, and it doesn't matter
what you're going to do in the visualization. Nothing is going to change
in the calculation of the LOD expression.
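The "hidden view" picture of the fixed function can be sketched in Python. This is a simplified model under assumptions, not Tableau's actual engine: the sample rows and both helper functions are hypothetical, and the sketch assumes that category is always one of the view dimensions, so the fixed value is simply replicated onto each row of the view.

```python
from collections import defaultdict

# Hypothetical data source rows.
rows = [
    {"category": "Accessories", "product": "Mouse", "country": "France", "sales": 10},
    {"category": "Accessories", "product": "Mouse", "country": "Germany", "sales": 15},
    {"category": "Accessories", "product": "Keyboard", "country": "France", "sales": 25},
    {"category": "Monitor", "product": "Monitor 24", "country": "USA", "sales": 100},
    {"category": "Monitor", "product": "Monitor 27", "country": "Germany", "sales": 150},
]

def sum_sales(rows, dims):
    """Sum sales for each combination of the given dimensions."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["sales"]
    return dict(totals)

def view_with_fixed_category(rows, view_dims):
    """Return {view_row: (sales_in_view, fixed_category_total)}.

    Assumes "category" is in view_dims, so each row can look up its category total.
    """
    fixed = sum_sales(rows, ["category"])  # the hidden view, ignores view_dims
    view = sum_sales(rows, view_dims)
    cat_pos = view_dims.index("category")
    return {key: (sales, fixed[(key[cat_pos],)]) for key, sales in view.items()}

print(view_with_fixed_category(rows, ["category", "product"]))
print(view_with_fixed_category(rows, ["category", "product", "country"]))
```

In both printed views the second number of every Accessories row stays the same, which is the static behavior described here.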
What do I really mean? Let's say that in the view, you have added a new dimension, let's say the
product. Now you have made a change in
the visualization. We now have three dimensions: product, category and country. But the LOD expression
will not change at all. It's going to get exactly
the same results: it still has the category
and the aggregation of sales. So this is the main
purpose of the fixed function, to make it independent from the dimensions that we
have inside the view. So everything going
to be static. And this is exactly the
main difference between this function and the other
two, include and exclude. So as you can see, building the LOD expressions
is very easy. It's very similar to building visualizations
in Tableau, where you are dragging the
dimensions and aggregations. Here, instead, you have to define
them inside the calculation. And you always have to define the dimensions and aggregations. So it's really simple
once you understand it. Let's move to the next
one, to the exclude. All right everyone, now back to our view where we have
the product name in the visualization.
We cannot use the aggregate
calculations in order to show the total
sales by category. In order to solve this,
we're going to use the LOD expressions using
the fixed function. Let's go and create a
new calculated field. We will call it
sales by category. Now we're going to use
the fixed function. So let's start typing fixed and use this
suggestion from here. Next we have to
define the dimension. Since we say sales by category, we need the category. Let's add the dimension
category, then a colon, and the aggregation
is going to be the sum of sales. At the end, we have
to close the brackets. As you can see,
it's very simple. We have to define
the dimension and as well the aggregation that we need in the visualization. Let's go and hit OK. As usual we will get
a new calculated field under the measures, and it's going to be calculated
on the fly. That means Tableau
will not go now and store the results
in the data source. Let's go and take the results, drag and drop it to
the view over here. Now we see in the results, we have the sales
by the category. We are ignoring the
dimension product name. And it is based completely
on the Dimension category. When I work with the LOD expressions, in order
to understand them, I always imagine that
Tableau is creating a separate view in order to
calculate the LOD expression and then adds it to the current view. So let me show you
what I mean by that. Let's go and open again
our calculated field. And on the right
side we have over here the data source
information, since Tableau is going to go and
query this data. We are saying fixed category, so that means Tableau is going to grab
the dimension category. And inside there are two values. We have the accessories
and the monitor. Next we have the Sum of Sales. This is the aggregation. Tableau is going to grab the sales and
start doing the aggregation. So it's going to go and
summarize all those values. For the first section,
for the accessories, we will get the total
sales of the accessories. And then Tableau is going to go and summarize all the sales
for the second category. And with that, we will
get the total sales for the monitor. The output
of our calculation, the LOD expression, is going to
look something like this. As you can see, the
level of details in the LOD expression is completely
different than the view. Here we have only two rows, and in the view we
have five rows. In the next step, Tableau is
going to go and merge those results into the view. The first three products belong to the
category accessories. That's why we are
seeing the total sales of the
accessories in the view. And then the next two products belong to the category Monitor. That's why we are seeing the
total sales of the monitor. This is how I usually
do it in order to understand expressions if
things get complicated. Now one more thing about
the fixed calculations. We say that it is
static. It is fixed. So it doesn't matter what
I'm presenting in the view, we will always get
the same results and nothing changes in
the LOD expression. To show what I mean by that, let's
go and change a few things. Let's take the
product name away. You can see we still
get the same values. Let's go and add, for example, the country to the view. Let's go
and just add the country. As you can see,
nothing changed. The LOD expression has exactly the same values
and it is static. All right guys, that's how the
fixed LOD expression works in Tableau. All right, the following case: I
would like to create a histogram to measure
the customer's loyalty. That means I would like to
have the distribution of the number of customers by the
number of orders. I would like to understand
here how many orders the majority of
my customers are placing. That means I would
like to understand the behavior of my customers. That means in order to
build such a thing, we need two measures, The number of customers
and the number of orders. Well, before we have learned
how to build histograms, but only from one measure. If you have two measures, this time we have to go and
create LOD expressions. So now let's do it
step by step in order to learn how to
build such a visual. All right guys, so first let's understand the
data that we have. Let's show the number of
orders for each customer. So let's go to the customers. Over here we are at
the big data source. Then let's take, for
example, the customer ID. With that, we have a list of all customers inside
the data source. And then let's go to the orders and grab the order count. With that, we got the count
of orders for each customer. Now let's go and sort the
data so we can see we have only one customer with the
highest number of orders, 29. Then we have three customers that ordered the same amount: we have 28 three times,
three customers ordered the same amount. Then we have one customer
that ordered 26. Then we have over
here five customers that ordered the same amount. We have 25 orders
for those five customers. Now since we have two measures, the number of orders and
the number of customers, we have to turn one of
them into a dimension. So I'm going to be working now
with the number of orders to turn it into a dimension.
We want those values, the 29, 28, 26, 25. In
order to do that, we can go and create
an LOD expression using the fixed function. Let's go and create a
new calculated field. We can call it number of
orders per customer. We're going to go and build
something very similar to this view using
the LOD expression. We can start with
the fixed function, then our dimension is going to be the customer ID
like in the view. And then our aggregation is going
to be the count of orders. You can go with the
count distinct if you are not sure whether there are
duplicates inside the orders. But I'll stick with the count, and then we
have the order ID. And then let's go and close it. With that the
calculation is valid, we just built exactly
this view. Let's go and hit OK. Now with that we've got our new field over here,
the number of orders. Let's go and check the results. It's going to be
exactly the same data that we have inside our view, but this time we have
an LOD expression where we have more
control in this measure. Now we're going to drop
everything from the view. We just need the new
calculated fields. And now let's go
and switch it to a dimension in order to
have distinct values. Then change it to discrete. So with that, we've
got something very similar to the bins, right? Here we have the distinct values
of the number of orders. Now what is missing
here is, of course, the number of customers in
order to have a histogram. So let's go to the customers count over here and just
drop it on the Rows. With that we've got
exactly what we want, the distribution of
the number of customers. So as you can see over
here, for example, we have three customers
that ordered four times. And here again, we
have only one customer that ordered 29 times, if you remember the example. And then we have here
those three customers that ordered 28 times. So you can
quickly understand the behavior of the customers
by just checking the view. We can understand that
most of our customers are ordering 11 to 16 times, which
is really good. We don't have
a lot of customers that are ordering only once. The left side over here is really low,
which is very good. And of course, now we are
summarizing all the data that we have inside the data
source over the five years. And now you might
have the question, does the behavior of the
customers change over time? In order to answer
this question, you have to bring in the time. So we have to bring
the order date. Let's drag and drop it
to the Rows over here. And now we can see very
quickly that the behavior of the customers is not
changing over time. So as you can see, the histograms
look identical, right? Most of the
customers are ordering 11 to 15 times, and that holds over the years. We cannot do such analysis without the LOD expressions. So you can see the power of LOD.
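The two-step logic behind this histogram, first the count of orders fixed per customer and then the count of customers per order count, can be sketched in Python. The order list and function names below are made up for illustration and are not the course's data source.

```python
from collections import Counter

# Hypothetical orders: (order_id, customer_id).
orders = [
    (1, "A"), (2, "A"), (3, "A"),
    (4, "B"), (5, "B"),
    (6, "C"), (7, "C"),
    (8, "D"),
]

def orders_per_customer(orders):
    """Step 1: like fixing on the customer ID and counting the order IDs."""
    return Counter(customer_id for _, customer_id in orders)

def customer_histogram(orders):
    """Step 2: use the order count as a dimension and count customers per value."""
    per_customer = orders_per_customer(orders)
    return dict(sorted(Counter(per_customer.values()).items()))

print(customer_histogram(orders))
```

Each key of the result is a number of orders and each value is how many customers placed that many orders, which is the shape of the bars in the view.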
149. Tableau | EXCLUDE LOD Expression: In the visualizations,
we're going to have exactly the same view
with the two dimensions, category and country. But now in the LOD expressions
we're going to use the exclude, where we're going to have
exclude category: sum of sales. Now what we are telling
Tableau is to go and exclude the dimension category
from the visualization. That means in the LOD
expression on the right side, we're going to get all
the dimensions from the visualization and we will
exclude the category. We're going to remove the
category from the dimensions. That means in the
LOD expression, in this example, we have the country that's going
to control the level of details, and Tableau is going to do the aggregation depending on this dimension. That means the exclude
function will always remove the dimensions that are
specified in the calculation. Here is the big difference
between the exclude and the fixed: exclude depends on the dimensions
that we have in the view. Let's say that we have added another dimension to the view. So now we have product,
category and country. What happens to
the LOD expression? Tableau is going to take
all those dimensions and will only exclude
the category. That means the
calculation is now going to depend only on the
product and the country. You can see it is very dynamic and it depends on
the visualization. The exclude will always react to the dimensions that are
specified in the visualization, and it is going to remove the
dimensions that we specify in the calculation. Moving on to the
second LOD function that we have, the exclude. Let's say that I
would like to have the total sales inside the view, but I would like to ignore
the dimension category. In order to do that, we can use the exclude. Let's go and create a
new calculated field. Let's call it sales
exclude category. We start with the function
exclude, let's select that. Then we
have to specify the dimension that
should be excluded. It's going to be the
category. After that, as usual, we have to define
the aggregate calculation. It's going to be
the sum of sales. Let's close the brackets.
So it's really simple. We are telling Tableau
to always ignore the category in
the calculation. Everything is valid, so let's go and hit
OK. And as usual, we will get our new calculated
field in the data pane. Let's go and drop it on the view in order to check the results. If you check the new results, you can see we got
different numbers from the sales by category
or the original sales. What is going on over here? Now, since we are using the
exclude function in Tableau, the LOD calculation
is going to depend on the
dimensions of the view. Let's open again our
calculated field, and let's see what
Tableau is going to do. Tableau is going to depend on the dimensions that we
have inside the view. We will have in the
LOD calculation the country and the category. But since we are
here saying, okay, go and remove
the category, Tableau is going to remove the
dimension category, and with that we are left only with the dimension country. Since we have
duplicates here, we have only three
countries at the end. In the LOD expression
we will have three rows. Now what Tableau is
going to do is find the total sales
for each country. The data source is
going to be split into three groups,
one for each country. We have France,
Germany, and USA. That means Tableau is going
to go, for example, to France, summarize all the sales for
those three orders, and put the result in the output. Then it does the
same for Germany: it takes all those sales, summarizes them, and gets the total
sales for Germany. And then we have for the
USA those four orders. And we're going to go and
summarize the sales for them, so that the output of the expression is
going to look like this. We have the country and the
total sales of each country. Now if you compare
the view to the results that we
have, you can see, as we exclude the category, we're going to have the total
sales for each country. Here, for France, we have 172, and as well for the second
category, where we have France, we will get exactly
the same total sales. And the same thing is going
to happen for Germany. So we will have exactly the same values in both categories. For Germany, we'll get
this value, and as well for the monitor in Germany
we will get this value. As you can see,
once you understand what is going on
in the background, you will understand the values in the view. As we said,
the exclude is dynamic. It is not like the fixed. We will not
always get those results. It's really going to
depend on the view, on the dimensions that
we have in the view. For example, let's add another dimension
to the view. Let's go to the customers, take the first name,
and drop it over here. Now if you look
closely at the data, you can see that for the fixed numbers nothing changed,
because they are always fixed to the
category dimension, but the exclude this time
has different numbers. If you go and compare with what
we had at the start, the total sales for
countries, you don't find those numbers anymore
in the sales over here. And that's because we have
added a new dimension. We don't have only the country. We have as well the first
name of the customers. So that means now we have in the LOD expression
two dimensions, the country and the
first name. The output of the LOD
expression is going to look like this. We have two dimensions, country and first name. We don't have the
category, we excluded it. We removed it from the calculation. And then we have the total sales for this combination
of dimensions. The total sales for
George from France, total sales for Maria
from Germany, and so on. Those numbers are exactly the same that you're
seeing in the view. As you can see, the exclude
function is dynamic and depends on the dimensions that are presented inside the view. This is how it works. Now
let's move to the next one. We have the include.
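The exclude behavior walked through in this lesson can be sketched in Python as "view dimensions minus the excluded ones". This is a simplified model with assumed names: the sample rows, `sum_sales`, and `exclude_lod` are hypothetical, and the numbers are invented rather than taken from the course data.

```python
from collections import defaultdict

# Hypothetical data source rows.
rows = [
    {"category": "Accessories", "country": "France", "customer": "George", "sales": 100},
    {"category": "Monitor", "country": "France", "customer": "George", "sales": 72},
    {"category": "Accessories", "country": "Germany", "customer": "Maria", "sales": 50},
    {"category": "Monitor", "country": "Germany", "customer": "Maria", "sales": 30},
    {"category": "Monitor", "country": "Germany", "customer": "Peter", "sales": 20},
]

def sum_sales(rows, dims):
    """Sum sales for each combination of the given dimensions."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["sales"]
    return dict(totals)

def exclude_lod(rows, view_dims, excluded):
    """For each row of the view, sum sales over the view dimensions minus the excluded ones."""
    keep = [i for i, d in enumerate(view_dims) if d not in excluded]
    lod = sum_sales(rows, [view_dims[i] for i in keep])
    view = sum_sales(rows, view_dims)
    return {key: lod[tuple(key[i] for i in keep)] for key in view}

print(exclude_lod(rows, ["category", "country"], ["category"]))
print(exclude_lod(rows, ["category", "country", "customer"], ["category"]))
```

With only category and country in the view, both France rows carry the same country total. After adding the customer to the view, the values change to totals per country and customer, which is the dynamic behavior that separates exclude from fixed.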
150. Tableau | INCLUDE LOD Expression: All right, so now let's move
to the include function. It is exactly the
opposite of exclude. So we're going to have the same example in
the visualizations. We have the two dimensions,
category and country. And now we're going
to say to Tableau: include the customer dimension. And we're going to have
the same aggregation, the sum of sales. Now what we are telling Tableau
with this calculation is to add one more dimension
to the visualization's dimensions: to add the dimension customers to the two other dimensions that we have inside
the visualization. Here again it's very
dynamic. Tableau is going to take the dimensions that are
presented in the visualization, the category and the country, and add to them a new dimension, the customers. The function include is very similar
to the exclude. It is dynamic. It
depends on the dimensions that we have
inside the visualization. Again, the same example: if we go and add one more
dimension, the products, we will end up having
three dimensions in the visualization, and Tableau,
in the LOD expression, is going to add one
more dimension to the expression, where
we're going to have at the end four dimensions: customers, product,
category, and country. So that means with the
include function, we are saying: do
the aggregation on all dimensions
that we have inside the visualization plus
one more dimension that comes from the calculation. So it's really easy, right? So now to summarize, the fixed
function is very static. It doesn't care
about the dimensions that we have inside
the visualization. It is completely independent. So it's going to stay the same as you are changing
the visualization. But the exclude and include depend on
the visualization. So exclude is going
to go and remove one dimension from the
dimensions that are presented in the visualization, whereas include is going to go and add one more dimension to the dimensions that are
presented in the visualization. So we now have an understanding of how those three functions
work in Tableau. So now we're going to go
back to Tableau in order to practice those three
functions. So let's go. All right, so now we need to pay more attention to this function, the include. It is more difficult than the
exclude and fixed, so let's have some coffee. Let's go. All right, so
as we learned before, each dimension has
a different level of details. For example, the first name has more details than the
country or the category. So now comes the issue. If you want to
remove such details from the visualizations, you want to remove
the customer's names. And you want to stick only with the category and the country. But still, you want to introduce an aggregation that has
to do with the customers, with a dimension that
has a lot of details. For example, we want to bring here an aggregation that shows the average sales of customers for each
country and category, but without showing the customer information
as a dimension. Let's go and remove the
first name from here. Now we don't have any
customer information in the view. But still we want to
bring the aggregation to the customer level by calculating the average
sales of customers. In this case, if your
aggregation is based on a dimension with a high level of detail, like the
customers or the order ID, then you have to use
the function include. So let's see how we
can do that. Let's go and create a new
calculated field. And we can call it Average
Sales of customers. We can use the function INCLUDE, so let's select INCLUDE. Now we have to tell Tableau which dimension should be
included in addition to those in the view. Currently we have the
category and the country, and we would like to add
the first name, or you can add the customer
ID, it doesn't matter. Let's add the first name. And then we have to
add the aggregation. This time we're going to
use the sum of sales. Now you might ask, why
do we have the sum of sales when we are talking
about the average? Well, the average is going to be the second aggregation, which we're going to do on top
of this LOD expression. First, we have to summarize the values that we have
inside the data source, and then we can do the
average on top of it. We're going to do it step by
step, don't worry about it. Then we have to close
the brackets like this. As you can see, now
the calculation is valid. Let's go and hit okay. With that, as usual we get
a new calculated field. Let's drag and drop
it to the view. We still are not
there because here we have the average
sales of customers, but the function that is
used in Tableau is the sum. We have to go and switch it to the average function.
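To make the two aggregation steps concrete, here is a minimal Python sketch of the idea, not Tableau itself. In Tableau the calculated field would look roughly like `{ INCLUDE [First Name] : SUM([Sales]) }`, assuming the fields are named First Name and Sales, with the view then applying the average on top. The rows below are made-up numbers for illustration, not the course data, and the function name `include_then_avg` is hypothetical.

```python
from collections import defaultdict

# Hypothetical order rows: (first_name, category, country, sales).
# These values are invented for illustration only.
rows = [
    ("Ann", "Accessories", "Germany", 100),
    ("Ann", "Accessories", "Germany", 300),
    ("Bob", "Accessories", "Germany", 200),
    ("Cid", "Monitor", "France", 50),
]


def include_then_avg(rows):
    # Step 1 (the LOD part): SUM of sales per view dimensions
    # (category, country) plus the included dimension (first_name).
    per_customer = defaultdict(float)
    for name, category, country, sales in rows:
        per_customer[(category, country, name)] += sales

    # Step 2 (the view part): AVG of those per-customer sums,
    # grouped only by the dimensions shown in the view.
    grouped = defaultdict(list)
    for (category, country, _name), total in per_customer.items():
        grouped[(category, country)].append(total)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


result = include_then_avg(rows)
print(result)
```

With these sample rows, Ann totals 400 and Bob totals 200 in Accessories, Germany, so the average sales per customer there is 300, which is different from the plain average of the three order rows.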
Let's go and do that. With that, we got
the average sales of customers for each
category and country. Now we're going to
see, step by step, how Tableau did the
execution of the INCLUDE. The INCLUDE is going to depend on the dimensions of the
view. We have here the category and the country. That means Tableau is going to start
with something like this, with the category and the country. In the next step, Tableau is going to check the LOD function. Let's go and open it again. We are telling Tableau
now: go and include the first name with the dimensions that are displayed in the view. Tableau is going to grab that information,
the first name, and in the output we will have three dimensions: first name, category,
and country. We're going to have something like this. Now if you compare
the number of rows of the LOD expression
with the view, you can see that we have
more detail in the LOD expression, since
we added the first name. Here we have around eight rows, but in the view
we have six rows. The level of detail of the LOD expression is
higher than that of the view. Tableau is going to go to the
next step and say, okay, we have to have
the sum of sales. We can have the sales
as well over here. And Tableau is going to start
aggregating the rows. For example, first we have
George, Accessories, France. It's going to be only
this row over here. We don't have it anywhere else, so we're going to have the 91. Then we have Maria,
Accessories, Germany. For that, we have three rows. Tableau is going to
aggregate those three rows. In the output we will get
something like this, and so on. So Tableau is going to
start summarizing those values based on
those three dimensions. And at the end we will get in the output
something like this. So Tableau calculated
the sum of sales by including the first name with the dimensions that
are presented in the view. Here we come to the
issue where we have in the LOD expressions more
details than the view. In order to bring those
results to the view, we have to aggregate it again. We have to either summarize it or do the average and so on. So we cannot bring
those details over here without doing
any aggregations. In this example, we want
to find the average of customers for each
category and country. That's why we have used
the average function. That means if you are using
the include function or you have more details
in the LOD expressions, we have to aggregate the data in order to bring it
to the visualization. But on the other hand,
if you are using EXCLUDE or FIXED
and the output of the LOD expression
has a lower level of detail than the view,
then what's going to happen? We're going to have duplicates. For example, you
can see over here, in sales by category,
we have duplicates. So it doesn't matter which
function we're going to use, sum or average, we will
always get those duplicates. The same thing goes for the EXCLUDE. We had a lower level of detail in the expression
compared to the view. That's why you can
see duplicates. We have the same
numbers over here. Those three rows are repeated over here for
the second category. This is the effect of
the LOD expressions. If the level of detail in the expression is higher
than in the visualization, then we have to
aggregate the data. But if the level of detail in the LOD expression is lower than in the view,
then we get duplicates. Now let's get back to our example. Tableau is
going to find the average
of those values. So the first value
is going to stay the same because we have
it only as one row. But now for those two
rows, as you can see, Germany, Accessories, Tableau is
going to find the average of those two
values, and we will get 954. And then for the next row,
we have Accessories USA. In the output we
have only one row. That's why the average going
to be exactly the same. The same goes for
Monitor France. The same value,
but for the next value we have Monitor, Germany. Here we have two values. Tableau is going to
find the average of those two values and
we will get 433. And for the last one
we got only one value. That's why we got
exactly the same number. Yeah, as you can see, if you get more details as a result
from the LOD expressions, things get more complicated
and you have to be careful which aggregations you are using in the visualizations. All right, so with that we
have learned how Tableau executes those three
functions step by step. Next we're going
to go and learn real use cases of
those functions. All right everyone.
Now in this use case, we want to compare the sales of all categories to the sales
of a specific category. Here the selected one is Tables, in order
to understand how the sales of the
other categories are doing compared to this specific category. In order to build such a view, we have to use the power
of LOD expressions. This time we can
use the exclude. Let's learn step by step
how to create such a view. All right, let's start with the first step where
we want to show the sales by subcategory.
This is the easiest one. Let's go and grab the
subcategory to the rows. And let's take the
sales to the columns. And then we're going
to go and sort the sales. Let's go and do that. Now our task is to
find the difference between each subcategory and one specific subcategory,
Tables. For example, we're
going to go and find the difference between the sales of phones and
the sales of tables. That means in order to
find the differences in each row, we
need two measures. The first measure
is going to be the sales of the
current subcategory, for example, the
sales of Phones. As the second measure, we need
the sales of Tables. And we need the sales of
Tables to be on the same row as well. The first measure, we have
it already, right? We have here the sales
for each subcategory. But the second one,
we don't have it yet. We need to have, for each row, the sales of Tables. In order to do that,
we're going to create a new calculated field.
Let's call it Sales of Tables. What we want to check now
is whether the current subcategory is Tables. If yes, then show the sales. We're going to use
an IF statement, where we check
the subcategory. If it equals Tables (you should write it exactly like the data that we have
inside the data source), then we want to
show the sales. Otherwise, do nothing: we want to have nulls if the
subcategory is not Tables. What we are doing now is isolating the sales of
the subcategory Tables. Let's go and hit OK, and let's go and bring it
to the view over here. As you can see, we have isolated the sales of Tables
in this new measure. But we still have the
problem that we would like to repeat this
value for each row. As you can see, we
have it only if the subcategory
equals to tables. Now, in order to repeat this
value for all the rows, here comes the trick or the
magic of the LOD expression. As you learned
before, the EXCLUDE is going to repeat
the values, right? We can use this trick. What we can tell Tableau
is: imagine that in this view there is no subcategory.
Then what's going to happen? This measure is going to
be repeated for all rows. Let's go and do that. Let's create a new
calculated field. We can call it
Exclude Subcategory. Now we have to use
nested calculations, because if you put everything
in one calculation, it's going to be
really complicated. Now we want to tell Tableau: imagine that we don't
have the subcategory in our view. So EXCLUDE, subcategory, and the aggregation is
going to be the SUM, but this time of the new measure that we created for the tables: SUM of Sales of Tables. And then we have to close
it, something like this. We are telling Tableau:
exclude the subcategory from the view and do
the aggregation. Let's see what's going to happen. Hit OK, and drag and drop it
to the view over here. As you can see, since
we have only one value and we are completely ignoring
the subcategory, we get the same value
repeated for each row. So now we have everything we need to find the
differences, right? We have the sales
of each subcategory and the sales of the specific
subcategory, Tables. So now we're going to
move to the last step, where it going to be
the easiest part, where we want to
find the differences between those two measures. So we're going to go
and subtract them. Let's go and create a
new calculated field. Let's call it difference. We can subtract the first value. It's going to be simply
the sum of sales. This is the first value that we have over here. Then we subtract our new measure, which is going to be the sum of our EXCLUDE function,
Exclude Subcategory. And that's it. Let's
go and hit OK. And let's drop it to the
view. With that, we have solved the task. We have the differences
between the sales of each subcategory and the sales
of the specific subcategory, Tables. Of course, you can see that Tables itself is
going to be zero over here, because we are subtracting from the sum of sales
exactly the same sales. It is a little bit tricky, but if you understand how
the LOD expression works, you can really do such analysis. Now let's go and drop
everything from here. We don't need those sub steps, I'm just going to
remove them now. Of course, we can add
the coloring over here. Let's go to the measure
on the right side. Let's take the measure to
the colors, and with that, we can see nicely
the differences between the subcategories
and the tables. Now if you'd like to
highlight the tables, since it's our main category, where we're comparing
all the others to it, we can make the use of
the Sales of Tables. Let's switch to this
measure over here, to the Sum of Sales
and the Marks. And then let's take the Sales of Tables and put it on the colors, and with that, you are
highlighting the main subcategory. With that, we have made
really complicated analysis using the LOD expressions.
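The three calculated fields from this use case can be sketched in plain Python to show what each step produces. This is only an illustration under assumptions: in Tableau the fields would look roughly like `IF [Sub Category] = "Tables" THEN [Sales] END`, `{ EXCLUDE [Sub Category] : SUM([Sales of Tables]) }`, and `SUM([Sales]) - SUM([Exclude Subcategory])`, where the exact field name of the subcategory depends on your data source. The rows and the function name `difference_from` are hypothetical.

```python
# Hypothetical rows: (subcategory, sales). Invented numbers, not the course data.
rows = [
    ("Phones", 500),
    ("Tables", 300),
    ("Chairs", 450),
    ("Tables", 100),
    ("Phones", 50),
]


def difference_from(rows, target="Tables"):
    # View level: SUM(sales) for each subcategory shown in the view.
    sales_by_sub = {}
    for sub, sales in rows:
        sales_by_sub[sub] = sales_by_sub.get(sub, 0) + sales

    # "Sales of Tables": keep sales only where the subcategory matches.
    # "Exclude Subcategory": sum that measure while ignoring the subcategory,
    # which yields one value that is repeated on every row of the view.
    target_total = sum(sales for sub, sales in rows if sub == target)

    # "Difference": sales of each subcategory minus the repeated target value.
    return {sub: total - target_total for sub, total in sales_by_sub.items()}


diff = difference_from(rows)
print(diff)
```

As in the walkthrough, the target subcategory ends up with a difference of zero because it is compared with itself.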
151. Tableau | Table Calculations: FIRST, LAST, INDEX, RANK: Everyone, so now we're going to talk about the last type of calculations that
we have in Tableau, the table calculations. And here we have
different functions, like running, window, rank, first, last, index, lookup. We're going to talk about
all those functions in this tutorial. As usual, first we're going to
understand the concept behind the table calculations. Then we're going to
go back to Tableau in order to start
practicing. Let's go. The first question is, what
are table calculations? Well, they are calculations that are going to be executed after the aggregation is done in the visualization. So they're going
to, like, aggregate the aggregations in Tableau. And it's important to understand
the level of detail. It depends on
the visualization. That means here again,
the dimensions in the view control
the level of detail. Now to the big
difference between the table calculations
and the others. Table calculations
are performed on the data that we
see in the view. Tableau will not go to the
data source to query the data. Tableau is going to query the data that is presented in the view. That means the view is
querying the view itself. It's going to send a query to the data inside the
visualization, and the view is going to return the result back to
the view itself. We are not going back
to the data source; everything is going to be
queried inside the view. The other three types of calculations, the
aggregate calculations, LOD expressions, and row-level calculations, are always going to
query the data from the data source and bring
the result to the view. Only this type of calculation is going to query the
data in the view. All right guys, in order to
create table calculations, we have to define two things: first the scope, and second the direction. The scope means which data is included in one calculation. For example, we have
the following view. It looks like a table, right? So we have rows and we
have multiple columns. But here we can see that our
data is split into groups. Each group is defined
by the dimension quarter, so we have quarters 1, 2, 3, and 4. Now the first option that
we have is the whole table. That means the calculation includes everything
inside the table. It will ignore any partitions that we have inside the table. It's going to start from
the first value and it's going to end
with the last value. Moving on to the next scope,
or to the next option, we have the pane. This time, the calculation is going to
focus on a smaller scope. This time we're going to
focus on the partition or the group of data which is
defined by the quarter. That means the table
calculation is going to be done for each
group separately. We can have for those
three rows calculations. Then we can move to
the second group, to the third group, and so on. Moving on to the last
scope, we have the cell. It's going to be only one
value inside the view. The scope is going
to be very small, including only one
individual value. So here we have to
define for Tableau the scope of the calculation. Is it going to be the whole
table, or only the pane, the group of data, or only one cell? All right, the next thing
that Tableau needs from us is the direction
of the calculations. How the calculation is going
to move through our table. So here we have four
different options. The first one going to be down. That means we're
going to start from the top value and we're going to move down until
we reach the bottom. That's of course going
to depend on the scope, whether we are running
the whole table or only a group of values
like we have in the pane. In this example, we
have the table down. That means we are processing all the values in one
calculation from top to bottom. Then it's going to reset and
move to the second column, and we do the same
thing for the next year. The second direction is across. That means this time
the calculation is moving through the
columns in one go: it starts from the first year and it ends with
the next year. Then it's going to reset and start again for the next
row, and so on. We are moving from
left to right. Those two methods
are the basics: either you can move
down or you can move right. The next
two directions are going to mix
those two methods. The first one is going to
be down, then across. That means first
we have to go down through the table and then
we have to go across. It's going to start
from the top first, then go to the bottom. But this time it will not reset when it moves to
the next column. It continues the aggregation: it's going to go to
the right, across, then it's going to move
again from top to bottom. Then across, top to bottom, until we reach
the last value. That means here we
don't have any resets; it's going to continue the calculation
through all values. It's not like the first
two methods, where we have a reset for each row over
here or for each column. This time the starting
value is going to be the top left and the last value
is going to be the bottom right. Moving on to the last
direction that we have, I think you got it already. It's exactly the opposite. First we do across, then we're going
to do down. Here again, there are no resets. We're going to start with
the first value on the top left and then we go
to the right first. Then we jump to the next row, then we go to the right again. We jump down and go right until we reach the last value
on the bottom right. So that means the calculation
first is going to move right and then it's going to
jump down to the next row. All right, so as you can
see, it's not that hard. Once you get it, we have four
different directions and three different scopes
that Tableau needs from us in order to create
table calculations. All right guys, in Tableau, we have different
methods to create table calculations, depending
on the difficulty. The first method that we have is the quick table calculations. As the name says, it's very
quick and easy to create. Here we have a list of
different table calculations. You don't have to
configure anything, you just have to click
on the function that you need and Tableau
is going to do the rest. Here we have very
common table calculations like the running total, the difference, rank,
moving average, and so on. The second method is
not going to be that quick. We have to configure
a few things. But still we are not writing any functions or
any calculations. Still we are clicking around. But here we have
more options and more control to configure
the table calculations. If you compare to the first one, the first one is just selecting the function, and that's it. Here again, we have
very similar functions. We have the rank, running
total, moving calculations. We can define different
options like the scope, which dimensions can control the table calculations,
and so on. Moving on to the last methods on how to create
table calculations. We can do it by creating a
new calculated field and then use the functions that are used for the
table calculations. Here we have a list
of many functions that you can use in order
to do table calculations, but they are a little bit
harder if you compare to the first two methods in order to create
table calculations. As you can see,
as you are moving from left to right,
things get harder. But with that, you are getting the full control and
the full options. Next, we will go back to Tableau in order to try
those three methods. And we're going to try
a few functions that we have inside the
table calculations. All right guys, so
back to Tableau. Let's go to the big data source. Let's go to the products
and get the usual stuff. So we're going to get
the category, the subcategory, and the sales, as usual,
over here. So I'm going to show you
the different methods on how to create
table calculations. And we're going to
start with the first one. We have the quick
table calculations, which is the easiest one. In order to do that, we're
going to do it on the view, so it's going to be only locally
available for this view. It's not like creating
a new calculated field. So we're going to
go to our measure over here, right click on it. And then here we
have two options. The first one says add
table calculations and the second one going to
be quick table calculations. The first one is the middle one that I showed you previously in the presentation where you have to configure
different stuff. But the second one is the
easiest one and the quickest one where we can create table calculations
with only one click. Now let's go and check the
quick table calculations. If you go over here,
you will find a list of different table calculations. And we can go over
here and let's check, for example, they
are running Total. Click on that here, there's two things
to be noticed. First, the numbers here
changed because here we have different aggregation
functions as well. We have here a new icon, and the measure table wants
us to quickly identify whether the measure is using aggregate calculations
or table calculations. If you see the triangle, that means this measure is
using table calculations. As you can see, with
only one click, we have created
table calculations. Here we have running total. Don't worry about
it, I'm going to explain it step by stepulator. Well now you might
say, you know what, We didn't define anything. The scope of the directions
for the calculation. So how we can do that, if
you go back to our measure, to the table calculations, riticlculate and you can find, now we have more options once we converted to
table calculations. And exactly here,
the computing using. We have those options here we
can define the scope table, paying, sale, and as well
the directions as well. You can see that we have
different options like clear table calculations
if you want to remove it back to the
aggregate calculations. Once you do that,
you can see we got back our sum of sales
without the icon. Well, that means we are not using anymore the
table calculations. Using now the aggregated
calculations. So that's all for
the first methods, how to quickly create table
calculations in Tableau. But we don't have a lot
of options to configure. That's why we have the
second method, where we have more options to control
the table calculations. But again, we're going to create it locally only for this view. It will not be available
for the data source. All right, so before I
show you how to do that, we're going to get one more
dimension to our view. So let's get the years
of the order date. And I would like to
have only three years, so I'm going to show
it as a filter. I'm just going to remove
the first two years in order to have fewer
data in the view. Now in order to create table calculations
only for this view, with more options we can go back to our measure
the Sum of sales. Currently it is an
aggregate calculation, but we want to convert
it to a table calculation, so right-click on it,
and this time we're going to go to Add
Table Calculation, the first option.
You can see we have this small icon indicating that
this is a table calculation. So click on that and we will get a new window here to configure
our table calculation. So what do we have here? The
first thing that we have to define is the type
of calculation. We have here a menu of different functions for
the table calculations. Again, here are the
running total, the rank, differences, and so on. So let's stick with
the first one, the difference. From here, we have to define for Tableau two things, the scope and the direction. They are always together; they are not split into separate options. The first one is going
to be Table across. Tableau here did
really great job by highlighting how the
calculation going to work. As you can see Tableau here, highlighting with
the yellow color how the calculation is
going to be performed. Just to help you to understand how it's going to work,
It's really great. We have the table across
from left to right, then we have the table
down from top to bottom. Then we have the option
of across then down. As you can see, it's
going to affect the whole table, since we move from the top left
to the bottom right. Then we can define
the other scope, like for example the
pane down. As you can see, now the scope is smaller
compared to the table down. The table down includes everything in this column, but the pane down
includes only this group. As you can see, our
view is split into three groups based
on the category. We have the first
group over here, the second and the third, and Tableau is highlighting
the first group. It is like a partition. As another option, we have the cell, where Tableau highlights only one value. Or we can define a specific dimension to
do the calculations. Here we have a list of all dimensions that we
have inside the view. And you can go and select
what the scope going to be, whether it's going to
be the subcategory or the year of order dates. Then each function that we
have has more specifications. For example here, what are the values that are relevant
for this calculation? Again, don't worry about it. I'm going to explain
how the difference works as well. In Tableau, you have to define
whether it's relative to the previous, next, first, and so on. Each function in Tableau
has different options. For example, if you
go to the rank, you will find over
here we don't have now those previous,
next, and so on. But instead we have different options to configure the rank. Each Tableau calculation
function here has different set of
options to be configured. All right, that's
all for this method. As you can see, we
got more options compared to the first one. Let's go and close this. Let's say that we are
interested to have this calculation for all other worksheets,
we want to reuse it. In order to do that, we're
going to go to our measure and just drag and drop
it on the Data pane. And with that, we got a
new calculated field. This time we are using
the rank of sales. I can go and rename
it Try And Sales. And with that, we got a
new field in our Data pane and we can reuse it
in different worksheets. All right, so now we can
move to the last method of how to create table
calculations in Tableau. We're going to go and create a new calculated field
and use functions. So let's go and do that. We will start with
the function index, So let's create a new
calculated field. We can call it Index. And the syntax is very simple: just type
INDEX and that's it. We don't need to specify
anything for this function. So you can see the
calculation is valid. Let's click OK. And with that, we got a new measure,
a new calculated field. Let's go and check the results. So I'm just going to drag
and drop it on the view. What this function
does is return the position number
of the current value. That means the first position
in this view is going to be the first row. As we are
moving from top to bottom, this is going to be
position number one, then position number 2, 3, 4,
and so on until we get the last value
as the last position. Now you might notice that we are counting all the
rows in the table. We are using the
scope of the table. We can check that
if we go over here to our measure
and right-click on it. We can see that the Compute
Using is Table (down). Let's say that we would like to have an index for each group, not for the whole table. Let's go and switch
it to Pane (down). Now as you can see,
the calculation works on the pane, not
the whole table. For the first group,
we have the first row, the Bookcases, then the second,
third, fourth, and so on. Then it resets
for the second group. In the second group, this row is going to be
number one, and the last position,
or index, in this group is going to be
Supplies, and not the last row, Phones. As you can see, it always resets for
each group because we have specified the
scope only for the pane. Now if you go and
switch it to the cell, let's go and do that,
Compute Using, Cell, you can see that each cell
is going to be the first value: the position number for
each row is going to be one. This is how the scope works
in Tableau. All right, now let's
go and switch it back to Table (down) under Compute Using. As you can see,
it's very simple. Let's go and try another
function in Tableau. This time we're going to use the FIRST function,
so let's create a new calculated field. We're going to call it First. And the function is
going to be really easy as well. It's going to
be FIRST and that's it. It's like the INDEX.
You don't have to specify anything inside
the calculation. The calculation is valid.
Let's go and hit OK. And to check the result
in the view as well, let's drag and drop it over here. Now we can see
that Tableau is assigning the first row
the value of zero. And as we are moving
down through the values, as you can see, the
numbers are decreasing. Those numbers tell us
how many steps we have until we reach
the top again, the zero. Here, for example, we need three steps until we
reach the first row. And here we have -11 until we reach the top value. So here we have something like
a distance between each row and the
first row. In Tableau, there is another function that does exactly
the opposite. It's going to be the last.
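The position functions described here can be mimicked with a small Python sketch. It is an illustration only, not how Tableau computes them internally: the view rows and subcategory names are made up, the function name `positions` is hypothetical, and the pane is assumed to be one group per category. For every row it returns the INDEX value (1, 2, 3, ...), the FIRST value (0, -1, -2, ...), and the LAST value, which is the opposite offset just mentioned (distance to the last row, ending at 0).

```python
# Hypothetical view rows in display order: (category, subcategory).
view = [
    ("Furniture", "Bookcases"),
    ("Furniture", "Chairs"),
    ("Office Supplies", "Paper"),
    ("Office Supplies", "Supplies"),
    ("Technology", "Phones"),
]


def positions(view, scope="table"):
    """Return (index, first, last) for each row, computed top to bottom."""
    if scope == "table":
        # Whole table: one partition that contains every row.
        partitions = [list(range(len(view)))]
    else:
        # Pane: start a new partition whenever the category changes.
        partitions = []
        for i, (category, _sub) in enumerate(view):
            if partitions and view[partitions[-1][0]][0] == category:
                partitions[-1].append(i)
            else:
                partitions.append([i])

    out = []
    for part in partitions:
        n = len(part)
        for pos in range(n):
            # index counts from 1, first is the offset back to the first row,
            # last is the offset forward to the last row of the partition.
            out.append((pos + 1, -pos, n - 1 - pos))
    return out


table_down = positions(view, "table")
pane_down = positions(view, "pane")
print(table_down)
print(pane_down)
```

Switching the scope from table to pane makes all three values reset at the start of each category, which matches the reset behavior seen in the view.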
So let's go and try it. Let's go and create a
new calculated field. It's going to be the LAST
function, and we can call the field Last as well. It doesn't need any
fields inside it, so that's all. The
calculation is valid. Let's go and hit OK. Let's drag and drop it on
the view over here. So now we can see that it has exactly the opposite
effect of FIRST. Tableau is going
to assign the last value in our
view the zero, and as you are
moving to the top, the values increase. Here again we have the distance, or how many steps we have until we reach the last value. Okay guys, we have one
more function that is very similar to
LAST, FIRST, and INDEX, in that it gives us the position number of the rows. We have the RANK function. Let's go and create a
new calculated field. We're going to call it Rank. Start with the keyword RANK. And as you can see, we have
five different functions for how to rank the data. We're going to start
with the easiest one, the first one,
let's select rank. And here we can specify
two things for Tableau. The first one is
the expression, or the aggregate function. In this view, we have
the sum of sales. So let's go and define
that: SUM of Sales. The second piece of information is optional. It's going to be how to sort
it, ascending or descending. If you leave it empty,
Tableau is going to use the default, the
descending order. So let's stay with the default. That's all, the calculation is valid. Let's go and hit OK. And with that we got a
new calculated field. Let's drag and drop it to the
view to check the results. So now we can see that
Tableau goes and ranks all the subcategories
based on the sum of sales. So we can see over here
that the phones has the highest sales
and we have it as a rank one and then the
second highest sales, we have it over here as
a two for the chairs. All right guys. So
now if you look at those four functions
and the results, you can see that they are very similar to each other, right? They're going to define
the position number of the rows using
different methods. Now you might ask, what are the use cases of
those four functions? Well, generally, there
are two use cases. First, we can use them as a
filter in visualizations, and second, we can use them in other calculations.
For the first use case, for example, let's
go and pick the rank and show it as a
filter to the users. They can then specify,
for example, the top five subcategories
in the visual. You already know that there are different methods
to show the top products or the top subcategories
in visualizations. This is one method
of how to do that. Or we might be in a
situation where we have a very big visualization with
a lot of rows, and I would like to
show the users only the first five rows. Without any specifications
or ranking or anything, we can just go and show
the first five rows. In order to do
that, we go to First and show it as a filter. Let's go and reset the rank. We can go over here and define: okay, I would like to see the first five rows,
or the opposite, we want to show the
last five rows, so we can go to the last
and show it as a filter. Let's go and reset the first. So now we can go over
here and say, okay, I would like to see the last
five rows inside my view. So this is the
first use case for these very simple table
calculations functions. We can use them as a filter. All right guys, moving on to the second use case
for these functions. I usually use them in another calculations to
generate a reference line. Let's have a quick example. Let's go and create
a new worksheet. We're going to take
the order date to the columns and as well
the sales to the rows. And this time we're going
to have the months as well. So let's change it
from year to month. And I would like to have
it as a bar diagram. As usual, I want
to show the labels and as well the colors
from the measure. The task now is to
show a reference line based on the first
value in the diagram. We have the first value
of 21,000. I would like to have it as a reference in order to compare the
other months with it. We can do that using
the function FIRST, but we have to add it in
another calculation. Now, in order to make it
simpler to see how this works, I'm just going to
go and duplicate this view in order to
make it like a table. Let's go to the
Show Me over here and switch it to a table. And then I'm going to take
the months to the rows. Now we have a very nice table. I would like now to
have the first value as a new calculated field. Okay, I would like
as well to add to this view the values
from the first function. Let's go and get
the field that we already created and
drop it on the view. You can see the first row
in this table is going to be January 2018, so we have the value of zero. And I would like to show now
the sales only for this row. I'm not interested
in the other rows. Only for the first row
do we have to show the sales. In order to do that,
we have to go and create a new calculated field. Let's call it First Sales. And the logic can be like this. We check if the FIRST
function is equal to zero, meaning we are at the first row; as you can see, we
have a zero value there. What happens then? We want to show the sales, so after THEN we have the field Sales. Otherwise we don't want
to show the sales. That means we can go and end
the statement with END. As you can see, if the position number is zero, like the first row, then show the sales. Otherwise don't show anything.
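To make the logic of First Sales a bit more concrete, here is a minimal sketch in Python rather than Tableau. It assumes that the FIRST function returns the offset from the current row back to the first row (0, -1, -2, and so on), and the monthly values are made up, except that the first month is set to 21,000 to mirror the view. The names `first_offsets` and `first_sales` are only for illustration.

```python
# Illustrative monthly sales as they would appear in the view (values are made up,
# except that the first month is set to 21,000 to mirror the walkthrough).
view_rows = [
    ("2018-01", 21000),
    ("2018-02", 18500),
    ("2018-03", 30200),
    ("2018-04", 27900),
]


def first_offsets(n_rows):
    """Mimic FIRST(): offset from the current row to the first row (0, -1, -2, ...)."""
    return [-i for i in range(n_rows)]


def first_sales(rows):
    """IF FIRST() = 0 THEN SUM([Sales]) END, row by row; None plays the role of NULL."""
    offsets = first_offsets(len(rows))
    return [sales if offset == 0 else None
            for (_, sales), offset in zip(rows, offsets)]


result = first_sales(view_rows)
for (month, sales), value in zip(view_rows, result):
    print(month, sales, value)
```

Only the first row keeps its sales value; every other row is empty, which is why the field can later serve as a single reference value.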
Let's go and hit OK. And with that, as usual,
we got our new measure. Let's drag and drop it
to the view over here. As you can see, Tableau shows the sales only if
FIRST equals zero. If not, as you can see, we
don't have anything. With that, we got the first value
of the sales, and now we can go and use
it as a reference line. In order to do that,
we're going to go back to our original sheet, and let's go and add our new
calculated field to the Detail. Then let's go to the
sales axis, right-click, and add a reference line. The value can be based on our
new calculated field, so let's go and switch it
to First Sales. And we can go as well and change the label from
Computation to Custom. And we can say, okay, this
is the First Sales. Let's go and hit OK.
Now as you can see, we got our new reference line. And the value of this
reference line is always based on
the first value. As you can see, it's going
to be 21,000. So we can go now and compare the
other values to our reference line. As well, this can be very dynamic. That means, for
example, let's go and add a filter to our view. Let's go to the
order date and show the filter. Now what happens
if we deselect 2018? The first value is going to be from January 2019. Here
we're going to get 47,000 as a reference line. With that, we can understand the power of table calculations. They are based on
the visualization, not on the data source. Anything you change in the visual, the table calculation
is going to react to it, which makes it very dynamic. This is another use case
for those four functions: first, last, index,
rank, and so on. For example, you can go and say, let's make the reference line based on the last
value in the table, so you can go and switch it. That's it for those
four functions.
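As a recap of first, last, index, and rank, here is a small Python sketch of how they can be thought of, not how Tableau implements them. It assumes FIRST and LAST are offsets to the first and last row, INDEX is a 1-based position, and RANK is a descending competition rank. The subcategory figures are invented; only the order (Phones first, Chairs second) follows the view.

```python
def table_positions(values):
    """Per-row first, last, index and rank for the rows currently in the view."""
    n = len(values)
    rows = []
    for i, v in enumerate(values):
        rows.append({
            "first": -i,        # offset back to the first row: 0, -1, -2, ...
            "last": n - 1 - i,  # offset forward to the last row: ..., 2, 1, 0
            "index": i + 1,     # 1-based position in the view
            # competition rank, descending: 1 + number of strictly larger values
            "rank": 1 + sum(1 for other in values if other > v),
        })
    return rows


# Illustrative sales per subcategory (in thousands), in the order shown in the view.
view = [("Binders", 203), ("Chairs", 328), ("Phones", 330),
        ("Storage", 223), ("Tables", 206)]
positions = table_positions([sales for _, sales in view])

# Use case 1: the functions as filters.
top_two = [name for (name, _), p in zip(view, positions) if p["rank"] <= 2]
first_two = [name for (name, _), p in zip(view, positions) if p["first"] >= -1]
last_two = [name for (name, _), p in zip(view, positions) if p["last"] <= 1]
print(top_two, first_two, last_two)
```

Because the positions are computed from whatever rows are in the view, calling `table_positions` again on a filtered list gives new positions, which is the dynamic behavior described above.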
152. Tableau | Table Calculations: RUNNING TOTAL: Guys, now we're
going to talk about a very important and very common table
calculation in Tableau. It is the running total. The running total is
going to go and sum all the values as they
progress over time. For example, in this view we can track the performance
of our business, where we can go and compare the three different
categories of our products. We can see
here the development or the progress of customers, and as well the orders,
in order to quickly understand whether our business
is growing or declining. Now if you compare, in this
view, those three categories, you can see that
Office Supplies is growing very fast
compared to the two others. So you can see that using
the running total in our view helps us to
understand the progress, the performance of our business. So now let's go and understand how this function
works in Tableau. Okay guys, so how does the running
total calculation work? It's going to go
and add each value to the sum of all
previous values. Let's have an example in
order to understand it. We have over here the months
and the sales as well. And we want to build
the running sum. We start with the first value, so we are currently
at the first row, and since we don't have any
previous sum of values, it's going to be
exactly the same value. The calculation is: the current running total equals the sales value. That means in the
output, we're going to get exactly the same value, 2,067. On to the next
month, February. So currently we are at this
level, at the sales of 523, and the previous running total is going to be the
one from January. Now in order to get the
running total for February, it's going to be simply
adding those two values. So we are adding the sales value plus the previous running total. And with that we will get
2,590. So as you can see, we are simply adding
the current sales to the previous running value. Let's move to the next month. We have a new
current value, 6,422, and we're going to add it again to the
previous running total. So we have again
the same formula. With that, we are going to
get 9,013. As you can see, we are just adding the
current sales to the previous running total
from the previous month. We can proceed through our table until we
reach the last one. It's going to be
exactly the same. We are currently at December, and this is our current value. We're going to go and add it to the previous running total
from the previous month, November, and so we
get the last value. And with that we have the
final value for the running total. As you can see, we
build the progress or development of the
sales over the months. This is how the calculation
of the running total works. Let's go back to Tableau in
order to learn how to create it and build the visualization
using the running total. Let's start with the
big data source, and let's go to the Products here. We're going to get our
category to the rows, and then we need the date. So we're going to
get the order dates from the table orders and
put it on the columns. We need it as a continuous
month, so right-click on it. And then let's switch it
to this option over here. Now we need the
measures because we are tracking the
progress of customers. We want the count of customers. We're going to go to
the customers over here and let's
grab this measure, customers count, and
put it in the view. And now we're going
to go and change the visual from line to bar. So we're going to
go to the Marks over here and change it to bar. Now we have here the total number customers for each month. We still don't have
the running total. In order to do that,
it's very simple. We can go and use the
quick table calculations. It is the easiest way. Right-click on the
customers over here. And then let's add a quick
table calculation, and simply pick here
the running total. Let's go there. So now
we can see that Tableau converted it to running
totals for each category. And we can see immediately
that the progress of customers in the office
supplies is the best. As you can see,
it's very simple. What we are missing now is the count of orders,
The number of orders. So let's go and get
our second measure. It's going to be
the orders count. And let's grab it and put it near the customers over here. But as you can see, both of the measures
are very similar. So we have to change the
visual for the orders in order to understand the differences between
the two measures. How to do that? If you go
to the marks over here, you can see we have
three sections. The first one is All. That means anything that I'm going
to configure over here is going to affect everything,
both of the measures. But now, since we want to change the visual only for the orders, we're going to switch
the marks to the orders. So let's click on
that. In this tab now, I'm configuring
the running total of the orders. Instead of a bar, I would like to
have it as a line. If you go to the
colors over here, we can add this dotted line in order to see the
differences between the measures. And I can reduce as well
the opacity of this line. All right, so now the next
step we're going to go and change the colors because
both of them are blue. Let's go to all, and let's grab from the left side,
measure names. Let's go and put it over
here on the colors. The next thing that
we can do is to merge those two axes for each
category into one. I would like to
have only one axis. In order to do that, let's go to the orders
and right-click on it. And here we have an
option called Dual Axis. What it's going to
do is merge those two axes into one.
Let's go and click on it. Now as you can see, we've got only one axis
for each category. We don't have the
split between the two measures anymore;
everything is in one view. Now we can see that the axes are on the left
and on the right. The next step, what
we usually do, but not always, is to go
and synchronize those axes. Right-click on it, and we have here the option
Synchronize Axis. With that, both of the axes
are at the same level. We can go now and hide the
right one because it is useless to have the
same information twice, on the left
and on the right. I will go and hide the
header from the right side. And maybe we can
go and get rid of the information that
we have on the axis. So go and edit the
axis, and we can go and remove the title. That's it, let's close it. I'm just minimizing the information that we
have inside one view. That's it. As you
can see now, we can track the progress of
the customers and orders by category
using a function that is very commonly
used, the running total.
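The running total from this lecture can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not what Tableau runs internally. The first list uses the three sales values from the walkthrough, with January assumed to be 2,067 so that the February total matches the 2,590 mentioned there; the third total comes out as 9,012, one less than the spoken 9,013, presumably because the values in the view are rounded. The customer counts per category are made up.

```python
from itertools import accumulate

# Sales for January to March (January is an assumed figure, see the note above).
jan_to_mar_sales = [2067, 523, 6422]
running_sales = list(accumulate(jan_to_mar_sales))
print(running_sales)  # [2067, 2590, 9012]


def running_total_by_category(rows):
    """Running total that restarts for each category, like one pane per category."""
    totals = {}
    result = []
    for category, month, value in rows:
        totals[category] = totals.get(category, 0) + value
        result.append((category, month, totals[category]))
    return result


# Illustrative customer counts per category and month (made-up numbers).
customer_counts = [
    ("Furniture", "2021-01", 5),
    ("Furniture", "2021-02", 3),
    ("Office Supplies", "2021-01", 9),
    ("Office Supplies", "2021-02", 12),
]
for row in running_total_by_category(customer_counts):
    print(row)
```

Each category keeps its own running total, which is what the view shows after the quick table calculation is applied per category.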
153. Tableau | Table Calculations: DIFFERENCES: All right everyone, so
we're going to talk about the last table
calculation function. We have the difference. The difference is very simple. It can find the difference
between two data points. And there are many use
cases for this function, but the most famous one
is to compare two things, for example, to compare
period to period. A very common one is to compare the sales or profit
month by month, or year over year,
in order to uncover seasonality or
cyclical patterns. Now let's go and understand
how this function works. All right, now in order to understand how the
calculation works, we're going to have the
following example, where we have the sales by month in
the calculation. Let's say that we are
currently at the month of May; the current value
is going to be this value. And for Tableau, in order
to create the difference, it always needs two data points. The first one is always going
to be the current value; in this example it's going
to be the current sales of May. For the second data point, we have more
freedom, where we can select which value is going to be compared to the
current value. In Tableau, we have
four different options. The first one, we
can go and compare the current month with
the previous month. In this example, we
can compare May with April. If you
define it like this, with Previous,
Tableau is going to go and simply find the difference
between the current and the previous; Tableau is going to go and just subtract
those two values. This is the first option. The second option
that we have is to compare the current value
with the next month. In this example, we're going
to compare the month of May, the current one, with
the month of June. Tableau is going to
go and simply find the difference between the
current and the next month, and it's going to go and
subtract the values. Now moving on to
the third option. We can compare the current
month with the first month, the first value that we
have inside the table. That means in this
example, if we define First for Tableau, Tableau is
going to go and find the difference between
the current sales, that will be the sales
of May, and the first, which we have as January, and then go and
subtract the values. Now moving on to the last one, I think you already got it. We're going to compare
the current month, May, with the last month, the month of December. Tableau is going to go and find the difference between
the current value of May and the last value inside the visualization,
December. So it's going to go and
subtract the two values. As you can see, we have here four different options for which value we are
comparing with the current: either the previous value, the next value, the first value, or the last value. That means in
Tableau we get really great control over which data points are
compared to each other. Now let's go back
to Tableau in order to start practicing
for this function. All right everyone.
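The four options just described (previous, next, first, last) can be sketched like this in Python. It is a simplified model under the assumption that the difference is always the current value minus the comparison value; the monthly sales are invented, and the function name `difference_from` is only for illustration.

```python
def difference_from(values, current, relative_to="previous"):
    """Current value minus the comparison value; None when no comparison row exists."""
    targets = {
        "previous": current - 1,
        "next": current + 1,
        "first": 0,
        "last": len(values) - 1,
    }
    target = targets[relative_to]
    if target < 0 or target >= len(values):
        return None  # e.g. the first row has no previous row
    return values[current] - values[target]


# Illustrative sales for January to December (made-up numbers).
monthly_sales = [100, 120, 90, 150, 170, 160, 130, 140, 180, 200, 210, 250]
may = 4  # zero-based position of May

for option in ("previous", "next", "first", "last"):
    print(option, difference_from(monthly_sales, may, option))
```

The first row has nothing before it, so the difference from previous is empty there. The same thing shows up in a view as an empty first column or first row.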
So now we're going to go and create a
view in order to compare the sales over
time, over the years. We're going to go with
the big data source. Let's go to the orders and take the order date to the columns
to have the years. Then we would like
to have in the rows the months and the quarters. Hold Control and just
duplicate it twice. The first one is going
to be the quarter. Let's change the format to quarter, and the second one is
going to be for the month. We're going to change
it as well to the month. Now, I would like to make
the table a little bit bigger. I'm just going to
stretch it from the rows and as well
from the columns. Now what is missing? Of
course, our measure. Let's go and get the sales
and put it in the view. Now we have the
sales aggregated by the months and
spread across the years. Now we have to create
the differences between those years. In order to do that,
we're going to go to our measure and right-click it. This time we're going
to use the option with more control over the calculation: Add Table Calculation.
Let's do that. Now we have to
configure a few things. First, we have to choose
the calculation type. It's going to be
Difference From, and the default is correct. Next is Compute Using: which scope, which direction we want. We want the direction
from left to right. We want to compare the years
which is currently correct. We don't want to compare
the months together. If we wanted to compare them, we could switch it to Table (down). With that, we are now
comparing the months together, but now we want to
compare the years. In order to do that, let's
select Table (across). And then we have to specify
for Tableau Relative To. And here we have
to define one of the four options that
we learned before. We have Previous,
Next, First, and Last. Now in this example, we want to compare the current year
with the previous year. So we're going to stay
with Previous. So that means, for example, let's pick this
value over here. It's going to be the difference
between the sales of January 2022 and the year
before for the same month. So it's going to be the
difference between this year and January of 2021. And that's why for the
whole year of 2018, we don't have any values. Because in this view
we don't have 2017, we don't have a previous year. It's the first year; that's why it's
completely empty. All right, so with that
we have created the table calculation. But as usual, we're going
to go and change the view that we are currently
presenting for the users. So what I would do now, I would reduce the number
of years to only two years. So let's go and apply a
filter. Show filters. And I would pick
the last two years. Now I would like
to add to the view the total sales for each month. In order to do
that, let's go and grab the sales and
add it to the view. Now on the left side we have
the differences in sales, and then we have the
aggregate of sales. Now we can see very easily
where those numbers come from. They are the differences
between those two years. All right, the next step,
let's go and replace those numbers with
visuals, with bars. In order to do that,
we're going to take our measures and put
them on the columns. This is the first
and the second. Then let's change the visual from line to bar. Let's go to the marks
over here and say we would like to have
bars. All right. As you can see, all the measures have the same coloring. Instead of that, I would like to change the coloring
of the differences. Let's go to the sum
of Sales over here. As you can see, we have the
icon of a table calculation. And then let's drag
and drop the sum with the table calculation to the
color by holding Control. Let's change the colors
of the first measure. So let's switch to
the sum of sales, the aggregation, and
go to the colors. And let's pick any
color you like, for example
blue. This information comes from the total sales, from the aggregate calculation. And this one comes from
the table calculation. And it's very simple to create. And with that, we can go and compare the years for the sales. Now if you would like to
analyze the differences between those two years,
you can see in January, for example, there's
no big difference between the years 2021 and 2022. There is a small growth. But if you go, for
example, to February, you can see there
is a big difference between the two years; we made a lot of sales
in this month. And another thing to notice
here is that in November, we made fewer sales
than the year before. So as you can see, we can very quickly find the differences between the sales in 2022 and the sales
of the year before. So this is the power of
the difference function. It can help us to compare
two things like the years, or maybe the categories,
months, and so on. All right, so that's all
for the difference function in Tableau. All right everyone. So with that, we have covered the four types of
Tableau calculations. And with that, you have learned around 60 different functions
in Tableau so that you have enough tools in order
to create new fields in your data source and as well
to manipulate your data. And with that, we have completed the section Tableau
calculations. And now in the next section, things are going to get really interesting, where
we're going to go and build around 63 Tableau charts. We're going to start
with basic charts like bar charts, and we're going to progress to more complex charts in Tableau.
154. Tableau | Section: Tableau Charts: Let's jump in immediately and start
building charts in Tableau. And we're going to
cover around 63 charts. So let's have a sneak peek at some visualizations and charts that are going to be
covered in this course. You will start by creating
some basic charts, like different bar charts; we have column charts and
stacked bar charts. And then after that,
you're going to learn how to create different line charts. And as well we're going
to have area charts. And then we're going
to learn how to combine different
types of charts, like for example, a bar
chart and a line chart. And moving on, we will be creating different
maps in Tableau. And then you will go to the next level, where
you're going to start building charts
like scatter plots, slope charts, bubble charts, poly charts, calendar charts. Then after that, we're going
to go to the last level, to the advanced charts. For example, we
have Pareto charts,
waterfall, butterfly or tornado charts, quadrant charts,
and funnel charts. So as you can see, we're
going to cover a lot of Tableau charts and
visualizations in this course. So now let's jump
in and get started.
155. Tableau | Multiple Measures in One View: Before we start learning how
to build charts in Tableau, we have to understand
some basics, like for example, how to add multiple measures
in one single view. I have seen many new Tableau
developers get confused about how to add a second measure to
the visualization, because in Tableau we
have different places and different methods for adding multiple measures
in one single view. Here in Tableau we
have three methods. The first one is to use
individual axes for each measure. The second method is to use one single shared axis using Measure Values
and Measure Names. And the third one is to
use a dual axis in Tableau. So now we're going
to go and learn those methods step by step, and we're going to learn
as well the advantages and disadvantages
of each method. So let's go. All right guys, now we're going to start
with the first method. We have the individual
axis for each measure. So let's see how we can create it and how it's
going to look. Let's go, for example,
to our big data source. Let's pick the order
date to the columns. And now in order to create individual axes
for each measure, we're going to drag and drop the measures in the
rows or in the columns. So for example,
we're going to take the sales and put
it in the rows. And let's take as
well the profits, drag and drop it to
the rows as well. Now we can see in our view that each measure has its own axis. That's why we call it an individual
axis for each measure. We can see for the
sales we have this axis that goes from 0 to 1 million. And for the profit
it goes from 0 to 100K. And those two axes for those two measures are completely separated
from each other. There is no overlapping
or anything. Now, of course, we have
two measures, but we can go and add a third,
fourth, and so on. So there is no limitation on how many measures we can
add to our visualization. We can see now we
have four measures. You can see each
of those measures has a different axis
with a different range. I would like you to
understand something very important in Tableau: once you add
multiple measures to the view, you will get multiple
pages on the marks. The marks in Tableau is the
place where you can go and customize the visualization, to customize the charts
that we have over here in our view. Since we
have multiple measures, we will get multiple
pages in the marks. Let's check what
we have over here. The first one is All. Then we have an
individual page for each measure that we
have inside our view. Now let's understand
how this works. Let's start with the
first one, All. Now in this page, anything
that you change in the setup is reflected for all
measures, for all charts. For example, instead
of having the line, I would like to have the bar. If I change it to bar, as you can see, all the measures are changed to bar charts. Or if you go over
here, for example, to the colors and
change it to black, you can see that
all our measures now are black and so on. If you go to the size,
reduce the size, you can see the size of all our measures is
going to be reduced. So anything that I'm
changing in All is reflected for
all measures in the view. But now, since we have an individual
axis for each measure, we can go and customize each of those charts
individually. So for example, let's say that I would like to change
only the sales. I can go to the marks
of Sales over here. So let's switch to the
page of SUM(Sales), and then instead of having a bar, I would like to
have it as a line. So now we can see
we have changed the chart type only
for the sales. Everything else
stays as a bar chart. And the same thing
for the profit. You can go over here to
the profit and say, okay, instead of black, I
would like to have it, for example, as blue.
As you can see, this customization is
done only for this measure, only for the profit. And then the same thing
for the other measures. If you say, okay,
for the quantity, I would like to change
the chart type, let's go for
something like area. So let's switch to the quantity and then let's go to
the area over here. With that, we have
changed only the chart type for the quantity. So you can see those
marks are really helpful in order to
customize our charts. And you can go and do that
individually for each measure, or you can go to All
over here and then do the changes for all measures together. That's
all for the marks. They are really
important in order to customize the charts
inside our visualizations. One more thing that's
important to understand: we have here four tabs inside the marks
because we have four measures, or rather, because we have
continuous measures. For example, for the years, we don't have any
tab in order to customize the years
because the field is discrete. For example, let's go
and switch the sum of sales from continuous
to discrete. Right-click on it and
go to Discrete. With that, you can
see that the sum of sales disappears from the marks. That means we
cannot customize it anymore because it is discrete. Let's go and change it
again, back to continuous. And with that, we're going to
get it again in the marks; you can customize
continuous fields. All right guys, now as you
can see, with this method we can go and customize our charts individually
as we want. Another advantage is
that we can go and add as many measures as we want
inside our visualization. But the disadvantage is that
we have separate axes, and in some situations
it's really hard to compare the measures together if they are
split like this. That's why we have in Tableau
different methods in order to combine and merge the axes
and the charts together. So that's all for the first
method, where we're going to have an individual
axis for each measure. All right guys, moving
on to another method in order to combine multiple
measures in one view. And that is by sharing
the same axis. We can do that using the measure names
and Measure Values. If you check the Data pane in
each data source in Tableau, you will always find two fields. We will always have Measure
Names and Measure Values. Those two fields,
Measure Names and Measure Values, are automatically
generated by Tableau. They don't come from
the original source of your data. What
are those fields? Measure Names is a
discrete dimension that contains the names of all measures that you have
inside your data source. On the other hand, we
have Measure Values. It is a continuous measure
that contains the values of all measures that
you have inside your data source. In Tableau, there are two ways to use the Measure
Names and Values. The first one is to simply drag and drop them from the
Data pane into the view. Let's take, for example,
Measure Names to the rows. As you can see, currently
no measure values are selected because we don't
have anything in the view. Now what we're going to
do, we're going to go to Measure Values and let's drag and drop it to
the text over here. And now you can see in the view all the measures that we
have inside our data source: the count of customers, count of orders, discount, profit, sales, and so on. So those are all
the available measures that Tableau can find
inside your data source. Again, the Measure Names field is going to be the name of the measure: the count of customers,
count of orders. That information comes from Measure Names. And the values of those measures are going to come
from Measure Values. So as you can see,
it's very simple. The names of the measures, the count of customers,
discount, and profit, those names come from
Measure Names. And the values that
we have inside this view come from
Measure Values. So here you can control things. For example, you
can go and remove any measure that you don't
want to see inside the view. For example, let's go and
remove the sum of unit price. So just drag and drop
it somewhere outside. And as you can see, Tableau created
a filter immediately. So if you go over here to
the filters and edit it, you will see a list of
all measures that we have inside our data
source. As well, if you want to remove
some measures, you can go and deactivate or deselect the measures that
you don't want to see inside the view. Let's
go and hit OK. And with that, we have reduced the number of measures
inside the view to four. One more thing that we can do
over here is that we can go and change the sort order of the
measures inside our view. For example, let's
take the count of customers from the
top and put it at the bottom, so you can
see we just changed the order of the measures
inside the view. All right, so this
is one way to use the Measure
Names and Measure Values inside the visualization, by just dragging and dropping
them inside the view. But there is another
quick way to use them.
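Before moving on, one way to think about Measure Names and Measure Values, as described so far, is as a reshape from one column per measure into name and value pairs. The Python sketch below is only an analogy and not Tableau's internal representation. The rows, the numbers, and the function `to_measure_names_values` are made up for illustration, and the `keep` argument stands in for the Measure Names filter.

```python
# One row per year, one column per measure (illustrative numbers).
wide_rows = [
    {"Year": 2021, "Sales": 1000, "Profit": 120, "Quantity": 40},
    {"Year": 2022, "Sales": 1300, "Profit": 150, "Quantity": 55},
]


def to_measure_names_values(rows, dimensions, keep=None):
    """Turn each measure column into a (Measure Names, Measure Values) pair."""
    long_rows = []
    for row in rows:
        for field, value in row.items():
            if field in dimensions:
                continue  # dimensions stay as they are
            if keep is not None and field not in keep:
                continue  # measure removed through the filter
            new_row = {d: row[d] for d in dimensions}
            new_row["Measure Names"] = field
            new_row["Measure Values"] = value
            long_rows.append(new_row)
    return long_rows


all_measures = to_measure_names_values(wide_rows, dimensions={"Year"})
only_two = to_measure_names_values(wide_rows, dimensions={"Year"},
                                   keep={"Sales", "Profit"})
for row in only_two:
    print(row)
```

Removing a measure from the view then corresponds to leaving its name out of `keep`, which mirrors the filter on Measure Names shown above.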
Let me show you what I mean. I'm just going to go and
remove everything from our view and then
start from scratch. Let's take the order
date to the columns. And let's take, for example,
the sales to the rows. So far we have only one measure in our view;
everything is as normal. But now let's say
that I would like to add another measure to the view. Before, we learned that we take the profit and put
it near the sales. But with that, we have
learned that Tableau is going to go and create two
individual axes. We don't want that, so
let me just remove it. I would like to have one axis
for both of the measures. In order to do that, we can use the Measure
Values and Names. And in order to
quickly generate that, let's take the profit.
Now, very slowly, let's just drag it to
the axis of the sales. And as you can see
now, Tableau is going to show us two green
vertical lines. With that, we are
telling Tableau: I would like to
share the same axis for two different measures. So let's just drop
it on the axis. And here Tableau is going
to go and convert everything, so we don't
have here the sum of sales anymore. We have now Measure Values, and in the filters we have
Measure Names. Inside it we will get only
two measures, the profit and the sales. So as you can see, Tableau
prepares everything for us. And this is a quick
way to use multiple measures using Measure Values
and Measure Names. And we can see as well here in Measure Values that we have
only those two measures. So now let's check the visual. As you can see, we have only
one axis for two measures. The green one is going
to be the sales, and the grey one
is the profit. So that means those two measures are sharing the same axis. And of course, we can go and add more measures to our view, not only two. We can take, for
example, the discounts. We can go and drop it inside the measure values to the
last one for example. And with that we
got three lines. Three measures are
sharing the same axis. It's really nice and
compact way in order to compare multiple measures
using the same axis. But of course you
have to pay attention to the scale of the axis. For example, the
scale of the sales. As you can see, the green one is really huge, 0 to 1 million. Now if you take the
discount, as you can see, everything is almost zero, because the scale compared
to the sales is very small. That's why for this method, it makes sense to use
multiple measures on the same axis only if they have
a similar scale of data. But if there is a big
difference in the scales, the visual will not make sense for comparing the measures.
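As a rough illustration outside of Tableau, the scale check described here can be sketched in Python. The sample figures and the factor-of-ten threshold are assumptions made up for this example; they are not the course data and not a rule that Tableau itself applies.

```python
def similar_scale(measures, max_ratio=10.0):
    """Return True if the peak values of all measures are within max_ratio of each other.

    measures: dict mapping a measure name to a list of non-negative numbers.
    max_ratio: hypothetical threshold chosen for this sketch.
    """
    peaks = [max(values) for values in measures.values()]
    smallest, largest = min(peaks), max(peaks)
    if smallest == 0:
        # All-zero measures only match other all-zero measures.
        return largest == 0
    return largest / smallest <= max_ratio


# Made-up yearly totals, only to illustrate the idea.
sample = {
    "Sales": [480_000, 650_000, 820_000, 900_000],
    "Profit": [52_000, 61_000, 83_000, 97_000],
    "Discount": [0.12, 0.15, 0.14, 0.16],
}

sales_and_profit = {name: sample[name] for name in ("Sales", "Profit")}
print(similar_scale(sales_and_profit))  # True: peaks differ by less than 10x
print(similar_scale(sample))            # False: Discount is tiny next to Sales
```

With numbers like these, sales and profit can reasonably share one axis, while adding the discount flattens its line to almost zero, which is the effect visible in the view.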
So in this example, it doesn't really make
sense to use the discount inside this
visualization because we cannot really compare it. It has a really small scale. One more disadvantage
of this method is that, if you check the
marks over here, you can see that we have
only one tab for everything. We don't have individual
marks for each measure. And that means we cannot go and customize each
measure as we want. Like we saw before in the method one where we want to
use, in one case, for example, the line
diagram and another measure, we can use the bar
diagram and so on. So we cannot go and customize
individually each measure, but instead all
those measures are sharing the same set up
for the visualizations. That means let's
go, for example, and change the size. If we do that, it's
going to affect all measures inside
the view, and I cannot
change it individually. So everything that you are making or
changing here affects all the measures. For example, let's go and change it to a bar diagram and so on. The only thing that
you can go and customize is the colors. So if you go to the colors
over here and edit colors, you can assign for each measure a different color.
But that's all. We cannot go and customize
the charts as we want. So if you use measure values and measure names,
pay attention. You don't have the freedom of changing the visuals
of your charts, but it's still very useful in many cases where you want to have multiple measures
sharing the same single axis. All right, so with that,
I hope it's more clear now why we have measure values and
measure names in Tableau. All right, now moving
on to the last method. In order to combine
multiple measures in one view, we can
use the dual axis. A dual axis is a really great
way and very useful in many scenarios where you can go and compare two
measures together. Let's see how this
works in Tableau. There are two ways to create a dual axis in Tableau. For the first one, which I'm going to show you now, let's take, for example, the order
date to the columns. And then let's take the sales
information to the rows. Now I would like to get another
measure inside our view. So let's take the profit
and just put it in the rows side by
side near the sales. So here we are back to the
method one where we have two measures separated
with two individual axes. Now as you can see,
those two measures are separated from each other. I would like to bring
those two visuals on top of each
other. How do we do it? Let's go back to our measures. So as you can see,
we have two measures, the sales and the profit. We're going to go to the profit, to the one on the right
side, and right-click on it. And here we have the
option of dual axis. So let's go and click on that. Now as you can see, those
two charts are now on top of each other
using a dual axis, with the axis for the sales and the axis for the
profit side by side. And we can see as well that the shape of those measures did change. So now, instead of
having two green pills, we now have one green
pill from two measures, the sales and the profit. And now, if you check the
scales of those dual axes, you can see that
the sales is, as usual, 0 to 1 million and the profit is 0 to 100K. So now here
we have two options. Either you can leave it as it is with two different scales, or you can go and make them
similar to each other. And this is what we do
in most situations. We go and synchronize
those two axes. In order to do that, let's go to the profit over
here on this axis. Right click on it,
and here we have the option of synchronize axis. Let's go and select that.
As you can see now, the profit scale has exactly
the same scale as the sales. It goes from 0 to 1 million, and the marks in the visual did adjust
as well to the new scale. So as you can see, now
we have it at the bottom; before, we had it near the sales. Now you might ask,
you know what, why do you use a dual axis? I can just go and use the
measure values like in method two, and I can add as many
measures as I want to the view. So why do we have the dual axis? Well, there are two
reasons for that. First, here you have
the option to decide whether you want to
synchronize the axis or not. So if you go to method
two with the measure values, you can see that everything
is synchronized, you have only one axis, and we
cannot change that. But if we go back
to the dual axis, we always have the option
to synchronize the axes or not. So this is one benefit. The major benefit of
the dual axis is that I can go now and customize
each measure as I want. So if you check the
marks, we have here, again, a tab for each measure. Again, the All tab is going to
customize both of the measures. But if you go to
the Sum of Sales, we can go and decide the
visual set up of this measure. For example, I can go over
here and change the size. Or I can go to the sum of profits and say instead
of the line diagram, I would like to
get a bar diagram. Here is exactly the
advantage of the dual axis, where we can go and
customize the chart or the measures individually but
still using the same axis. And you don't have this
option if you are using the measure values
because you have to make a decision or a set
up for all measures. But the disadvantage
here is that a dual axis works for only two measures, but it's still a
great way in order to compare two
measures in Tableau. I would like to show you now
the second method on how to quickly create a dual
axis in Tableau. So let's go and
remove all of this, and then let's take
the sales again. Now for the second measure, instead of dragging and dropping
it here near the sales and then switching it to dual axis, what we're going to do is go to
the visual over here. And if you move it
to the right side, you can see that we have
one vertical line here. Be careful. If you move it to the axis, you have two vertical
lines where you can have the measure values
and measure names. We don't want that,
We want a dual axis, so just move it to
the right side, the opposite side of the axis. And you can see we have one vertical green line.
If you drop it, Tableau is going to go and immediately create a dual axis
between those two measures. So this is how you can create a dual axis in Tableau quickly. And one last point about
the dual axis is to understand that the order of the measures has an
effect on the visual. So let me show you what I mean. I'm going to go now to
the profit and change it from bar diagram
to line diagram. And as you can see,
the red line from the profit is like in
front of the sales. So that means the measure
sales is in the back, and the profit is in the front. If you want to switch
that in the visual, what you're going to do is
just switch the order
of the dual axis. Let's take the sales from the left and just put
it on the right. And as you can see now,
the bar diagram is in the front and the line
diagram is in the background. In this situation,
it's not really nice to have the line
behind the bars. Now let's go and switch
it again, so the profit is on the right side, so
that we're going to get it in the front and
the sales in the back. All right, that's all
for the dual axis. Now of course in Tableau,
you can go and mix all those methods
together in a single view. Here we have a dual axis. In this example,
I can go now and add the measure
values to the profit. Instead of having only the profit, we can have the measure
values, method two. In order to do that, let's take, for example, the quantity. And let's drag and drop it
on the axis of the profit. Let's drop it over here. And as you can see,
Tableau immediately switches the sum of profit
to measure values. But still on the left
side we have sales. Now we are doing a
dual axis between the sales and a
bunch of measures. Now we can go and
add more measures to the measure values. Let's take the unit price
and add it over here. We can add the discount. But now let's just change the colors in order
to make it clear. Now I am at the tab
of the Measure Values. Click on the Colors. Now the quantity, I'm going
to give it green. Unit price, let's give it gray.
Discount, this color. And that's all.
As you can see, we have different lines,
but all of them are lines. We cannot change that
because they are measure values. So all of them are
sharing the same setup. And in the background we have the sum of sales
from the dual axis. That means you can go and
combine these things, and of course we can go and
add method one. Let's take the count of the orders and just drag and
drop it to the rows over here. You can see that
Tableau did go and create an individual axis for
the count of orders. That means if you look now at
our measures in this view, for the first one, the sum of sales, we are using the dual axis. It's this bar diagram, the blue one. And then on the right
side of the dual axis we have a bunch or
bundle of measures. Here we have the sum of profit, quantity, unit
price, and discount. So we have a group of
measures as a part of the dual axis using the measure
values. The count of orders is completely separated and not sharing the axis
with the others. We have it as an individual
axis using the method one. All right, so as you can
see, you can mix the stuff. And this is exactly
the power of Tableau, where we have a high degree of
customization in how to visualize our data. All right everyone. Now let's
have a quick summary. In order to combine multiple
measures in a single view, in a single visualization
in Tableau, we have three methods. The first one is to
use individual axes. That means we're going
to have for each measure a different, separate,
independent axis. And the advantage of
this method is that we can go for each measure
and decide about the visuals: which visual type we can use, the colors, the
sizing, and so on. So the customizing of the measures is going
to be independent. And the second benefit is that we can go and add
as many measures as we want inside one view. But the weak point
in this method, it's really hard to compare
those measures together. That's why we have
the second method, where we can go and compare all those measures together using one shared or single axis. And we can create
such a visualization using the measure names
and the measure values. So we have only one
axis, and we can have multiple measures
sharing the same axis. The main benefit
of that is we can add as many
measures as we want. And as well we can
compare those measures better than with method one, since they share
the same axis. But the disadvantage of
this method is that we cannot go and customize each of those measures
independently. So that means all those
measures are going to share the same configuration
of the visualization. So we cannot use here a line and a bar, or change
something else. We always have to use the same visualization
for all measures. And that's why we have
the third method in Tableau, to use the dual axis. The main benefit of
the dual axis is that we can compare two measures
closely to each other. We can define whether we
synchronize the axis or not. And here, the advantage
compared to the previous one, the single axis, is that we can customize the visuals for each measure independently. So here we have a line diagram together with a bar diagram. The only disadvantage of
this method is that we can compare
only two measures. All right, okay, so those were the different methods on how to add multiple measures in one single view and
when to use them. Next we're going to start
building basic charts, and first we can
have the bar charts.
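The summary above can be condensed into a small rule of thumb. The following Python sketch is only one way to write it down; the function name, its parameters, and the exact order of the checks are assumptions made for this illustration and are not part of Tableau.

```python
def suggest_method(measure_count, need_individual_marks, similar_scale):
    """Suggest one of the three methods for combining measures in one view.

    measure_count: how many measures should appear in the view.
    need_individual_marks: True if each measure needs its own chart type, size, etc.
    similar_scale: True if the measures have roughly the same range of values.
    """
    if measure_count < 2:
        raise ValueError("combining measures needs at least two measures")
    if measure_count == 2:
        # Two measures: dual axis allows close comparison and per-measure marks.
        return "dual axis"
    if similar_scale and not need_individual_marks:
        # Many measures on one shared axis, all with the same mark settings.
        return "measure values and measure names"
    # Many measures that need their own settings or have very different scales.
    return "individual axes"


print(suggest_method(2, True, False))   # dual axis
print(suggest_method(3, False, True))   # measure values and measure names
print(suggest_method(4, True, True))    # individual axes
```

As shown earlier, the methods can also be mixed in one view, so the result here is a starting point rather than a fixed choice.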
156. Tableau | Bar Charts: All right, so now we're going to start with the easy stuff where we're going to build
a bar chart in rows. Let's start with the
big data source and let's take the
subcategory to the rows. And then we need a measure, so let's take the sales and
put it in the columns. With that, we got the
sales by subcategory. Now in order to make it bigger, I'm just going to go over here. Instead of Standard, let's
take Entire View. Now as you can see, we
have bars in the rows. Tableau uses the bar
chart as a default, but in case you have
something else, you can go to the Marks over
here and, instead of Automatic, move it to Bar.
Let's go and click on that. Nothing is going to change, because it is currently a bar chart. We usually use bar charts in rows in order to make a ranking. In order to do that, let's go to the sales and sort our data. With that, we've got a very
nice ranking in our charts. One more thing that I
usually add is the coloring. So I take the measure,
the sum of sales, hold Control, and put
it on the colors. That's all for the
bar charts in rows. Okay, next we have the
bar charts in columns. It's very easy and very
similar to the rows, I just duplicated
the worksheets. Now here, instead of having
the dimension on the rows, we have to move it
to the columns. We have to switch between the
measure and the dimension. In order to do that,
it's very simple. Let's go to the Quick
menu over here and just switch it. With that,
we got the bars now on the columns. As you can see,
it's very simple. We usually use this as well
for ranking, of course. Now the question is when to use columns and when to use rows. If you have a dimension
with low cardinality, like we have with the subcategory,
you can go and use columns. But if your dimension has a high cardinality, a lot of values, you can go and use the rows in order to have a long
list that you can scroll down. It's always better
to scroll down than to scroll to
the right side. If you have a lot of values
inside your dimension, go with the row bars. But if you have a low number of values inside your dimension, go with the column bars. All right, moving on
to another bar chart. We have the side-by-side bars. In the previous bar charts, we have used only one dimension. This time we're going to
go and use two dimensions. Let's go and build it.
First I would like to get the dimension
country to the columns. And then let's go and get
our measure, the sales, to the rows. With that, we got
the normal bar chart. But now if you go and add another dimension
to the columns, you will get
side-by-side bar charts. The second dimension is going to be the years of the order date. Drag and drop the order
date to the columns. As you can see, Tableau
converted it to line charts. We don't want that,
we want bar charts. That's why we go to
the Marks over here. And instead of Automatic, we're going to
switch it to bars. Again, here I would like
to make it entire view. Now we have a lot of
data inside the view. We have five years of data. I would like to have
only two values. I would like to
compare the last two years. Let's drag the years
to the filters. Then I'm going to
filter using the years. Select Years, click Next, and let's keep only the
last two years. Click OK. The last thing that I would like to add
is the coloring. Since we have two years, I would like to have for
each year a color. Let's take the years, hold control and put it on the
colors, and that's it. We have now really nice
separations between the values. Now as you can see,
we've got side by side bars and it's really useful in order to compare multiple
values in each category. With that, we can
really easily compare the last two years
in each country. In this type of chart, try not to have a lot of data; otherwise it's going to be
really hard to compare. You can see we just
added a filter on the data in order to compare
only the last two years. That's it for the
side by side charts. All right, moving
on to the next one, we have the bar chart over time. It's a very famous one. You can find it almost in
each dashboard. So let's see how we're
going to build it. We're going to go
to the order dates, let's put it on the
columns as usual. We're going to have the years. Let's go and get our measure, the sales, and put
it in the rows. Here, as a default, Tableau is
going to show it as a line. Let's go and switch
it to bars, since we are working
on the bar charts. With that, we got very nicely
the sales over the years, but we usually add more details because those
data are very aggregated. Let's go and add
another dimension. In order to do that, let's
just drill down the years. Click on this sign and with that we got the second
dimension, the quarter. And here we can see
more details about how the sales are
changing over time. The main use case of this
bar chart is to show how the data are changing
over time, to show trends. If you have such a requirement, go with the bar
chart over time. Okay, moving on to the next one, we have the stacked bar charts. The requirement for
this one is going to be similar to the side by side. We can use two
different dimensions. Now let's go and build it. I would like to see
the total sales of each month for this year. In order to do that, let's take the order date to the columns, and let's take the
sales to the rows. Now I'm going to go and switch the years to months. Right-click on it, and let's select
the format, the month. With that, we got those bars that represent the total sales for
each month in this year. But now I'd like to add
more information to this view in order to compare
the categories as well. Now let's go and
get the categories. There is always the question of where
we're going to place it. If you put it on the columns, you
will get side by side bars. We don't want that, we want
to get stacked charts. In order to do that, let's
take the category and put it just on the colors.
Let's go and do that. And with that, we got
this information, this dimension as a
color inside each bar. And with that, we're going to have the stacked bar charts. Now as you can see,
the main purpose of the stacked bar chart is first to have the total
of sales over time. We can compare the
months and how the sales are developing
over time. Then the second task, which is not the main task, is to go and compare the
categories to see how each category is contributing to the total sales of each month. That's all for the
stacked bar charts. All right, now we have a very similar chart to
the previous one. We have the full
stacked bar chart, or sometimes we call it
the 100% stacked bar chart. Now I just duplicated
the previous one, and as you can see in the
normal stacked bar chart, each bar starts and ends differently
from month to month. The total sales is naturally
important in that chart. What is important now is to compare the categories
over time. A very nice way to do that is to have
full stacked bars. That means each bar in our visualization
has exactly the same length, and it goes from 0% to
100%. In order to do that, let's go to the Sum of
Sales, right-click on it, and then let's go to the
quick table calculations and choose the percent
of total. With that, we got the percent of total instead of the total sales as a value. But we're still
not there, because those bars do not
have the same length. In order to fix that, let's
go back to the Sum of Sales. Right-click on it and let's go to Edit Table Calculation.
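To make the target of this table calculation concrete: for a full stacked bar, each category's sales is divided by the total of its own month, so the segments of every bar add up to 100%. Here is a minimal Python sketch of that arithmetic, with made-up numbers rather than the course data, and assuming one row per month and category with a positive total in every month.

```python
from collections import defaultdict


def percent_of_total(rows):
    """Return each category's share of its month's total sales, in percent.

    rows: list of (month, category, sales) tuples, one tuple per bar segment.
    """
    month_totals = defaultdict(float)
    for month, _category, sales in rows:
        month_totals[month] += sales
    return {
        (month, category): 100.0 * sales / month_totals[month]
        for month, category, sales in rows
    }


# Made-up numbers, not the course data.
rows = [
    ("Jan", "Furniture", 200.0),
    ("Jan", "Office Supplies", 500.0),
    ("Jan", "Technology", 300.0),
    ("Feb", "Furniture", 100.0),
    ("Feb", "Office Supplies", 200.0),
    ("Feb", "Technology", 100.0),
]

shares = percent_of_total(rows)
print(shares[("Jan", "Furniture")])        # 20.0
print(shares[("Feb", "Office Supplies")])  # 50.0
```

The important detail is that the total is taken per month, not over the whole table, which is why every bar ends at exactly 100%.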
Let's go inside. Now what we're going
to do over here, instead of having Table (across), is choose Specific Dimensions. Let's go and switch to that. And we're going to select
only the category. Since we are focusing
only on the category, let's remove Month
of Order Date. Now as you can see,
we immediately get a full stack. Let's
go and close this. Now as you can see,
all those bars have exactly the same length,
and they all start at 0% and end at 100%. We call this type of
chart part-to-whole. That means I would like
to see and understand how each category relates to the whole sales
of each month. Now let's quickly summarize
when to use which chart. If you want to focus on comparing the categories
over time, then go with the full
100% stacked bar chart. But if it's more important to
show the total of each month and then compare the categories, go with the normal
stacked bar charts. All right, moving on to
the last type of bars, we have the small
multiple bar charts. Many bar charts inside
our visualizations. And we can do that by adding
more than two dimensions. Let's start for the
first dimension. We're going to go to the
countries from the data pane, let's put it in the columns. And with that, we got the values of the countries as columns. I would like now to add
rows from the category. Let's get the second dimension, the categories to the rows. Now I would like to
fill this with information in order to see some data. Let's go and get our measure, the sales, and drag and drop
it to the rows over here. As you can see, our bars
are still not really small. We have big bars inside our
view, and we can always go and check how many marks or how many bars we
have inside our view. By checking this
information over here, we can see that
we have 12 marks. Now let's go and get
our third dimension. It's going to be the order date. Let's get the order
date to the columns. Now we went from 12 to 60 marks,
or 60 data points. Now Tableau switched it to lines. I would like to bring
it back to bars. Let's go to the Marks and
switch it to bars, but our bars are still
not really mini or small. In order to go into more
detail inside our view, instead of using the years, we're going to go
with the month. Let's go and change the format. Right-click on it, and
let's choose this format, the continuous one, the month. So now if you check
again, we went from 60 to 707 marks, mini
bars inside our view. I would like to add as
well some coloring to it. Let's go and get the country
to the colors. So that's it. With that, we got small
multiple bar charts. As you can see, as you are adding more dimensions
to the view, you are splitting the measure
to more and more details.
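The way the mark count grows with every added dimension can be sketched in Python: one mark per distinct combination of dimension values that actually occurs in the data. The rows below are made up for the illustration, and the field names are assumptions, not the names in the course data source.

```python
from itertools import product


def count_marks(rows, dimensions):
    """Count the distinct combinations of the given dimensions that occur in rows."""
    return len({tuple(row[name] for name in dimensions) for row in rows})


# Made-up data: 2 countries x 2 categories x 3 years.
countries = ["France", "Germany"]
categories = ["Furniture", "Technology"]
years = [2021, 2022, 2023]
rows = [
    {"country": c, "category": g, "year": y}
    for c, g, y in product(countries, categories, years)
]
# Drop one combination to mimic a country with no orders for a category in one year.
rows = [
    r for r in rows
    if not (r["country"] == "France" and r["category"] == "Technology" and r["year"] == 2021)
]

print(count_marks(rows, ["country", "category"]))          # 4
print(count_marks(rows, ["country", "category", "year"]))  # 11
```

Each extra dimension multiplies the number of possible bars, and combinations with no data simply produce no mark, so the real count can be a bit lower than the full product.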
157. Tableau | Bar-in-Bar Chart: Okay, next we have
the bar-in-bar chart. Previously we compared two dimensions inside our view, but now how about comparing two measures in our
view using bars? Let's see how we can do that. As usual, we're going to take our subcategory to the rows, and then let's take
the first measure. It's going to be the
sales, to the columns. With that, we got our
standard bar charts. Let's go and sort
it by the sales. Now we need our second measure. Let's go and take the quantity and put it as well
in the columns. Now with that, we got an individual
axis for each measure, and we can go and
compare the data. But it's much
better, if you have two measures and
you want to compare them, to use the dual axis, as we learned before in
the previous material. Let's go and use the dual axis. We're going to go to the
quantity, right-click on it, and let's go
to Dual Axis. Now here, Tableau did
decide to go with other visualizations
since we have automatic. Instead of that, I would like
to switch it back to bars. As you know in the dual axis, we will get different
tabs inside our marks. Now, since both of
them are going to be bars, we're going to go
to All and then, instead of Automatic, select Bar. But now as you can see,
we are not there yet. It looks like a stacked bar, but actually it's not stacked. In order to change that,
what we're going to do is go to
each individual measure and change the setup. But first, I would like
to change the coloring. I don't like the
current colors, so let's go to the
quantity and make it orange. The sales is going to
be blue. Let's click OK. Now what we're going to do
in order to have bar in bar, we're going to go and change
the size of the quantity. Let's go to the
quantity over here, go to the size and just make
it a little bit smaller. So now we can see
in the background the big blue bar, and in the front we have this
small orange bar. So with that we got something
like bar in bar chart, which is really
great in order to compare two measures
using a dual axis. For example, if you
check the subcategory Art, you can see the quantity
is really huge, but we are generating
very few sales compared, for example, to the Copiers. There we have less quantity
ordered, but we have huge sales. So it's a really nice way in
order to compare measures.
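The comparison that the bar-in-bar chart invites, many units ordered but little revenue or the other way around, can also be expressed as a simple ratio of sales to quantity. The following Python sketch uses made-up subcategory figures; the names and numbers are assumptions for illustration only.

```python
def sales_per_unit(measures):
    """Return (name, sales / quantity) pairs sorted from lowest to highest ratio.

    measures: dict mapping a subcategory name to a (sales, quantity) tuple.
    Subcategories with zero quantity are skipped to avoid dividing by zero.
    """
    ratios = {
        name: sales / quantity
        for name, (sales, quantity) in measures.items()
        if quantity > 0
    }
    return sorted(ratios.items(), key=lambda item: item[1])


# Hypothetical figures, not the course data.
sample = {
    "Art": (27_000.0, 3_000),
    "Chairs": (330_000.0, 2_400),
    "Machines": (190_000.0, 440),
}

ranked = sales_per_unit(sample)
print(ranked[0])      # ('Art', 9.0): many units, little revenue
print(ranked[-1][0])  # Machines: few units, high revenue
```

In the chart itself the same pattern shows up as a long inner bar inside a short outer bar, or the reverse.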
158. Tableau | Barcode Chart: All right, the next
one is a fun one, where we're going to
create barcode charts. We usually use it in order to show more details
inside each bar. So let's see how we can do that. As usual, we're going to
get the same information, subcategories to the rows
and sales to the columns. I think you already got
it. Let's go and sort it. Now what I would
like to bring is a dimension with
high cardinality, like the product name. Let's go and bring
it, for example, to the rows over here. As you can see,
Tableau is warning us and telling us
there are a lot of members inside the product name. And now if you go and say, okay, add all members,
what happens? The view is going to be broken and not really informative. But instead of that, we can take the product name and
put it on Detail. So let's go and do that. And now with that we
have built something like barcodes, where we have the product information
inside each bar, which is sometimes
useful to show all those details in one view. So that's how you
build barcode charts.
159. Tableau | Line Charts: All right, so now we
can start talking about the line
charts in Tableau. They are very basic and very standard in order to
show the change over time. Now let's go and build a very simple line chart in Tableau. Since we are saying
change over time, that means we need a date. Let's go and get the order
date to the columns. And then for the rows, we need
our measure, Sum of Sales. Now as a default, as usual, Tableau is going to show the years. But instead of that, in order to make it
more interesting, we're going to go and
switch it to months. Let's go and change the format to month continuous,
so click on that. Now with that, we
got our line chart. If for some reason at your end you are not
getting a line chart, in order to switch
to a line chart, we go to the marks and
then instead of Automatic, let's go and choose Line. Once you do that, you will get, exactly like
me, a line chart. This is the most
basic line chart in Tableau that shows the
changes over time. Okay, next I would
like to show you the different visuals that
we can add to our line. For that, let's get more
measures to our view. Currently we have
the sum of sales. Let's get everything
like the discount and the profit. Let's take the unit price
and the orders as well. Now as you know, since we have
five measures in our view, we get five tabs as well
in the marks in order to individually set up the
visuals. For the sum of sales, we're going to leave it as it is, as a standard line chart. But for the next one,
what I'm going to do is change the path
or the visual of the line. If you go over here to
the Path and click on it, we will get different
types of lines. The first one is going to
be the standard one, the linear, but the second
one is going to be a step. Let's go and select that. Now if you check the
discount over here, we don't have a
linear chart like the sales; we now have
steps, where it jumps up and then steps down. All right, so let's move next
to the profit over here. So let's switch the
tab to the profit. Now we're going to go
again to the path. And here we have two sections, the line type and
the line pattern. In the line pattern, we have the solid line or we
can make a dashed line. Let's go and select
the dashed line. And as you can see
now in the visual, we have very nicely a
dashed line in Tableau. So this is one more way to present the lines in Tableau. Let's move to the next
one, to the next measure, we have the unit price.
Let's switch there. Now what we can do over here is, for each point that
we have in the chart, add a marker, like a small circle. In order
to add the markers, what we're going to
do is go to the colors over here, and then here we
have the effects. The first one is automatic, the second one is to have markers, and the last one
is to have no markers. Let's go and switch
it to markers. Now with that, you can
see the line chart for the unit price has small
circles, small data points. This is one more visual effect
on the lines in Tableau. Let's move to the last one, the count of the orders.
Let's switch there. Now what we can
do is change the size of the lines
depending on the values. In order to do that, let's
take the count of orders. So hold Control, drag and drop
it, and put it on Size. So now if you look at
the last line, we're going to see a
really nice effect. If the values are small, we will have a thin line. But if the values are high, we will get a heavy line, which really looks nice. All right guys. So
as you can see, Tableau is very rich in the
visualizations and with few clicks we can change the visual representations
of the lines. All right, now we're
going to build the multiple line
chart in Tableau. I'm always duplicating
the sheets in order not to build everything
from scratch each time. So now previously in
the standard line, we can see the
changes over time, but sometimes we want to
add more information. We want to compare the values of one dimension inside this view. And we can do that by
having multiple lines. Let's say that I would like to compare the values
inside the category. Let's go to the category
in our products, and now let's put
it on the colors; drag and drop it to the colors. And as you can see, by doing that Tableau is going to go and plot three lines, one for each value
inside this dimension. With that, we got multiple
lines inside one view. And now we can see that
it's not really informative because we have a lot of
lines and a lot of zigzags. In order to reduce
that, we're going to switch the format to, let's say for
example, a quarter. Now it's a little bit
cleaner in order to see how the data are changing over
time, and you can compare the values
of one dimension. The number of lines
really depends on the values inside
this dimension. One more thing about how to
create those three lines. You don't have to have
it always on the colors. If you move the category from the colors and put
it on details, you're going to get the
same effects where Tableau going to go and create
multiple lines for each value, but this time without colors. This is another method on how to create different
lines in Tableau. But I think it makes
more sense to have it on the colors to have a separate
color for each line. This is how we can
create multiple lines in Tableau using dimension. All right, the next
one is the dual line chart. This
time we're going to compare two different
measures in one view. So we're going to create
for each measure, one line. So now I'm going
to stick with the same view where we have the sum of sales and the quarter
for the order date. Now we'd like to
compare, in this view, two measures, the sum of
sales and the profit. Let's take the profit and put it side by side with the sales. And with that, we've got two different lines,
one for each measure. But I would like to have
them on top of each other. In order to do that,
we're going to go and use the dual axis. Let's go to the profit,
right-click on it, and here we have the
option of dual axis. So as you can see,
it's very simple. We've got a dual line chart, and here you can add more stuff. For example, you can
go and synchronize those two axes by going to the
profit and right-clicking on it. Here you can go
and synchronize it. Or of course we can go and
set up each line differently. So let's go to the
profit over here, go to the path, and let's
make it a dashed line. As we learned previously,
using the dual axis, we get the freedom of changing the visual of each
measure individually. And this is a
really great way in order to compare two measures. Okay, moving on to the next one, we have the cumulative
line charts. So currently in the
standard line charts, we are using the month
and the sum of sales. And we can see the total
sales for each month. But sometimes we would
like to understand how things are developing
or growing over time. If we want to see
the growth over time, we have to use a
cumulative line chart. In order to do that, we're going to go to the Sum of Sales. And instead of having the sum of sales as the aggregate function, we're going to go and create a quick table calculation to have the running total. Let's
go and switch that. And as you can see,
we're going to get a very nice cumulative
line chart where you can see how things are
growing over time. But of course, to make
things more interesting, we're going to add more
information to our view. Let's go and get the category and generate different lines. So we can drop it on the
colors and now we can see how the different categories
are growing over time. What we can add as well to the
cumulative line is the ending point
of each line. In order to do that, we're
going to go to the Marks, to the labels, click on the
labels, show mark labels. But as you can see, we have
for each month one label. We don't want that, We want
only the ending of each line. In order to do that,
we're going to switch it from all to line end. Now if you check our lines, you can see at the start and at the end we have
this information. But the starting point is
not really interesting, so we can go and disable it. Label start of line. Let's go and disable it. With that, we're going
to have the total sales of each category at
the end of the line. With that, we can go and analyze the growth over time
for each category. Okay, so now we're
going to go and create small multiple line charts, as we've done for
the bar charts. We're going to do it
now for the lines. Now what we're going to do,
we're going to bring like at least three dimensions
to the view in order to break down the sales to smaller lines. Let's
go and do that. We're going to get, as usual, the order date to our view. Let's get the sum of
sales to the rows. And then we can get
another dimension, the category to
the rows as well. As you can see now as we
are adding more dimensions, we are splitting the lines. Let's go and get the countries and put it as well
to the columns. So now we've
got more charts, but Tableau is going to show them as bars since the mark type is set to automatic. So let's go and
switch it to lines. Now we have it as
a discrete line. Instead of that, let's
get a continuous line. In order to do that,
let's go to that date and switch it to something
like the month as continuous. Let's change the
formats with that. As you can see, we get very interesting
multiple line charts. I would like to add
the colors as well. Let's go and get the country, for example, and add
it to the colors. Now, just to enhance the visual, let's go and remove the grid. Right click over here. And then let's go to formats. Then we can go over
here to the lines, and then we have
the Rows tab. Let's go to the grid
lines and set them to None. With that, we have removed
those grid lines, which are really annoying
when we have a lot of them. Then the last thing that
we can do with that, we can have the total
sales of the last point. In order to do that,
let's get the sum of sales, hold control, and
put it on the labels. Then we're going to go
to the labels over here and let's select Min/Max. We're going to have
it by the order date. So let's switch from
Automatic to month. And let's have only
the maximum value. Let's remove the minimum value. So with that, we've
got, for each chart, the total sales
for the last month. With that, we have created very nice small multiple
line charts in Tableau.
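The running total behind the cumulative line chart in this lesson can be sketched outside Tableau. This is only a minimal illustration in Python: the month labels and sales numbers are made up, and `running_total` is a hypothetical helper name, not a Tableau function.

```python
from itertools import accumulate

# Hypothetical monthly sales totals (illustrative numbers, not the course data).
monthly_sales = [("Jan", 100), ("Feb", 150), ("Mar", 50), ("Apr", 200)]


def running_total(rows):
    """Return (label, cumulative_sum) pairs, keeping the original order."""
    labels = [label for label, _ in rows]
    totals = accumulate(value for _, value in rows)
    return list(zip(labels, totals))


cumulative = running_total(monthly_sales)
print(cumulative)
```

Each point of the cumulative line is the sum of the current month and all months before it, which is why the line only keeps growing when sales are positive.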
160. Tableau | Highlighted Line Charts: All right, moving
on to the next one, we have the highlighted
line charts in Tableau. This is especially important if you have multiple lines in one single view and there's different methods
on how to do it. I'm going to show a quick
one and a professional one. Let's start with the quick one. Let's have multiple
lines in our chart. I'm going to take, this
time, the country and put it on the
colors. With that, we get one line for each value
inside the country dimension. And now I would like to
give the ability for the users to highlight
one of those values. In order to do that,
it's very simple. Go to the country over
here, right click on it. And let's go to the highlighter. Here we have the option
of Show Highlighter. Click on that. With that, if
you check the right side, we're going to get a small box in order to highlight the
values inside the countries. The users can go
over here and select one of those values,
for example, Germany. And as you can see, Tableau is going to
highlight the line of Germany and
blur all other lines. This is a really nice way
to highlight different values in Tableau in order to focus on one value,
especially if we have a
lot of lines. That's it. This
is how you can quickly create a highlighted
line chart in Tableau. All right, so now we're going to talk about the second methods on how to create
highlighted line charts, but this time professionally. So now I just duplicated the
old line chart where we have the quarter sum of sales and
the countries on the colors. But this time we're going to
get rid of this highlighter. So I'm just going to
go and remove it. So now we have to
give the users a list of all countries in
order to select, and this selected country going to be highlighted
in the view. In order to do that,
we're going to go and create a parameter. Let's go to the data
pane, right click over here, then
create a parameter. We're going to give it
a name, Select Country. Since the country
values are string, the data type going to
be as well a string. Now next we're going to
go and create a list of all countries that we have
inside the dimensions. Here we have four
values. We have France. Be careful that we
use the exact case: the first letter capitalized
and the rest small. We have Germany, Italy, and the last one is USA. That's it for our parameter. Let's go and hit OK. With that, we've got our new parameter
on the left side. Right click on it and select Show Parameter in order to see
it here on the right side. Now the users can go over here and select one of
those countries, but as you can see, nothing
is changing in the view because we haven't
connected yet to our view. Now, in order to
connect it to our view, we have to go and create
a new calculated field. Let's go to the data pane again and create a calculated field. Let's call it
Highlighted Country. And here we can have a very simple condition
where we're going to say country equals our parameter. So our parameter is going to
be select country here. What we are saying is that
if the selected country from the parameters equals to
the value of the country, then we're going to have true. Otherwise it's
going to be false. For example, now
currently we have the value of France
selected in the parameter. That means the country France is going to be true, and all
other countries are going to be false. Let's go and hit OK. So now we're going
to go and work on highlighting the
selected country. In order to do that, let's
start with the coloring. Currently we have the
coloring on the country. I'm going to go and
move it to the details. That means now the countries
are just creating the lines, not responsible for the
coloring of the lines. Now, in order to
bring the coloring, we're going to get our
new calculated field, the highlighted country. And let's put it on the colors. Now we can see that we have only two colors because
we have false and true. If it's true, it's
going to be orange. If it's false, it's
going to be blue. But I would like to change those colors to get the
highlight effect. Let's go to the colors and edit them. False is going to be gray
and true is going to be, let's say for example,
blue. Okay, now we get like
a highlight effect. All other lines are gray and only the one that we select
is going to be blue. But now let's go and
test our parameters. We have here France
selected currently. Let's select Germany.
And as you can see,
the highlighted line is now Germany. Let's check Italy and USA. Now, as you can see, our
parameter now is working. Now here we have a
little bit issue where the highlighted line is
behind the gray lines. In order to switch that,
I would like to have the highlighted in the front
and the gray in the back. We're just going to go
to the legend over here. If you don't have it, you
can go to the analysis. And then here we
have the option of the legends and make sure
to select the colors. Currently it's selected by me. So what we're going
to do, we just going to switch
those two values. Let's take the
true and put it on top so that we have
sorted those two values. And as you can see
in the chart, the blue color is in the front and the gray color is in the back. Now the next step
in order to create this highlight effect
in Tableau: we're going to change the size. In order to do that,
we're going to use our new calculated field. So take the Highlighted Country and drag and drop it on the size
while holding control. Now with that, we've
got different size for the highlighted line
compared to the others. But here we have the
opposite effect, and we don't want that. We want the rest to be thin and the highlighted line
to be heavy. In order to do that, let's
go to the legend over here. Just double click here. Now as you can see,
true is thin and false is heavy. In order to switch it, we're
going to go to Reversed. Let's click on
that and hit okay. With that, you can see
the highlighted line is way heavier than the rest. You can change the size if
you don't like it like this. So we can reduce a little bit the sizing and it's going
to look nicer now. All right, so that's
all on how to create a highlighted
line chart in Tableau more professionally than the
previous one, where you have more control over the
sizing and the coloring. The users can go over here
and start changing the value. And with that we
are highlighting one line compared to
the others. That's it.
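The logic of the Highlighted Country calculated field from this lesson can be sketched as plain code. This is a minimal Python illustration, not Tableau syntax: `highlighted_country` and `mark_style` are hypothetical helper names, and the color and size values simply mirror the choices made in the lesson.

```python
# Stand-ins for the Country dimension values listed in the parameter.
COUNTRIES = ["France", "Germany", "Italy", "USA"]


def highlighted_country(country, selected_country):
    """Mimic the calculated field: True only for the selected country.

    The comparison is case-sensitive, which is why the parameter values
    have to match the dimension values exactly.
    """
    return country == selected_country


def mark_style(country, selected_country):
    """Map the boolean to the color and size used for each line."""
    if highlighted_country(country, selected_country):
        return {"color": "blue", "size": "heavy"}
    return {"color": "gray", "size": "thin"}


# Simulate the user picking Germany in the parameter control.
styles = {country: mark_style(country, "Germany") for country in COUNTRIES}
print(styles)
```

Keeping the country on the details while the boolean drives color and size is what lets one line stand out while the others stay in the view.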
161. Tableau | Bump Chart: All right, next
we have a fun one where we're going to
build a bump chart using lines in order to do ranking between
different values. So now for example,
I would like to rank the countries over time. In order to do that we're going to have the
same view where we have the quarter and the
sales and we have a line. So now the first thing
that we're going to go and grab the country and put it on the colors in order to create those
different lines. Now since the analysis
is about ranking, not the total sales, in order to build
that, we're going to go to the sum of
sales over here. And we're going to go and create a quick table calculations. Here we have the rank function, so let's go and select that. So now we have a ranking that
depends on the whole table, on the whole view,
I don't want that. I would like to rank
between only four values. In order to do that, let's go to the Sum of
Sales over here. Right click on
it, and let's edit the table calculation. Let's go inside. And now instead of
having Table (across), I'm going to go and
specify a dimension. Now we would like
to have a ranking only using the country, so we're going to have
only four values. I'm just going to go as well
and select the order dates. Let's go and close this. Now we have some kind of
bump chart effect, but we are not there yet. As you can see, the ranks
start from the bottom and go to the top. I would like to reverse that. In order to do that,
right click on the axis, edit the axis, and
then let's reverse it. That's all. Let's close this. As you can see now we have
the top rank at the top, and then the bottom we
have the lowest rank. Now in order to have
this pump effect, we have to have like circles
inside of our visual. We can do that very
easily if you, in order to have the pump
effects, we have to have lines. We have it already,
but as well we have to have circles on the data points. There is one easy way.
In order to do that, let's go to the colors and
change the markers to circles. Now as you can see, we've
got our small circles on each data point and
we get the bump effect. But sometimes we go more advanced in these charts
and make our own customizations for those circles, where we want
to make those circles, those data points,
a little bit bigger and show the rank inside them. Now in order to do that, let's first hide those small circles. We don't want them.
Let's go to the colors and just have a line without
markers. Now in order to have the circles, we have to
have the same measure again in our view. So
let's take the sum of sales hold control and
put it on the right side. With that, we've got two
charts for each measure. Let's go to the second one, to the Sum of Sales over here. Instead of having lines, let's move it to circles. Switch the marks
here to a circle. As you can see, now we've got
very nicely those circles, and now we can go and change
the size of those circles. All right, that looks nice. Now the next step is that
we're going to go and put them on top of each other. And we can do that
using the dual axis. Let's go to the Sum of
Sales on the right side. Right click on it,
and let's select the dual axis. Now you have those
circles on top of our line. But the circles are
not positioned correctly yet because those two axes
are not synchronized. Let's go to the right side. Right click on it and
synchronize axis. Now we've got those circles
perfectly in our lines. I would like to hide
the right axis, Right click on it, and
let's hide the header. Now the next step we can go and add numbers on those circles. I'm going to stick
with the second measure on those circles. Let's go to the labels
and show label. The next step, I
would like to add those numbers inside the circle. Go to alignment over here, and then the vertical, and let's make it
to the center that we got those numbers
inside the circles. And we can go as well and change the coloring and
the fonts over here. Let's make it to white. The next step I would
like to go and change the sizing again
of those circles. So let's make it a little bit
bigger until it looks nice. All right, so that's enough. And with that, we got a really
professional bump chart and we are controlling the
size of those circles. So now we can go and very nicely check the ranks
of those countries. As you can see, France was,
in the first data point, rank number one,
then it dropped to two, then three, then back to one. And we can see the development of those sales
between countries. And we can see very
nicely that Italy is always the lowest rank in
the sales in our business. All right, so this is how we can create a bump chart in Tableau.
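The ranking step of the bump chart (rank the countries against each other inside every quarter, instead of across the whole table) can be sketched as follows. This is a minimal Python illustration with made-up sales numbers; `rank_within_quarter` is a hypothetical helper, and unlike Tableau's rank it does not treat ties specially.

```python
from collections import defaultdict

# Hypothetical quarterly sales per country (illustrative numbers only).
sales = [
    ("Q1", "France", 500), ("Q1", "Germany", 300),
    ("Q1", "Italy", 100), ("Q1", "USA", 400),
    ("Q2", "France", 250), ("Q2", "Germany", 450),
    ("Q2", "Italy", 150), ("Q2", "USA", 350),
]


def rank_within_quarter(rows):
    """Return {quarter: {country: rank}}, where rank 1 is the highest sales."""
    by_quarter = defaultdict(list)
    for quarter, country, amount in rows:
        by_quarter[quarter].append((country, amount))

    ranks = {}
    for quarter, entries in by_quarter.items():
        # Sort descending by sales, then number the positions from 1.
        ordered = sorted(entries, key=lambda entry: entry[1], reverse=True)
        ranks[quarter] = {
            country: position
            for position, (country, _) in enumerate(ordered, start=1)
        }
    return ranks


ranks = rank_within_quarter(sales)
print(ranks)
```

Because the rank restarts in every quarter, each country always gets a value between one and four, which is what the reversed axis of the bump chart shows.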
162. Tableau | Sparkline Chart: All right, so now we're
going to learn how to create Spark line
chart in Tableau. Spark line charts
are really like compact visuals in order to show the trend that
changes over time. And you're going to
find them in a lot of dashboards in order
to show KPIs. Now let's see how
we can create that. It's really simple. So now we're going to take
a dimension like the country and put it on the rows, just
to split those lines into a smaller size. Now
in the Spark lines, it's very important to
have the information of the sales at the start and at the end of each line. Let's go and do that. Let's take the sum of sales, drag and drop it to the labels over here, holding control. So now we have the
information of sales on each quarter in
each data point. We don't want that, let's
go to the labels over here,
and now let's go to
the Min/Max. Let's go and select that. Now we can see that we
have for each line two values, the minimum
and the maximum. But here it really depends on the sum of sales. Instead of that, I would like
the min and max to depend on the value of the order date.
Let's go and switch that. We can go to the field over
here instead of automatic. Let's select the quarter now. As you can see,
with that, we got exactly our spark lines. We have the starting value and the end value of each line. But now usually the spark lines are really compact visuals, they are really small lines. In order to change that, let's switch from entire
view to standard. And now we're going to go
very carefully to the end of our axis until the
mouse changes to the resize cursor. Then let's go and
reduce it completely. With that, we've got our compact lines. I
would like as well to remove those grid lines
in our chart, so right click on it over
here and go to Format. And then on the left side we're
going to go to the lines. We are at the rows, I would like to
remove those row lines. So make sure to
select the Rows tab, and to remove those grid lines, we can go over here
and select None. And with that we get
really clean spark lines without any grid lines. As well, we can go and hide that
information about the sales. Let's right click
on it and disable Show Header. That's it.
Now I'm happy with that. We got a very nice spark
line chart in Tableau. And as you can see, they
are compact visuals that let us quickly
identify trends, and we usually
use them inside KPIs.
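The label setting in this lesson (Min/Max based on the quarter instead of the sum of sales) comes down to picking the first and last data point of each line. A minimal Python sketch of that difference, assuming made-up quarterly numbers for one country; `label_points` is a hypothetical helper name.

```python
# Hypothetical quarterly sales for one country, in no particular order.
points = [("2022-Q3", 320), ("2022-Q1", 210), ("2022-Q4", 180), ("2022-Q2", 400)]


def label_points(rows):
    """Pick the points at the earliest and latest quarter of a line."""
    first = min(rows, key=lambda row: row[0])  # smallest quarter label
    last = max(rows, key=lambda row: row[0])   # largest quarter label
    return first, last


start, end = label_points(points)
print(start, end)

# For contrast: Min/Max on the measure picks the lowest and highest sales,
# which are usually not the start and end of the line.
lowest = min(points, key=lambda row: row[1])
highest = max(points, key=lambda row: row[1])
print(lowest, highest)
```

Labeling by date gives the starting value and the ending value of each spark line, while labeling by the measure would mark the valley and the peak.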
163. Tableau | Barbell Chart: All right, so now
we're going to go more advanced on building
visualizations in Tableau. We're going to learn how to create
barbell charts in Tableau. Barbell charts are really
amazing in order to compare two data points and find the
differences between them. It's like before and after. And it works perfectly if you have categories.
Now we would like to compare two years, 2021 and 2022, by the categories. So now let's start
first with taking the subcategory
rather than the category in order to have more values. Now next we need two measures, the first one for the year
2021 and the second for 2022. In order to do that,
we have to go and create a new calculated field. Let's go to the data again. Click over here, Create
New Calculated Field. And now I'm going to call
the first one, Sales 2021. And the formula is going
to be very easy, so we're going to use the IF
condition on the order date,
but now we are talking about
the year of the order date. So let's wrap it in YEAR: if the year of the
order date equals 2021. So now what should happen
if the condition is true? We're going to show
the sales, so THEN sales, and otherwise it's going to be null. That's it.
Let's go and end it. Now in this calculated field, we will get the sales
only if the year is 2021. Let's go and copy it
because we need it for the next one. Then hit OK. And with that, we get in the data pane a new calculated measure
for the sales of 2021. Let's go and create
the one for the next year; it's going to be the
sales of 2022. Paste the same calculation, but
now we're going to say if the year is 2022, then show the sales. So that's it, let's hit OK. So with that, we got our second measure for
the sales of 2022. Now we want to compare both
of those sales in our view. Let's take the sales of
2021 to our columns. Now in the barbell chart,
we're going to have
two circles and between them a line in order to
find the differences. First, let's start
with the circles. Instead of having bars, we're going to go
to the marks over here and change it to circle. With that, we've
got, in our view, the first circle
for the year 2021. What is missing now
is the second circle. In order to do that,
we're going to go and get our sales 2022. Move it to the axis in order to generate the measure
values and measure names. Just drag and drop it over here. And now with that, we've
got our second point. The first one, the
blue one is for 2021 and the second one is 2022. All right, with that, we
have built the first part of the parble charts where we have the starting point
and the end point. Now in order to show
the differences or the distance between
those two values, we have to have a line
chart between them. So that means we need now another type of chart
inside our view. In order to do that,
we're going to go and duplicate the measure values. Hold control, drag and drop
it and just put it beside it. Now we have the
same data on the left and on the right.
On the right, we're going to
have a different visual: instead of circles,
we're going to have a line. Let's go to the tab over here on the marks for the second one. Now we're going to go and change the visual from circle to line. With that, we got our lines, but we are not there yet. I would like to have a
distance between the two values. In order to do that,
we're going to take our Measure Names from the colors and we're going to
go and put it on the path. Drag and drop it on the path. And with that, we got
exactly what we want. We have now like a line
between two points. All right, so now the
final step, with that, we're going to go and merge
those two charts in one. So in order to do
that, as we learned, we're going to use
the dual axis. Let's go to the measure values over here
on the right side. Right click on it. And dual
axis, let's select that. Now we got a perfect line
to show the distance, the difference between the starting point and
the end point. But now we still have small
issues in the visuals. I would like to make those
circles a little bit bigger. So let's switch to
the circles and go to the size over here and
make it a little bit bigger. All right, so that's enough. Now, as you can see, the line
is on top of the circles, which is not really correct. In order to put it behind them, we have to go and switch the
order of those dual axes. So let's take the right
one and put it on the left. All right, so with
that we've got a perfect barbell
chart in Tableau. And we can go and analyze
the differences between two data points, between the
sales of 2021 and 2022. And we have this
very nice line in order to indicate the
distances between them. So you can see for
example, in the envelopes, there is no change on the
sales between those two years. But if you go to the
phones over here, you can see a huge
change in the sales between those two
years, and in the visual,
it really indicates
that information. So that's it. This is
how we create, and why we create, barbell
charts in Tableau.
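The two calculated fields from this lesson (show the sales only if the year of the order date matches, otherwise null) can be sketched as follows. This is a minimal Python illustration, not Tableau syntax: the order rows are made up, and `sales_for_year` and `totals_by_subcategory` are hypothetical helper names.

```python
from datetime import date

# Hypothetical order rows: (order_date, subcategory, sales).
orders = [
    (date(2021, 3, 1), "Phones", 100.0),
    (date(2022, 5, 9), "Phones", 400.0),
    (date(2021, 7, 4), "Envelopes", 50.0),
    (date(2022, 2, 2), "Envelopes", 50.0),
    (date(2020, 1, 1), "Phones", 999.0),  # ignored by both measures
]


def sales_for_year(order_date, sales, year):
    """Return sales when the order year matches, otherwise None (like null)."""
    return sales if order_date.year == year else None


def totals_by_subcategory(rows, year):
    """Sum the conditional measure per subcategory, skipping the nulls."""
    totals = {}
    for order_date, subcategory, sales in rows:
        value = sales_for_year(order_date, sales, year)
        if value is not None:
            totals[subcategory] = totals.get(subcategory, 0.0) + value
    return totals


sales_2021 = totals_by_subcategory(orders, 2021)
sales_2022 = totals_by_subcategory(orders, 2022)

# The line of the barbell covers the gap between the two circles.
gap = {name: sales_2022[name] - sales_2021[name] for name in sales_2021}
print(sales_2021, sales_2022, gap)
```

The second measure must test for 2022, not 2021, otherwise both circles land on the same value and the barbell collapses to a single point.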
164. Tableau | Rounded Bar Chart: All right, so now
we're going to go and build rounded bar charts. Previously we have learned
how to build bar charts, standard ones, but
now we're going to go advanced and build
rounded bar charts. And we will use lines
in order to do that. I know it sounds a
little bit strange, but let's go and build that. First we're going to
go and get, as usual, the subcategories
in order to make a, and I'm going to stick
with the entire view in order to have the
whole view over here. Then let's go and get the sum of sales to the
columns over here. So far, as you can see, this is a very nice standard bar chart. Now instead of having
those classical bars, we're going to have each bar rounded at the
start and at the end. How are we going to do that?
We're going to go and have
a dummy value, the average of zero. Now what we're going to do,
we're going to go and merge those two measures
in one single axis. In order to do that, let's
drag the average and put it on top of the sales over
here in order to generate the measure
values and names. So now we're going
to go and convert the bar chart to a line chart. Let's go to the marks
over here and switch to line. And then what we're going
to do, we're going to take the Measure Names and
put it on the path, so now we are almost there. What we're going to do,
we're just going to go and increase the size
of those lines. Let's just make it bigger. And with that, as you can see, we got a rounded bar
chart in Tableau. And as well we're going to
get a very nice color effect if we take the Measure Values, hold control, and then drag and drop it
onto the colors. And with that we got a really nice rounded bar
chart in Tableau. Well, if you ask about
the use case, it's exactly like having
standard bar charts. For example, here we can make a ranking list of
the subcategories. We just change the
look of it. So that's how you can build
a rounded bar chart in Tableau.
165. Tableau | Slope Chart: All right guys, so now
we're going to learn how to build slope
charts in Tableau. Slope charts are perfect
in order to show how the ranking is changing over time for
different categories. So let's see how we can do that. Since the ranking is over time, that means we need
the order dates. So let's go and bring the
order dates to our view. Then the next step, as usual, we're going to get our measure, the sales, to the rows. We want to compare
the last two years. In order to do that, let's
go and filter the data: show the filter for the years, and let's go and select
the last two years. So now we have to decide which category we want to compare. We can go for the
product categories, or we can go with the countries. Let's go and pick the country
and put it on the details. Now the next one I'm going
to go and just make it a little bit bigger in order
to compare those two years. The next step is that we're
going to go and put the category, or the
country, on the labels. Let's hold control on the country
and drop it on the labels. Now we can see the country name at the end of each line, but I would like to
have it as well at the start in order to
get the slope chart. So let's go to the labels.
So now what we
have to do is to put the labels at the line ends. So instead of having All, let's switch it to Line
Ends. And let's close it. So now we can see that
each line starts with the country name and ends as
well with the country name. Now the last step
is that we want to add for each line
a small circle. In order to do that, as
we learned before, we go to the colors and
we add the markers, so now we have a small circle at the start and at the
end of each line. And this is the easiest way to build a slope
chart in Tableau. Again, the use case
of the slope chart is that we can see how
the ranks are changing over time. In 2021, you can see France
was first, then USA, Germany, and the last was Italy. And now we can see
the change over time. In 2022, Germany went from place number three
to place number one. And then France
moved to number two, USA moved to number three. And as you can see,
for Italy, nothing changed. So this is the power of the
slope chart in order to see how rankings are
changing over time. And of course in Tableau, we can go more advanced and add more complicated stuff in order to have more
customizations. For example, you
say, you know what, I would like to have
bigger circles. In order to do that, we
have to have two charts, one for the line and
one for the circles. Let me show you how
we can do that. Let's take the sum of sales, hold control, and duplicate it.
The first one is going to be the lines
and the second one is going to be the circles. Let's go and switch the second measure:
instead of automatic, we're going to select
the circle here. It's way too big for our visual. Let's go to the size over here. And just reduce it in order to have smaller circles. A little bit more. That's it. Now what we're going to
do, we're going to bring those two charts in one. Let's go and merge it
using the dual axis. I'm going to go to the
second one over here, right click on it, and then
let's go to the dual axis. Then if you look closely, those axes are not
100% synchronized. What we're going to do, we
can right click over here and then synchronize the axis. So now we've got the circles exactly in the
place that we need. Since we have two axes that
have the same informations, I'm going to go and
hide one of them. So let's go and disable
the show header. Now you've got the full
customizations of the chart. You can say, you know
what, for the lines, I would like to
have another color. For example, let's
have a gray color. Or you might say, let's
make it a dashed line, so we go to the path over
here and switch it to the dashed line. With that, we get full customization
of our chart. But usually for
slope charts, we have a solid
line between the points.
This is how we can create
a slope chart in Tableau.
166. Tableau | Bar & Line Charts: Okay, so now we're going to
learn how to combine different types of charts
in one single view. Here we're going to mix
bars with lines. There are different
methods to do that, depending on the use case. The first one is using
the average line. First, let's go and build a standard line
chart over time. In order to do that, let's
get the order dates to the columns and as well
the sales to the rows. Then let's switch the years
to a continuous month. Let's change the format now, instead of having the line, we're going to go and
switch it to bar charts. So let's go to the Marks and
switch it to bars. Great. With that, we've
got our bar chart. The second step
is to add a line. This line going to
be the average line. In order to do that in
Tableau, it's very simple. Let's go to the analytics. And here we have the
option of average line. Let's go and drop
it to our view, so it's going to be for the
whole table. And that's it. As you can see, it's very easy. With that, we got a
nice average line combined with the bar chart. All right, moving on
to the next method. We're going to go and
combine the bars and lines using the dual axis. And here we're going
to go and compare two different measures. So this time as a change, we're going to go and
compare the number of orders together with
the number of customers. Now let's go and
get the order date in order to see the
changes over time. Then the next thing we're
going to go and get the orders,
the count of the
orders, to the rows. Now let's go and change the
format of the order date to months and then change
as well the chart to bars. With that, we get our first chart, the bar chart. Let's go and get
our second measure, and we're going to
have it as a line. In order to do that, let's go to the count
of the customers. Put it on the rows
so that we split
our view into two charts. Let's go and change
the second one to lines. We're going to go to the Marks and switch to its tab. Now instead of having bars, we're going to switch to line. Now we have our two charts,
the bar chart and the line chart. And as usual, we want to go and merge them together
in one single view. In order to do that, we're
going to use the dual axis. Let's go to the
customers right click on it and then choose dual axis. With that, as you
can see, we have a bar chart together
with a line charts, and of course, with the
dual axis we can go to the right side and
synchronize those two axes. But for now it makes no sense. Of course now we can add
more customizations. For example, for the line, we can do the markers. Let's go to the
colors over here, and let's just add
the markers to it. So that's it. Now we
can go and start comparing the number
of orders together with the number of customers in one single view using two
different chart types.
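The first method in this lesson, bars with an average line for the whole table, can be sketched as a small calculation. This is a minimal Python illustration with made-up monthly totals, and it assumes the line is the average of the monthly bar values shown in the view.

```python
from statistics import mean

# Hypothetical monthly sales totals, one value per bar (illustrative only).
monthly_sales = {"2022-01": 120, "2022-02": 80, "2022-03": 100}

# One horizontal line for the whole table: the average of the bar values.
average_line = mean(monthly_sales.values())

# Months whose bar reaches above the average line.
above_average = [month for month, total in monthly_sales.items()
                 if total > average_line]

print(average_line, above_average)
```

The bars carry the detail per month and the single line gives the benchmark, which is the reason for mixing the two chart types in one view.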
167. Tableau | Bullet Chart: Okay, so now we're
going to build bullet charts in Tableau. Here we're going to combine
bars with lines again. Bullet charts are really
important in order to compare the current value
with the target or compare the current year
with the previous year. Now let's go and get, as usual, our
subcategory to the rows. And now I would like to compare the current year with
the previous year. So let's take the sales of 2022 from our data pane
over here to the columns. And now let's go and
sort it by the axis, so we have like a rank and then we're going
to go and compare it to the sales of 2021. So what we're going
to do, we're going to take the 2021 to the details and then we're going to go and add
a reference line. So let's go to the axis
to the sales of 2022. Radically connect and let's
add a reference line. So now let's take it a
little bit to the right side and also to see those
reference lines. So what we're going to take,
instead of the sum of sales, 2022, we're going
to have that 2021. So let's slick thats and now we've got one line
for the average. We don't want that.
We want to have the total sales for
each subcategory. So in order to switch that, we're going to go and
say instead of per pane, we're going to
have it per cell. So let's switch it. So now we have a line for each bar,
which is great, but let's go and customize
those informations. I don't want to see any labels, so let's go to the labels
and switch it to none, and then let's go and
format those lines. We're going to go over
here and let's take, for example, the orange color. And then let's go and
change the transparency to 100% to have a full line. And then let's go
and make it thicker in order to see the lines. I'm just going to
go with the full. That's it. Let's go and
close this. As you can see, with that we've got very easily a bullet chart in
Tableau where you can compare the current year of the bars with the lines
of the previous year. This is how we can create
a very nice bullet chart by combining bars and lines.
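The comparison behind this chart can also be sketched outside Tableau. This is only a minimal Python sketch, assuming made-up totals per subcategory (the names `sales_2022`, `sales_2021` and `bullet_rows` are hypothetical and not part of the course material). The bar is the current year, the reference line is the previous year, and we get one line per cell, exactly like the per cell setting.

```python
# Hypothetical sample totals per subcategory (not taken from the course dataset).
sales_2022 = {"Phones": 120.0, "Chairs": 95.0, "Paper": 30.0}
sales_2021 = {"Phones": 100.0, "Chairs": 110.0, "Paper": 30.0}


def bullet_rows(current, previous):
    """Return one row per subcategory, sorted by current-year sales (descending).

    Each row holds the bar value (current year), the reference-line value
    (previous year, one line per cell) and whether the bar reaches the line.
    """
    rows = []
    for name in sorted(current, key=current.get, reverse=True):
        bar = current[name]
        line = previous.get(name, 0.0)  # no previous-year value: line at zero
        rows.append({"subcategory": name, "bar": bar, "line": line,
                     "reached": bar >= line})
    return rows


for row in bullet_rows(sales_2022, sales_2021):
    print(row)
```

Sorting by the bar value mirrors the sort by the axis that we did in the view, so the rows come out as a rank.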
168. Tableau | Lollipop Chart: All right, so now we're
going to learn how to create a lollipop
chart in Tableau. There are two types of charts,
horizontal and vertical. We can build this
type of chart by combining bars and circles. It's like a stick, and at the end we
have a big circle. And we use the
circle in order to highlight a data value.
Let's go and create that. It's very simple. Let's take the subcategories to the rows. Then our measure going to
be the sales as usual. Let's put it on the columns so that we have already
our bar charts. If not, then go to the
marks and change it. Let's go and sort it in
order to have a rank. Since it's lollipop,
we have sticks, so let's have smaller bars. Let's go to the size over here
and just reduce the size. Now what is missing in the
lollipop is the end circle. In order to make another
chart, what we're going to do, we can take the sum of sales
as well and duplicate it. Hold control, and
just drag and drop the sum of sales. With that, we've
got our two measures. And what we're going to do next, we're going to go and
change it to circles. Let's go to the marks, to the second sum of sales. And instead of Automatic, we're going to have the circles. Now we've got very nicely those circles, but
they are really small. Let's go and make it
bigger. Little bit smaller. All right, maybe this is fine. What is the next step
in order to merge the two together in one single view? As usual, we're going
to use the dual axis. Let's go to the second
Sum of Sales, right-click on it, and then
let's go to the dual axis. So as you can see,
things got destroyed. We don't have any
more of the bars, and that's because in the first measure of
the sum of sales, we didn't specify for Tableau that it is a bar; it
was automatic. And with that, Tableau is going
to go and make guesses on the suitable visual
for the current data, which in this case
is wrong. So what we're going to
do, we're going to go to the first measure and say for Tableau, it's not automatic. We want it always to be as
a bar. Let's switch it. As you can see, we already have the shape of the lollipop. We have to do a few more things,
but that is not a big deal. We forgot about
synchronizing the axis. Let's go to the second one. Right click on it,
and let's synchronize it just to make sure that
everything matches correctly. Now I have those two axes that have exactly the
same information, so I'm just going to
go to one of them and hide those informations in
order to have it only once. Now the key thing of
the lollipop is to show information at the end, at the circle. Here we can put anything, like
any measure, for example. We can have the total sales or the total number
of orders, and so on. But in this example,
I would like to have the text of the subcategory
on those circles. How we're going to
do that? We're going to go to the circle over here. We're going to put
in the labels, the subcategory,
by holding control and putting the
subcategories on the labels. Now as you can see, we
have now the header information on those circles. What we can do, we can go now and hide the headers. Right-click and uncheck Show Header. With that, we have removed
those informations and we have now the header informations or the subcategories
on the circles. One more thing that we can do, we can go and add coloring. Let's take the sum of sales
and put it on the colors. With that, we have a really
nice rank chart for the subcategories. Okay, now let's see
quickly, the second type, we can have a vertical
lollipop chart. I just duplicated
the previous one. All what we're going to
do, we're going to go to the Quick menu over here. And switch everything between
the rows and the columns. All right, so now we have
everything vertical, but we have really big circles.
Let's go and change that. Let's go to the
second sum of sales, and let's try to reduce
stuff over here. We can reduce as
well the sticks. Let's go to the first sum of
sales to the size as well. Let's try to reduce
the sticks now. It looks really nice,
but still we have a problem with the labels. Let's go again to the
circles, go to the labels, and we're going to change the alignment
from Automatic to top, so we're going to go
and choose the top. So now we have the labels
on top of those circles, but still we don't
have all the labels because the size of the
text is really big. So let's go to the
fonts over here and change it from 10 to 8. One of
them is still missing; you can go and reduce
the size of the circles. That's it. This is how you can create lollipop
charts in Tableau. And here you can see
the power of Tableau. We can go and combine different types of charts
in one single view, like here we are combining
the circle with the bars. That means we have endless
amount of combinations. And this opens the
innovations in Tableau where you can create
amazing charts and visuals. And this is exactly
the magic of Tableau.
169. Tableau | Area Charts: All right, so now
we're going to talk about the area
charts in Tableau. They are like the line charts. We can use it in
order to see how the data are changing
over the time, but under the line we're
going to get a filled area in order to make it easier
to visualize those numbers. So now we're going to start with a very basic area
chart in Tableau. Since it is changed over time, we're going to get the
order date to our view, and then as usual we're going to get the sum of sales. And instead of a
year, we're going to switch it to month continuous. Now here we have it as a
line because it's automatic. If you go over
here to the marks, you can see we have a
chart type called area. Let's go and switch
it. So this is the most basic area chart
that you have in Tableau. Okay, so now we might
say, you know what, the basic area chart
in Tableau doesn't have a line, and usually the
area chart has a line. And between the
line and the axis, we have like a filled gap. But the basic area chart in Tableau doesn't have this visual. In order to recreate this
design, what we're going to do, we can go and create a line
on top of our area charts. So here we can have
two types of charts, the line and the area. So let's go and create that. We're going to take
the sum of sales and duplicate it by
holding control. So now we have our two charts. The first one is going to
stay as an area chart, the second one is going
to be a line chart. Let's go to the second one of the sum of sales
instead of area, we're going to have a line. I think you already
know the next step. We have to go and merge those two charts in
one single view. How we're going to do
that using the dual axis. Let's go to the
second Sum of Sales, right click on it, and
let's choose dual axis. Now the next step,
we're going to go to the area chart and just
reduce the opacity. Let's go to the colors. Now let's go and just
reduce the opacity. And with that,
we're going to get a perfect area chart in Tableau where you have a line, and between
the line and the axis you have a filled gap, way better than the basic
area chart in Tableau. All right, moving
on to the next one, we're going to have the
stacked area charts. It's like the bar charts: we can add more information to our visualization by adding
dimensions to the colors. Now we have the basic area
chart at the start, where we have the sum of sales and
the month over the time. Now we're going to go
and add a dimension. Let's take the
category and put it to the colors. With that, we got three area charts stacked
on top of each other, because inside this dimension,
we have three values. What we can do over
here about the design, we can go to the colors over here and increase the opacity. That's it;
this is how we can create a stacked area chart in Tableau. All right, next we're
going to go and build full, 100% stacked charts. Here, if the total of the
sales is not important, but what is important
is to go and compare those different
categories together, we can go and use the
full stacked chart. Let's see how we can do that. We're going to go to
the Sum of Sales, and we can switch to Quick Table Calculations,
Percent of Total. Let's go and click on that. We are not there
yet. As you can see, we have the percentage over
here on the left side. We want to have it from 0 to 100.
In order to do that, we're going to go again
to the Sum of Sales. Right-click on it and let's edit the table calculation.
What we're going to do: we're going to switch it
to Specific Dimensions. And this dimension is
going to be the category. Let's deselect the
month of order date. Let's go and close
it. With that, you can see the range now goes from 0 to 100 and you have
it like one block. Now we can go and very easily compare the three
different categories. Here we can see very clearly how each category is
relating to the whole, to the total sales
of each month. This is how we can
create very easily a full or 100% stacked
chart in Tableau. All right, so now we're
going to go and create small multiple area charts by
adding multiple dimensions. Let's go and get the
first dimension. It's going to be the
country to the columns. Let's go and get
the order dates as well to the columns.
And then to the rows. We're going to go and
get the categories. Those are our three dimensions. And then I'm going
to go and switch it from Standard to Entire View. Now let's go and get the
numbers inside our view. So it's going to be
the sum of sales. Let's put it in the
rows. As a default, Tableau is going to
show it as lines. Let's go to the marks and switch
it to areas. With that, we get our
mini area charts in Tableau. But now let's add more details, where we want to see the months. So let's go to the year over
here and change the format to continuous month.
So let's switch it. And then next we're going
to go and add the coloring. So let's hold control and drag and drop the
country to the colors. And in such visualizations, it makes no sense to have
those grid lines. So right-click on it. Let's go to the
formats, to the lines, make sure to select the rows and then the grid line over
here and make it none. With that, we have created small multiple area
charts in Tableau. It's very similar to the
lines or to the bars.
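To make the percent of total step from the 100% stacked chart more concrete, here is a minimal Python sketch of the same arithmetic. It assumes a few made-up (month, category, sales) rows, and the function name `percent_of_total_by_month` is hypothetical. Computing along the category means each month is its own 100%, which is why the axis ends up going from 0 to 100.

```python
from collections import defaultdict

# Hypothetical (month, category, sales) rows; not the course dataset.
rows = [
    ("2022-01", "Furniture", 50.0),
    ("2022-01", "Technology", 30.0),
    ("2022-01", "Office Supplies", 20.0),
    ("2022-02", "Furniture", 10.0),
    ("2022-02", "Technology", 30.0),
]


def percent_of_total_by_month(rows):
    """Share of each category within its month, in percent.

    The total is taken per month, across the categories, so the shares of
    every month add up to 100.
    """
    month_totals = defaultdict(float)
    cell_totals = defaultdict(float)
    for month, category, sales in rows:
        month_totals[month] += sales
        cell_totals[(month, category)] += sales

    shares = {}
    for (month, category), value in cell_totals.items():
        total = month_totals[month]
        if total == 0:
            continue  # skip months without sales to avoid dividing by zero
        shares[(month, category)] = 100.0 * value / total
    return shares


shares = percent_of_total_by_month(rows)
for key in sorted(shares):
    print(key, round(shares[key], 1))
```

If the total were taken over all months instead, the shares of a single month would not add up to 100, which is the situation we had before editing the table calculation.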
170. Tableau | Scatter Plots: Okay, so now we're going
to learn how to create scatter plots in Tableau.
Scatter plots are one of the fundamental
charts in order to understand the relationship between two continuous measures. That means the main task
of the scatter plots is to find correlations
between two continuous fields. Another task of the
scatter plot is to find the outliers inside your data. Let's go now and create a very basic scatter plot in Tableau. And as I said, we need two
measures in order to do that, our two measures are going to be the sales and the profit. Let's get the sales
to the columns and as well the profit to the rows.
With that, we got our two axes, and it's going to represent
a two-dimensional graph. Now what is missing is
of course our data, the data points. Here we're going to go
with the customer ID. Let's take the
customer ID and now we're going to go and
put it to the details. And here is the power of Tableau compared to any other tools
where Tableau going to go and plot all data
points that we have inside our data without
any restrictions, so that we can see
the correlation between the sales
and the profit. And as well to find
the outliers, for example, those points that
stand alone. All right, so with that
we have created a very basic scatter
plot in Tableau. All right, now let's add more stuff to the design
of the scatter plots, where we're going to
change the colors, the size, add
circles, and so on. So now we're going
to go and change the size of each data point, but it's going to depend
on a third measure, the count of orders. Now let's go to the order counts and drag and drop
it to the size. Each customer is going to have different sizes
and that's going to depend on how many orders
this customer placed. This is one thing that we can
add to our scatter plots. Another thing: we
can add coloring. Here we have different options
on how to add coloring. Either we're going
to add a dimension or we can make a cluster. Now for example,
let's go and get the dimension country and
place it on the colors. For the data points, we can add as well different
shapes in our visual. Currently we have the
circle for everything. We can take the country, drag and drop it to the shapes. Now we can see in
the scatter plot, not only that the countries
have different colors, but they also have
different shapes. But what we usually see
in scatter plots is that each data point is represented as a filled circle. That means we're going to
go and change the visual. Let's go to the marks over here. And then change it from
shapes to circles. Now as you can see,
we have everything as a filled circle, but we are not there yet. Let's go and make the
size a little bit bigger. Now, what do we have over here? We have a lot of points. And what we usually do, we go and reduce opacity
of the colors. Let's go to the
colors over here, and let's just reduce it. And with that, you
can see very nicely. For example, those two points there is like overlapping
between them. One more thing that we
can add to those circles. We can have a line
border for each circle. In order to do that,
we're going to go again to the colors, and here we have
an effect called border instead of automatic. Let's have something like
this color of the gray. With that you can see we have a very nice border
for each data point. All right, so those are
some different options on how to customize
the scatter plots.
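The scatter plot shows the relationship between sales and profit visually. If you want one number for it, one common choice is the Pearson correlation coefficient. That choice is an assumption of this sketch, the course itself does not name a coefficient, and the function name `pearson` and the sample values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences.

    Returns a value between -1 and 1. Values close to 1 mean the points rise
    together, values close to -1 mean one falls while the other rises.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one measure is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sales and profit per customer (one pair per data point).
sales = [100.0, 250.0, 400.0, 550.0]
profit = [10.0, 30.0, 35.0, 60.0]
print(round(pearson(sales, profit), 3))
```

Each pair of values corresponds to one customer, the same level of detail we got by putting the customer ID on the details.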
171. Tableau | Dot Plot: Okay, so now we're
going to create the dot plot in Tableau. A dot plot is a one-dimensional
graph in order to see the distribution of your data between
different categories. And each dot can be
representing one data point. Now let's go and see the
sales by the order date. And then we can have the
order ID as a detail. We're going to take the
order date to our rows. So now we're going to go
and see the distribution of order ID's by the date. Let's take the order date
to the rows this time. And let's go and change it
to a month as a continuous. Then we're going to go and get our measure to the columns. Now as a default, we
have it as a line. Instead of that, we're going to go and make it as circles. Now we are not there yet. We have to add more
details to the view and that by moving the order
ID to the details. Now since we have a lot of
orders inside our dataset, Tableau will ask us, do you
really want to do that? Well, yes, add all members. Now as you can see, we
have a very nice dot plot. We can add more informations. Like for example, let's take category and put it to
the colors as well. Since there are like
a lot of overlapping, we can go to the colors
and reduce the opacity. So now, with that,
each data point, each circle can
represent one order. And you can see now very
clearly and very fast, which orders have the most sales. This is how you can create
a dot plot in Tableau.
172. Tableau | Circle Timeline: All right, so now we're
going to learn how to build a circle or
bubble timeline. We usually use the
circle time line in order to analyze
the changes over time. And we usually use it to show the distinct values of different circles across
multiple categories. So let's see how
we can build that. Since we say it is change
over time, we need a date. So let's go and get the
order dates to the columns. We need one more dimension. Let's take, for example, the
subcategories to the rows, and then we need our measure. It's going to be the sales. But now instead of dropping it to the columns
or to the rows, we're going to drop
it on the size. Since each data point
has a different size, Tableau is going to show
it as squares. Let's go and switch
it to circles. Now in order to have more
data points in our view, we're going to go and
switch from the years. Let's take, for example, the quarter as continuous.
Let's click on that. Now I'm going to go and
change the size of our view. I'm just going to
go to the header and make it a little bit bigger. Then we're going
to go to the axis and just make it a little bit smaller in order to have some overlapping.
Now let's go to the and increase the size or make
it a little bit smaller. And then we can go to the
colors and reduce the opacity. And now we can add more
customizations about the design. For example, let's
take the sum of sales and put it to the colors. And then let's increase a little bit of the opacity
so it looks better. And as well depend
on how you like it. Maybe you can go and
add some borders, so let's go to the
borders over here. I like the dark ones,
so maybe I'm just going to go and make it
more gray. Of course, here you can go and customize
different stuff. For example, you can go
and use two measures. For example, instead of having the sum of sales on the colors, we can go and get
the sum of profit. So let's go and get the sum
of profit on the coloring. So now we can see
in this one chart, we can see a lot of stuff change over time. We can see as well the
correlation between two measures in order to understand the
relationship between them, where the size is
going to indicate the sales and the color is
going to indicate the profit. This is a really powerful
and great analysis in Tableau using
the circle time line.
173. Tableau | Pie & Donut Charts: All right, so now
we're going to talk about the pie chart in Tableau. It is a very easy and
common way to analyze or show
part-to-whole data. Let's see how we can build
that in Tableau. There is like an easy way, or cheating way, in
order to do that. If you go to the Show
Me over here and then click on the pie
charts, We will not do that. We will create it on our own so that we understand
how Tableau works. Let's not take the shortcuts. I'm just going to close it in order to build a pie
chart in Tableau. First, let's go to
the marks over here, Change it from
Automatic to Pie. With that, we get a
small icon called Angle. And here we're going to go and drop our fields on top of it. In this example, we're going
to build a pie chart from the sales and then split
it by the country. Let's take the sales and
put it on the angle. With that, we've
got our pie chart. It is like a circle and
it's not divided yet. Let's switch from
standard to entire view in order to get a
bigger pie chart. Then the next step
we're going to go and divide the pie charts
into sections. So our dimension going
to be the country. Let's go to the customers, then grab the country
and let's put it on the colors, so that our pie is divided into
multiple sections. And the size of each section can indicate the sales
of the country. And this type of charts
is used in order to analyze the part to whole. For example, here
we can analyze how the USA is contributing or relating to the
whole of sales. As you can see, it's
really easy to build and very commonly used
in many dashboards. We can go over
here, for example, and add some labels and change the design of course,
of these pie charts. And one more thing that I
would like to show you, that sometimes in the dashboards
you can see that there are multiple pie charts in
one dashboard, in one view. In order to do
that, you just grab any dimensions and put it to
the rows or to the columns, for example, let's take that category and let's
put it on the columns. And with that, we
got immediately three pie charts under those
three different categories. This is how we usually
deal with the pie charts. We have one dimension
that splits the pie charts and another one that is
duplicating those pie charts. All right guys, so that's all for the pie charts in Tableau. Okay, so now moving
on to the next one, we have the donut charts. Donut chart is very
similar to the pie chart. You still have this
analysis of part to whole. You have a circle and you
have different segments. But many people prefer to use
the donut chart and that's because we can add an extra
information to the circle. All right, so now in order to build it, we need two charts. The first one going to
be the pie charts and the second one going to be the
empty space in the middle. So let's start with
the pie charts. As we learned previously, we have to switch the
Automatic to a pie chart. Then we take our measure. It's going to be the sum
of sales to the angle. And then next we're going
to take the divider. It can be the country
to the colors. And with that we
got our pie charts. Okay, so now next I'm going to switch from standard
to entire view. This is for the first chart. Now in order to get the
empty circle in the middle, we have to create another
chart inside this view. So now we're going to go and
create our empty measure, just to have a second chart. In order to do that, let's
go to the columns over here and write the average of zero, AVG(0). So now, still, on the marks,
we have only one visual. In order to get a second one, we will go and duplicate it. Now with that, we've
got our two measures, one for the pie chart, and the second one can be
for the empty space. So now what we're going
to do, we're going to go and merge those stuff together in one place because we have
to have only one doughnuts. So right click on the average and let's go to the dual axis. And as usual, we're going to
go and synchronize stuff. So let's go and
synchronize the axis. And now let's go and
get rid of them. We don't want them,
so uncheck Show Header, and as well
for the bottom one. So now we have the two
charts in one place. It's a little bit
small, so let's go and make things a
little bit bigger. So let's go to the sizes and just make it bigger
in the middle. All right, so now let's go and make the empty space
in the middle. So let's switch to the
second marks card over here. And now the second chart:
it will not be a pie, it's going to be like a circle. So let's go and switch
it to a circle. Let's get rid of all
those informations. Now if you check
our view, we don't see the pie charts
and that's because we have overlapping and the pie
chart is behind our circle. Now in order to show it,
what we're going to do, we're going to go to the circle. Go to the size. And
now let's go and start reducing the size of the circle. And as you can see, now we are getting the shape of a donut, but our donut should have a white color in the middle. Let's go and change the circle
color to white, perfect. Now we've got the donut
shapes in our view. But now let's go and get
rid of all those lines. Right-click over here on the
empty space and go to Format. Then let's go to the left side. Let's start with the lines
over here, the zero line. Let's go and switch to none. Then we still have on the
column, one more line. Let's switch to the columns,
and for the grid line, let's set it to none. Then in order to get
rid of those borders, let's switch to the borders. Then let's go to
the row divider. Make it none as well. For the column
divider, it's none. And with that, we got very
clean donut shapes in Tableau. Now let's add some labels and some data to our donut charts. Let's go to the pie chart first. Here we're going to get the informations
of those sections. So what we're going to
do, we're going to bring, for example, the country
to the labels as well. We can go and get the
sum of sales: hold control and drag and
drop it to the labels as well. Now we can go and
change the font format. Of course, if we go
to the labels over here and then click
on the three dots, then let's make, for example, the sum of sales
bold. And that's it. So far, there is nothing new
compared to the pie charts. We are just showing the
informations of each section. But now here comes the
power of the donut charts. We can give an information
here inside the white circle. And it can be usually the total of the measure,
the total sales. Now let's go and switch
to the circle over here. Let's go and get the sum of sales and put it to the labels. Now you can see the
sum of sales here, strangely on the right side, because we didn't
customize it yet. So let's go to the labels
and then let's go to the alignment over here and make it everything
to the middle. With that, as you
can see, we got the total sales in the middle. Let's go and customize
the text a little bit. So let's go inside. So now what we can do, we can write the total
sales at the start. Then we can make
everything bold for the real number,
the real values. Let's make everything
a little bit bigger, 16 and click okay. Now as you can see, we've got now another information
to the pie charts, where we have the total sum
of sales in the middle. And then we can see very nicely the different sections
around this number. That's it; this is
how you can create donut charts in Tableau. And this type of chart, it
is like way more used than the pie chart since you can add one extra information
in the middle.
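The numbers behind a pie or donut chart are simple part-to-whole arithmetic. Here is a minimal Python sketch of it, assuming made-up sales per country (the names `donut_data` and `sales_by_country` are hypothetical). Each country gets a share and an angle, and the grand total is the extra information that the donut shows in the middle.

```python
def donut_data(sales_by_country):
    """Return (total, slices) for a part-to-whole chart.

    total  : the value shown in the middle of the donut
    slices : per country, the sales, the share of the whole (0..1) and the
             angle of the slice in degrees
    """
    total = sum(sales_by_country.values())
    if total <= 0:
        raise ValueError("total sales must be positive to compute shares")
    slices = {}
    for country, value in sales_by_country.items():
        slices[country] = {
            "sales": value,
            "share": value / total,
            "angle": 360.0 * value / total,
        }
    return total, slices


# Hypothetical sample values, not the course dataset.
sales_by_country = {"USA": 100.0, "France": 60.0, "Germany": 40.0}
total, slices = donut_data(sales_by_country)
print("Total sales:", total)
for country, info in slices.items():
    print(country, f"{info['share']:.0%}", info["angle"])
```

Dropping the measure on the angle is what asks Tableau to do this split for us, and the label on the white circle plays the role of `total` here.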
174. Tableau | Heat & Treemap Charts: Okay, so now we have another
chart in order to analyze the part to whole
using the tree map. We usually work with
tree maps in order to show the hierarchical
data inside our dataset. Let's see how we can build that. Let's first start
with the marks. Let's go and switch
it to squares. The next step, we're
going to go to the sales, and we can put it on the size. With that, we got
one blue square for the total sales
inside our data. Now of course, we
want to go and split this square to
multiple informations. And here we can work with the
hierarchy of the products. Let's start with the first
dimension, the category. Let's drag and drop
it to the colors. As you can see, we already
got now a tree map. The colors of the tree map
are decided by the category, and the size of those blocks is decided by the sales. Now, of course, in
this tree map, we want to represent
the hierarchy. The next dimension is going
to be the subcategory. But this time we will not
move it to the colors, we will move it to the
details. Let's go and do that. Now, as you can see, each of
those blocks are divided to more blocks where we have the
subcategory informations. That means the data will
keep splitting in the tree map the more dimensions we
add from the hierarchy. For example, let's go and grab the product name and let's
put it to the details. Now we can see that
we have a lot of mini blocks that represent
the product name. With that, we have
represented our hierarchy of the products
in a tree map. And we can see that
each category, for example the red,
is split into multiple subcategories
and each subcategory is split further
into products. But of course, the disadvantage here is that the more details you add, the harder it's going to be
to read this visualization. I don't recommend you to
go with the product name in such visualizations;
it should be enough with the category
and the subcategory. Of course, like any other
charts in our visualizations, we can have multiple
tree maps in one view by adding a dimension to
either columns or rows. Like for example, let's go and get the order date to the rows. And thus, we got multiple tree maps
split by the years, which is really useless to
have such a visualization. So let's go and remove it. Okay, so we're going
to the heat map. It is like a matrix where
you have colors inside it. And we usually use
it in order to see correlations between
two categories. Let's see how we can build that. We need two categories, that means we need
two dimensions. Let's say the first one
going to be the country. Let's drag and drop
it to the columns. And then the second
dimension is going to be, for example, the subcategory. Let's drag and drop
it to the rows. And with that, we
got our matrix. Let's switch to entire view. We have rows, we have columns. Now what is missing, of course, is our measure, the data. Now in order to create the
effect of the heat map, we're going to take the sum of sales and let's put
it to the colors. Now with that, we've
got our heat map. And we can see from the colors the correlation
between countries and the subcategories, where
we can see immediately that the highest sales are where
we have the dark color. So for example, we have high
sales from the country France and as well from
the subcategory phones. And the lowest
sales, we can see it for example here
in the envelopes and Italy. Here we can see again the power of
visualizations, where we can read now the trends and the
correlations in our data, which is way better than
having only numbers. But of course, if
you want to add some numbers in this matrix, we can go to the labels
over here and show mark labels. And if you want to
make it to the middle, let's go to the alignments and let's make everything
in the middle. That's it. As you can see,
it's really simple, and this is how we can create
a heat map in Tableau.
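Behind the colors, a heat map is just a matrix of totals, one cell per combination of the two dimensions. Here is a minimal Python sketch of that idea, assuming a few made-up (country, subcategory, sales) records; the names `heat_matrix` and `records` are hypothetical. The cell with the highest total would get the darkest color and the cell with the lowest total the lightest one.

```python
from collections import defaultdict

# Hypothetical (country, subcategory, sales) records; not the course dataset.
records = [
    ("France", "Phones", 90.0),
    ("France", "Envelopes", 10.0),
    ("Italy", "Phones", 40.0),
    ("Italy", "Envelopes", 5.0),
    ("France", "Phones", 30.0),
]


def heat_matrix(records):
    """Sum the sales per (subcategory, country) cell: rows by columns."""
    cells = defaultdict(float)
    for country, subcategory, sales in records:
        cells[(subcategory, country)] += sales
    return dict(cells)


matrix = heat_matrix(records)
darkest = max(matrix, key=matrix.get)    # cell with the highest sales
lightest = min(matrix, key=matrix.get)   # cell with the lowest sales
print("highest:", darkest, matrix[darkest])
print("lowest:", lightest, matrix[lightest])
```

Reading the maximum and minimum out of the dictionary is what our eyes do with the dark and light cells in the view.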
175. Tableau | Bubble Charts: Bubble charts in Tableau. They are a really great
way in order to add a lot of dimensions and
measures in one single view. Bubble charts are
like circles and we can define a lot of
stuff in the circle, like the colors, the size, we can put inside it, text.
Let's have an example. We're going to start
with the mark. So instead of automatic, let's go and switch
it to circles. Since the bubbles are circles, let's start with the
first information. We're going to go and
get the measure sales. Let's put it on the size. With that, we got our
small bubble or circle. Let me switch it to entire view. Now we have one piece of information, the total sales inside our data. Let's add another
information like dimension. So let's go and add the
subcategories inside our view. So I'm going to
take this dimension and let's put it on the details. So now as you can see, we
got more bubbles, and we're going to get a bubble for
each subcategory now. All right, so now
let's keep adding more informations
to our bubbles. Let's say that I would like to add the coloring for the bubbles,
from another measure. Let's take the profits and
let's put it to the colors. So now with that, we've
got different colors, depending on the values
from the profits. And now, how about to add one more information
inside those bubbles? Let's say the category. Let's go and get the
dimension category. And now let's put
it on the labels. Now we can see the category of each bubble, of
each subcategory. Now, as you can see, we have
four different informations that we have inside our bubble. The first one is the colors of the bubbles indicates
the profits. And then the size of the bubbles show us the sales informations. And then the number
of those bubbles are decided from
the subcategory. We have all those
subcategories inside our data. And finally, the text inside the bubble comes
from the category. This is the power of
the bubble charts, where you find a lot of
information in one view. So now we have another fun one called stacked
bubble charts. Here we're going to add a lot of dimensions in the details. So let's see how
we can build that. Let's go to Automatic as usual. Then switch it to circles. Let's take the sum of
sales and put it on the size. We are just
creating, again, our bubbles. This time we're
going to go and get the country and let's
put it to the colors. So far we have those four
colors for four countries. Now if we bring any
dimensions to the details, it's going to split
this pupbles to more small pubbles that's depend on the cardinality
of the dimension. For example, let's
take the category, it has very small cardinality. And with that we will
get just few pubbles if you go and remove it. Let's take the subcategory. Now as you can see,
we are getting way more pupples
than the category, and that's because
we have more data inside the subcategory. Now let's go with
higher cardinality. So let's just remove
the subcategories, and let's get, for example,
the product name. Once you do it, you
will get a lot of small bubbles, and they
are all stacked together. And of course, you
can go and sort the bubbles differently. If you go to the
country over here, right-click on it and
let's go to Sort. Let me just move it
to the left side a little bit and change the sort. As you can see, the colors
are going to change as well. So here you can go and sort
the bubbles as you want. Now of course we can
go with more detail. If we take the lowest
level of detail, the order ID, let's drop the product name away and
let's go and get the order ID. With that, Tableau asks us: do you really want
all of that data? Yes, Add All Members. Now you will get a small bubble for each order inside
our visualization. Okay, so this is another
way to represent your data visually, using
the stacked bubble chart. And if you look at it, you will find it
looks like the sun. All right, so that's all for
the stacked bubble charts.
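To make the cardinality idea concrete outside of Tableau, here is a minimal Python sketch. The sample rows and the helper name `bubbles` are made up for illustration. It only mimics the idea that every distinct combination of the dimensions on the colors and the details becomes one bubble, sized by the sum of sales. It is not how Tableau works internally.

```python
# Hypothetical sample rows: (country, category, subcategory, sales)
rows = [
    ("USA", "Furniture", "Chairs", 120.0),
    ("USA", "Furniture", "Tables", 80.0),
    ("USA", "Technology", "Phones", 200.0),
    ("France", "Furniture", "Chairs", 60.0),
    ("France", "Technology", "Phones", 90.0),
    ("France", "Technology", "Phones", 30.0),
]


def bubbles(rows, key):
    """Return {dimension values: total sales}; each entry stands for one bubble."""
    totals = {}
    for row in rows:
        k = key(row)
        totals[k] = totals.get(k, 0.0) + row[3]
    return totals


# Country only (as on the colors), then country plus one dimension on the details.
by_country = bubbles(rows, lambda r: r[0])
by_category = bubbles(rows, lambda r: (r[0], r[1]))
by_subcategory = bubbles(rows, lambda r: (r[0], r[2]))

# Higher cardinality on the details means more, smaller bubbles.
print(len(by_country), len(by_category), len(by_subcategory))
```

So the more distinct values the dimension on the details has, the more entries we get, and each entry carries a smaller share of the total sales.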
176. Tableau | Maps: Okay, so now we're going to
talk about Tableau maps. First, let's get the data.
In order to plot the maps, let's go and create
a third data source. I am on the Data Source page. Let's go over here to this
small icon, New Data Source. And then let's go
to Text File and then to the data
that we downloaded. Let's go to the big folder. And then we have over
here USA Sales. Let's select this CSV
file and click Open. It's a really simple table
where we have the orders, country, region, state,
and sales. That's it. Let's go back to
our view and let's create now a very
basic map in Tableau. Again, we can go and
cheat using Show Me, but we're going to go and
create it from scratch. Now if you have a
look, you can find that we have two
automatically generated fields, the latitude and the longitude. They are the geographical
coordinates used to plot
the map of the Earth. The latitude is responsible
for plotting the horizontal lines, and the longitude is responsible for plotting
the vertical lines. What you can do is go
and use them on the columns and rows. Let's take the longitude to the columns and the
latitude to the rows. With that, you can
see that Tableau is now able to plot the Earth. Now next we have to specify
for Tableau the country, the states, those
pieces of geographical information. Let's take, for example, the country to the details. And with that, you
can see that Tableau is now focusing only on the United States
because we only have information about the USA. Now let's take the state as well and put it on the details. Now as you can see,
Tableau is focusing now, with those points,
on each state. All right, so now the next step: instead of having circles, I would like to
have a map chart. Let's go to the Marks. Switch it from Automatic to Map. And with that we
have the whole area covered with color. Now you can go and add coloring depending on the
dimension that you want. For example, we can
go to the region over here and put
it on the colors. Now we can see that the map is split by the regions. Now what is missing here
is the sales information. Let's go and get the sales. But see, we have a
small problem: the sales is a dimension and discrete because
of the data type. Let's go and switch it to Number (whole) and then
convert it to continuous. Then the last thing: we
have to convert it as well to a measure, because
it's still a dimension. So everything is
fine. Let's go and get the sales to the labels. And with that, we
got very nicely the total sales for each state. This is how you can create a
very basic map in Tableau. Okay, moving on to the next one. We can create maps in
Tableau with symbols. I just duplicated
the previous one. Let's go and switch
the visual from Map to, for example, circles. And then the size of the circle is going to be decided
by the sales. Let's take the Sales
and put it on the size. Then the next step:
let's go and make the circles a little bit bigger. Now we can add another
measure to the circles. Let's say the number of orders. We're going
to take, over here, the count of the USA Sales CSV. Let's take it to the colors. Now, the scale of the color
is going to show the number of orders, and the size of the circle is
defined by the sales. This is one way
to represent that information, as
circles or bubbles. We can go and choose
different shapes. Let's go over here in the Marks and go to Shape. For example, let's see what we have over here.
Let's go with the stars. As you can see, we have
here a lot of options for which symbol is
presented inside our map. This is how we can add symbols
to the maps in Tableau. All right guys, Maps in Tableau are very rich in
customizations. There are a lot of options on how to plot the
maps in the view. I'm going to show you
a few possibilities on how to plot the
maps in Tableau. The first one is
about how to have a map without any
background noise. Now let's go and do
that. If you take the country field and just
throw it here in the middle, Tableau can understand we are talking about a map, and we're going to get everything automatically inside
the columns and the rows. Now the next step: let's take as usual the states over here, and then we're going
to go and color it with the region
on the colors. So if you check the map, you
can see there are a lot of grayed-out areas inside the map
that are not used directly. If you want to remove
all of that, what we're going to do is go
to the main menu. You have here the Map menu, and then here we have
Background Layers. Let's go and click
on that. And then here on the left side we will get many options on how
to customize the maps. I really recommend you
to go and click around. It's really fun to
work with maps in Tableau. Now the task is to remove all that background
information. What we're going to
do is just remove all those
selected layers. Let's just remove
everything. With that, as you can see, we have removed the background and we have only the relevant
information inside our view. There's another way
to remove the background. Let me just go back with
all those settings. I think with that we got
all the information back. Another way to remove
the background information is to go to the Washout and move it from 0 to 100%.
Now as you can see, the background inside
our map did disappear. This is how we can remove the background information
inside our map, and you get a really clean map in order to focus on
the relevant data. Okay, the next one is, as well, about customizing
the maps in Tableau. So now let's go and create
a night vision map. It is just fun to work
with maps in Tableau. So let's go again and get the country
into the view, to the details. Now in Tableau, we have different types
of maps, not only one. If you go to the main menu
over here, to Map, you can either check
the background maps, where we have the
different modes, or you can go again to
the background layers, and on the left side you can see the styles. Currently it is white
and gray; it's Light. If you click over here, you can find the different styles. We have Normal, and then we have styles like Dark, Streets, Outdoors, and Satellite.
It's really nice to
have different styles. What we're going to do now,
since it's night vision, is go
with the Dark style. Now the next thing: I
would like to reduce some of the information, like
United States and Mexico. Let's go and remove that
stuff from the left side. What we're going to do,
we're going to go and add a measure to our view. Let's close the background
layers over here. Let's go and get the sales to the size. With that we are
getting those nice circles. Let's make them a
little bit bigger. Then we can add the sales
as well to the colors. So hold Control, drop it on the colors, and let's
change the coloring. So let's go to Edit Colors. Now let's go to
Automatic over here. And let's change it
to another palette. For example, let's take the Blue-Green over
here. Click OK. Okay. Now we're going to go and add more customizations
to our map. For example, let's say
that I would like to change the color of the
borders for those states. I would like to make it red in order to make it
more interesting. I cannot do that in
the current view because if I change
anything about the border, it's going to change
the border of the circles and not the
border of the states. In order to do that,
we need two maps, one for the circles and
one for the states. All right, now let's
see how we can do that. We're going to go
to the longitude and we're going to
go and duplicate it. Now that we've got two maps, the left and the right, let's go and configure the right one. Let's switch the marks
to the second map. Now instead of having circles, we want to have a map. Let's switch it to Map. As you can see, we now have two different types of maps. But now I would like to have
only the border information, so I'm not interested
in the sales. So let's go and remove it, and as well from the size. Now as you can see,
we have a gray color that is filling the map. So let's go to the
colors and reduce the opacity to 0% so that we don't have
any color on the map. What we need is the
color of the border. So let's go again to the colors. Let's go to the
border over here. Let's make it red. I'm not
really happy with this color. I want it to be more red. So let's go to More Colors
and let's get the real red. Now the question is how to merge those two maps into one map.
Well, the answer for that is using the dual axis again. So let's go to the right one over here,
right-click on it, and select Dual Axis. All right, so with that we got one map, but I'm still not happy with that. You can see
that the circles are behind the lines. In order
to have them in the front, let's go and switch
those two measures. And now you can see that the
circles are in the front. All right, so with that we have created our night vision map.
And with that you
have learned as well how many possibilities
we have in Tableau in order to
customize the maps. With all those different options that we have inside the maps, I really recommend you to go and explore them inside Tableau.
It's really fun.
177. Tableau | Histograms: Okay, now we're going
to learn how to create histograms in Tableau. There are two ways, one quick
way and one advanced way. The quick way is for when you
have one measure, the advanced way is for when
you have two measures. Histograms are a really
great way to show the distribution of your
data using bar charts. So let's see how we can do that. Let's work with the one measure, the Quantity. Right-click on
it and then go to Create, and then to Bins. Here we can go and
configure our bins. I'm going to leave it as
the default that Tableau suggests. Let's go and click OK. With that, we have
created a new bin, a new dimension
in our Data pane. Now what we can do:
we're going to go and drag it to the columns, and here we can find
the size of our bins. And then we're going to go and get the Quantity to the rows. And then the next and
the last step: we're going to go to the
Quantity and convert it from discrete to
continuous. Right-click on it and switch
it to Continuous. So with that, we have created a very simple and
nice histogram to see the distribution of our data
using the measure Quantity. All right, the next one is going to be a little bit
more advanced, where we're going to
create a histogram using two different measures: the number of customers by the number of orders. We want to cluster our customers based on the number of orders
that they placed. Now in order to do that, we
have to create our bins, but now we're going to use a calculated field
in order to do that, using the LOD expression
FIXED. Let's go and create a
new calculated field. Let me just move it a
little bit over here. What we're going to
find out is the number of orders per customer. In order to do that, we can
use the LOD function FIXED. It starts with FIXED,
let me select that. Then for each customer, we want to count the number
of orders. We're going to get
the customer ID. And then the
aggregation is going to be the number of orders. That means we're going to go and count the order ID. All
right, so that's it. Let's go and hit OK. Tableau did create
a continuous measure, but I would like
to convert it to a discrete dimension. Right-click on it and let's
convert it to a dimension. And that's it. Now
let's go and drag it to our view and
check the information. All right, so with that
we can see that we already have our bins, and those are the different numbers of orders that the
customers placed. The next step: we need
our second measure. It's going to be the
number of customers. Let's go to the customer
count over here, drag and drop it to
the rows as well. Let's take the customers
to the labels. And with that, we got
a very nice histogram in Tableau using two measures. Again, here, if
you want to build a histogram from two
different measures, one of those measures
has to be the basis, the bins of the
histogram, and the second measure is going to be used
in order to do the counts. So now we can see very quickly that most of our customers are ordering between 13 orders
and like 16 orders. All right. So those
are the two methods on how to create histograms, the easy way and a little
bit more complicated way.
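If you want to see the logic of the second method outside of Tableau, here is a minimal Python sketch. In Tableau, the calculated field described above would typically be written as `{ FIXED [Customer ID] : COUNT([Order ID]) }`, where the field names are assumed from the description. The Python data below is made up, and the sketch assumes one row per order.

```python
from collections import Counter

# Hypothetical data: one (customer_id, order_id) pair per order.
orders = [
    ("C1", "O1"), ("C1", "O2"), ("C1", "O3"),
    ("C2", "O4"),
    ("C3", "O5"), ("C3", "O6"), ("C3", "O7"),
    ("C4", "O8"), ("C4", "O9"),
    ("C5", "O10"),
]

# Step 1: number of orders per customer (the role of the FIXED calculation).
orders_per_customer = Counter(customer_id for customer_id, _ in orders)

# Step 2: number of customers per order count (the bars of the histogram).
customers_per_order_count = Counter(orders_per_customer.values())

for order_count in sorted(customers_per_order_count):
    print(order_count, "orders:", customers_per_order_count[order_count], "customers")
```

The first step gives us the bins, and the second step does the counting, which is the same split between the two measures that we used in the view.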
178. Tableau | Calendar Chart: Okay, so now we're
going to learn how to create a calendar in Tableau. So now we're going
to go and build this calendar using
the order date. Let's take the order date
first to the columns. Now in the columns
we have to have the days. Right-click on it in
order to change the format. And then go to More. And then let's get the
weekday. With that we got Monday, Tuesday, and so on. Then we need to build the
rows of the calendar, and it's going to
be the week number. Let's go and hold Control and duplicate it to the rows.
Instead of the weekday, let's switch the format again, over here to More and
then Week Number. With that we got our matrix, our calendar. You can see we have
here all the weeks. I would like to reduce
it to only one month. That means we're
going to go and add some filters to our view. Let's take the order date and put it on the filters. And the first filter
is going to be on the years. Go and select Years. Let's select the last year and hit OK. And we can of course go and offer it to the users. Right-click over here and show the filter on
the right side. We can do the same
for the months. Let's go and take the order date and put it on the filters. Let's go for the month next. And let's select only one month. And then offer it as
well to the users. All right, with that we got only one month. Let's go and switch it from Standard
to entire view. Now as usual we need a measure in order to fill our calendar. It's going to be
the sum of sales. So drag and drop it and
put it on the colors. All right, so with that we
can see already that we have a heat map
inside our calendar. Now we just need to
add a few things. For example, let's
add a white border between those cells. Go to the colors, and then
go to the border and add a white color so that we get a nice separation
between the days. And let's add as well the
day number in each box. In order to do that, we're going to go to the order date. Put it on the labels
over here, and then here
Tableau switched it
automatically to text. Let's go and switch
it back to Square. And instead of having the years, we have to go and
format our date. So right-click on it, and
let's go and select the day. And then the next step:
let's go and place those day numbers
in the top right corner. So let's go to the
label alignment and let's go to
Right and then Top. All right, so with that we got a really nice
calendar in Tableau. Of course you can go and
switch to another month, let's say for
example February, or check another year, 2021. And this is how you can
create a calendar in Tableau.
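The layout of this calendar is just a mapping from each date to a row (the week number) and a column (the weekday), with the day number as the label. Here is a minimal Python sketch of that mapping. The sales values are made up, and the sketch uses ISO week numbers, which is an assumption: Tableau's week numbering depends on the week start setting and may give different numbers.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical daily sales for a few days of February 2021.
daily_sales = {
    date(2021, 2, 1): 150.0,
    date(2021, 2, 2): 90.0,
    date(2021, 2, 8): 210.0,
    date(2021, 2, 14): 40.0,
}


def calendar_cell(d):
    """Return (week number, weekday name, day of month) for one date.

    Uses ISO week numbers, which is an assumption of this sketch.
    """
    iso_week = d.isocalendar()[1]
    return iso_week, WEEKDAYS[d.weekday()], d.day


# Rows are weeks, columns are weekdays; each cell holds (day number, sales).
grid = {}
for d, sales in daily_sales.items():
    week, weekday, day = calendar_cell(d)
    grid.setdefault(week, {})[weekday] = (day, sales)

for week in sorted(grid):
    print(week, grid[week])
```

The sales value in each cell is what drives the color of the heat map, and the day number is what we placed in the top right corner.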
179. Tableau | Waterfall Chart: All right, now we're
going to create the waterfall chart in Tableau. It's very useful for showing how your data builds up step by step, and as well for part-to-whole
analysis. Let's see how we
can create that. First, we need a dimension
like the subcategories. Let's move it to the columns.
Then we need a measure. This time, let's
take the Profit, drag and drop it to the rows. And then let's change it from
Standard to Entire View. Now in order to have a
waterfall inside our view, we need the running total. In order to do that, let's
go to the Profit over here. Right-click on it, and let's do a quick table calculation. And let's switch it
to Running Total. With that you can see we now have a running total of our data, but it is still not a waterfall. In order to get that, we have to switch it from the
classic bars. So let's go to the
Marks over here, to the Gantt bar. All right, so with that we got the
basics for our waterfall, but now the size of each bar is going to
depend on the profit. Let's go again and drag
the Profit to the size. But now if you check it closely, we can see that those
bars are not making the waterfall because they are
in the opposite direction. We would like it to be starting from zero, from
the bottom to the top. In order to make this effect, let's go to the sum
of profit over here. Double-click on it, and then
let's put a minus in front of it. Now, exactly,
we got what we want. It starts from the bottom and goes to the top, and with that we are forming
the shape of the waterfall. Now we have to add
some coloring. Let's go and get the profit. Put it on the colors. Now, what we want to
do with the colors: if the numbers are positive, then it's going to stay blue. But if it's negative,
it should be red. In order to do that,
let's go to the colors and Edit Colors. And now we're going to do the
following setup. So let's go over here and
make it only two steps. And then let's go to
Advanced over here, and make sure that
the center is set to zero over here. And that's it. So
let's go and hit OK. And with that, we can
see very easily where the negative values are in our waterfall and where
the positive values are. You can, of course, make
it green and red. So now the last thing
that we have to add to our waterfall is the total. In order to do that,
it's really simple. Let's go to Analysis
on the main menu. And then we go to
Totals over here. And let's add Show
Row Grand Totals. By doing that, we get our
total on the right side, and with that we get a perfect waterfall
chart in Tableau.
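To see why the minus sign is needed, here is a minimal Python sketch of the arithmetic behind the waterfall. The profit values are made up. The assumption is that each Gantt mark is placed at the running total and then drawn with a length equal to the size, so a size of minus profit makes the bar reach back to the previous running total.

```python
from itertools import accumulate

# Hypothetical profit per subcategory, in the order shown on the columns.
profits = [("Phones", 40), ("Chairs", 25), ("Tables", -15), ("Binders", 30)]

# Running total of profit: where each Gantt mark is placed.
running_totals = list(accumulate(profit for _, profit in profits))

bars = []
for (name, profit), running_total in zip(profits, running_totals):
    size = -profit  # the "minus" trick: the bar extends back from the running total
    bars.append((name, running_total + size, running_total))

grand_total = running_totals[-1]
for name, start, end in bars:
    print(f"{name}: from {start} to {end}")
print("Total:", grand_total)
```

Each bar starts where the previous one ended, which is exactly the stair shape of the waterfall, and the last running total is the grand total that we added on the right side.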
180. Tableau | Pareto Charts: Now we have the Pareto chart. It is a very famous chart
in statistics, and this chart is based on the Pareto principle, which
uses the 80/20 rule. The principle says
80% of the outcomes are generated from 20%
of the work or effort. One way to visualize
the Pareto chart is to use two different charts. The first one is going to be
the bar chart and the second is going to be
the line chart. Let's see how we can
build that in Tableau. First we can start with
the dimension subcategory, drag and drop it
to the columns. And then we need our measure. Let's drag and
drop the Sales to the rows. Now in order to have
the Pareto effect, we have to sort the data descending. First should come the data
with the highest sales, and then we go descending
to the right side. What we can do: we
can go to the Sales over here and sort it. Perfect. Now we have the bar chart. The next step we want to do
is to build the line chart. So in order to do that,
we're going to go and get the sum of sales
and duplicate it. So hold Control and
duplicate this field. And with that we've
got our two charts. Since the second chart should be a line chart, let's
go and switch it. So I'm going to switch
the sum of sales, the second one, and
instead of Automatic, we're going to
have it as a line. And as well, I'm going to
change the color to orange. Perfect. As usual, we have to go and merge those two
charts together. So let's go to the
sum of sales, right-click on it, and select Dual Axis. And here our chart is broken because the first
chart is Automatic. So let's go to the
first one over here and switch it back to bars. Alright, so we are not there yet because we have to
work on the line. The line should be the
percentage of the running total. In order to do that in
Tableau, it's really easy. Let's go to the Sum
of Sales over here, right click, and let's go
and add table calculation. All right, so now we're
going to go and configure our table calculations
for the second measure. And as I said here, we
have to do two things. First we have to calculate
the running total, and then we have to
apply the percentage. In order to do that,
let's go and change the calculation type
to a running total. Let's go and select
that. And with that, as you can see in
the background, we have a running total. But the principle
here is based on the percentage of
the running total. So we have to go and switch
this to a percentage. In order to do that, we can click over here and say Add Secondary Calculation. Let's click on that. We get a primary and a secondary
calculation. The first one is
executed as a running total, and then on top of that we
want to get the percentage. Let's go and switch the secondary one
from Difference From to Percent of Total. Let's click on that. That's it for the table calculations. Let's go and close it. With that, we have built our Pareto chart, but let's understand what
is going on over here. Now, in order to
easily read this, I'm going to go to
the second one, to the line, and let's put
the labels on top of it. And of course, the
principle says 80/20. That means 20% of those subcategories should cover 80% of the sales. And as you can see,
we cannot see that
in this business. If you check our subcategories
in this example, you can see it's not 20%. We need around nine subcategories in order to reach the
80%. In this example, our business does not
follow this principle, where 80% of the sales are covered by 20% of
the subcategories. All right? So this
is one method on how to create a Pareto
chart in Tableau, and this is how you can read it. All right. So now we're going
to learn another method on how to create Pareto
chart in Tableau. This time we're
going to go and use two different measures
using only one line. Let's see how we
can do that. Now we have the business
question, and it asks us: do 20% of the products
make up 80% of the sales? Now let's go and get the
answer from the data. In order to do that, let's
get our first measure. It's going to be
the sum of sales. Drag and drop it to the rows. Now let's go and get
our second measure. It's going to be the
count of products. In order to do that,
let's take, for example, the product name to
the columns, and Tableau asks us, since we have
a lot of members here: Add All Members. Now as you
can see, we have a dimension, but we want to count
how many products we have inside our data.
So right-click on it, and let's go to Measure, and then let's select
Count (Distinct). With that, we got
our two measures. There is one more thing that
we need inside the details in order to
do the calculations: we need the product name to be on the details in
order to use it. All right, so I'm
going to go over here and switch it to entire view. Let's go to the first
measure, right-click on it, and let's add a table calculation. Here again we have
the same stuff. We can switch it to
a running total. And then we're going to go and add a secondary calculation. The secondary
calculation is going to be the Percent of Total. Then let's specify
the dimension. Let's go and set the
specific dimension to the product name. The same as well
for the right side: it's going to be
the product name. All right, so with that,
we got everything ready for the first calculation.
Let's go and close it. Now as you can see,
we already have the percent of the running
total for the products. Let's do the same
stuff for the sales. Right-click on the Sales, and then let's go and
add a table calculation. Let's go to Running Total. The specific dimension is going
to be the product name. Let's go and add the
secondary calculation. It's going to be the
Percent of Total. Then the same stuff:
we have to go to the specific dimension and
specify the product name. All right, so with that
we have prepared everything for the
second calculation. Let's go and close it.
Now we have to go and switch it to a line, since
we have it as Automatic and Tableau decided to
go with shapes. Let's go and switch it to Line. Now with that, we
are almost there. We have the percent of the running
total for both of the measures and
we have our line, but as you can see, the line
is a little bit jittery. And that's because we
haven't sorted the data yet. It's very important for
the Pareto chart that we sort the data like we
have done in method one. Now let's go and sort the
product names by their sales. In order to do that, right-click over here and go to Sort. And then we can sort
it by the sales. Let's switch it to Field. And let's go and select the Sales from the
field name over here, and let's
make it descending. Perfect. Now we got exactly the
Pareto chart that we need. So now we have to check
whether it's true that 20% of our products make up
80% of our sales. Now in order to check that quickly and easily in the view, we can add the support
of the reference lines. Let's go and add some
reference lines. Let's go to the
Analytics pane over here. Let's take here a
reference line. Let's drag and drop it
first to the first value. Now what we can do: instead
of having the average, let's go and switch
it to a constant. Here we want to
check the 20%, so it's going to be 0.2.
And now with that, we're going to get
a reference line exactly on the 20%
of the products. Let's go and close
that. As you can see, we have a very nice
line that indicates exactly the 20% of the products. The next step:
we're going to go and add another reference
line for the sales. So let's take a reference line, drag and drop it exactly on
top of the sum of sales. And now we're going
to do the same stuff: instead of average, let's switch it to a constant, and since we need 80%,
it's going to be 0.8. So with that, we
got exactly the 80% of the sales. So perfect. Now we have our Pareto chart, and we can easily answer this
question from our data. So we can say, yes, 20% of our products are
covering 80% of the sales, which exactly matches
the 80/20 rule, the Pareto principle. All right, so these are the
two methods on how to create Pareto charts in Tableau
and analyze your business.
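Both methods rely on the same calculation: sort descending, take the running total, and divide by the grand total. Here is a minimal Python sketch of that calculation with made-up sales per product. The function names `pareto_points` and `sales_share_at` are only for illustration and have nothing to do with Tableau's own functions.

```python
from itertools import accumulate

# Hypothetical sales per product, deliberately not sorted.
sales_by_product = {
    "C": 50.0, "A": 500.0, "J": 8.0, "D": 40.0, "B": 300.0,
    "F": 25.0, "G": 20.0, "E": 30.0, "I": 12.0, "H": 15.0,
}


def pareto_points(sales):
    """Return (share of products, share of sales) pairs, best sellers first."""
    ordered = sorted(sales.values(), reverse=True)  # sort descending first
    total = sum(ordered)
    running = accumulate(ordered)  # running total of sales
    count = len(ordered)
    return [((i + 1) / count, r / total) for i, r in enumerate(running)]


def sales_share_at(points, product_share):
    """Share of sales covered by the top `product_share` of products."""
    covered = [s for p, s in points if p <= product_share + 1e-9]
    return covered[-1] if covered else 0.0


points = pareto_points(sales_by_product)
top_20 = sales_share_at(points, 0.2)
print(f"Top 20% of products cover {top_20:.0%} of sales")
```

Checking the point where the product share is 0.2 against a sales share of 0.8 is the same thing we did in the view with the two constant reference lines.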
181. Tableau | Butterfly (Tornado) Charts: All right, now we have
the butterfly chart, or as we sometimes call it,
the tornado chart. It is a great chart
for analyzing two different measures
by a specific dimension. So for example, if you want
to compare the number of customers with the number
of orders by the category, then the butterfly
chart is your chart. What do you need?
First, the dimension. It's going to be, as
usual, the subcategory. Let's move it to the rows, and then as usual, we're going
to switch it to Entire View. Then we need our two measures. The first one is going to
be the customer count. Let's move it to the columns. Then the second one is going
to be the order count. All right, so with that, we have our two measures and
the subcategory. Now in order to form the
shape of the butterfly, we have to have the dimension
exactly in the middle. And then on the right
side we have one measure, and on the left side we
can have another measure. In order to do that, we're
going to use the placeholder, the average of zero. Let's have it over here, and let's go and place it
exactly in the middle. Now with that, we have
the measure on the left, measure on the right, and
something empty in the middle. And then let's go and
configure the charts. It's going to be the middle
one, the average of zero. Let's go and switch
it to a text. And now the next thing
we have to go and get the dimension to
the text over here. And with that you can
see we've now got the spine of the butterfly. So let's go and make it
a little bit more bold. So I'm going to go over here
and just make it bold. But now we have to
have the two wings, one on the right
and one on the left. You can see the right side is okay, so we
have it as a wing. Let's go and sort
the data, by the way. But the left wing
is not correct yet, so in order to fix
that, let's go to the count of customers
over here on the axis. Let's edit the axis
and let's go and reverse the scale, so that we get exactly the
opposite direction in the scale. Let's go and close
it, and as you can see, now we got it perfect. On the left side we have the wing of the customers, and on the right
side we have the orders. Now the next step
is what we usually do, which is to add some coloring. For example, let's start with the customers over here and drag, holding Control, the count of customers to the
colors. As well, we can go to the orders
over here and drag and drop the orders, by holding
Control, to the colors. But of course, we
can go and customize the right side using
different coloring. Let's go to the colors
over here and change the palette maybe to
orange, let's say. Okay. As well, we
can go and make the text in the middle a
little bit bigger. Let's go to the middle. And then let's make it maybe
something like 15. Now we can see
those subcategories in the middle very clearly. But since we have
it in the middle, we don't need it on the right.
So let's go and hide it. Right-click on it and then let's go and disable Show Header. We can go to the axis over here and as well disable the headers. And of course we can add more formatting in order to
remove those grid lines. Right-click over here on the
empty space and go to Format. And then we can go
to the Columns tab and as well remove
the grid lines. With that we've
got a clean chart that represents a butterfly
or a tornado, depending on how you see
it, where you can go and compare two different measures
by a specific dimension. All right, now in
method two, we're going to bring
those two wings together. In order to do that,
we're going to get exactly the
same information. Let's go and get
the subcategories to the rows, and then as usual, switch it to Entire View. Let's go and get our measures. So the first one is going to
be the count of customers, and then the second one is going
to be the count of orders. But we have to put them now
on top of each other. Since we are using the
same type of chart, we're going to use the Measure
Names and Measure Values. Take the order count and drag and drop it on top
of the axis over here, in order to generate the
Measure Names and Measure Values. All right, so we have
that information. Now we're going to go and
take the Measure Names. We don't need it on the rows, so drag and drop it to
the colors over here. And just to make sure that
everything stays as bars, I'm going to go from here and switch it from Automatic to Bar. And now the next step: we're going to go and sort the data. So click the axis over here, and then sort the data descending. Both of the values, or the wings, are
on the right side. Now in order to have the
effect of left and right, we don't have two axes here. What we're going to
do, we're going to do a very small trick
in order to do that. Let's go to the
customers over here. Double click on
it and just go to the front before the
counts and put a minus. Let's go and hit Enter. So with that, we get
again the effect of the butterfly where we have the left and the right
wings together. But of course what is
missing here is the spine, the dimension, the subcategory. In order to do that, we're
going to do the same. We're going to go
and have the average of zero as a placeholder. We have it now on
the right side. Let's switch to it, and then we can
switch it to a text, since we want to have a
text of the subcategory. And then the next
step we're going to go and get the text. It's going to come
from the subcategory, drag and drop it on
top of the text. And with that we got the values or the spine of the butterfly. The next step is that
we're going to go and merge them together
in one charts. What we're going to
do, we're going to go and use the dual axis. Right click on the average. And then here we
use the dual axis, but as you can see, those values are not yet in the middle. And that's because we haven't
synchronized the axis. Go to the average over here, and then let's select
synchronize axis. And with that we got the
spine exactly in the middle, but it's not really
clear because it's red. So let's go and
change those colors. So let's go to the
Average over here. Double click on it. And let's select Complete
White. That's it. Click Okay, And now the
next step, as usual, we're going to go and
start hiding stuff because all that information
is not necessary. So the average over here, let's go and hide it. And as well, we don't need the header information because we have it already
in the middle. So right click over here
and disable show header. And with that we get a very elegant and nice
butterfly chart in Tableau where both
of the wings are together. And now we can go and analyze
the correlation between the number of orders
and the number of customers by subcategory. All right, so this
is how we can create butterfly, or tornado, charts
in Tableau using two methods.
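The trick from method two, putting a minus in front of one measure so it plots to the left of zero, can be sketched outside Tableau in a few lines of Python. This is only an illustration: the function name `butterfly_rows` and the sample numbers are assumptions, not something taken from the course dataset.

```python
def butterfly_rows(rows):
    """rows: list of (subcategory, customer_count, order_count).

    Negates the customer count so it would plot to the left of zero,
    keeps the order count positive (right wing), and sorts the rows
    descending by order count.
    """
    mirrored = [(name, -customers, orders) for name, customers, orders in rows]
    return sorted(mirrored, key=lambda row: row[2], reverse=True)


# Hypothetical sample data: (subcategory, customers, orders)
sample = [("Chairs", 40, 55), ("Phones", 60, 80), ("Paper", 25, 30)]
result = butterfly_rows(sample)
for name, left_wing, right_wing in result:
    print(f"{left_wing:>5} | {name:^8} | {right_wing:<5}")
```

The subcategory in the middle plays the role of the spine, and the two numbers around it are the left and right wings.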
182. Tableau | Quadrant Chart: All right, so now we're
going to go and learn how to build quadrant
charts in Tableau. This type of chart is going
to go and present a lot of data points in one view
using two measures. And then we go and compare
those different data points based on their position
on the quadrant. And then we go and
split the chart into four different quadrants. This type of chart is
really great in order to do strategic planning or
to do risk management, or as well to find some trends. So now let's go and check in Tableau how we can build that. The first thing that we need
is two different measures. The first one going
to be, let's take the discount and put
it on the columns. Let's go and find the
average of the discount. Right click on it, and let's go to the average
instead of sum. So this is our first measure. Now we need another measure. This time going to
be the profit ratio. We don't have it in our data. Let's go and quickly create it. Create a new calculated
field, Profit Ratio. And it's very simple. It's going to be the
sum of profit divided by the sum of sales.
With that, let's go and hit OK. Then let's go and bring it to our rows. With that we've got our two axes, but I would like
to have them as percentages. Let's go and change the format. Let's go first to
the profit ratio. Then instead of numbers, let's go and switch
it to percentage. Then let's go and
remove those decimals. The same thing, let's do it
for the average of discounts. So let's go and
format it as well, to percentage, and
remove those decimals. All right, so that's
all for the axes. What we need now is the
customers as data points. In order to do that,
let's go and get the customer ID and let's
put it on the details. Now as you can see, each of our customers are
presented as a data point. Let's go and change
the visual of that. Instead of shapes,
let's have circles. And let's go and
reduce the opacity in order to see the overlapping between those points as well. We can go and make it
a little bit bigger. So now we need two
values in order to split this chart into
four different quadrants. Now here, since we want
this to be dynamic, we want to offer
it to the users as parameters in order to
specify those two values. So now let's go and create two parameters in the Data pane. So we're going to
create the first one. Let's say select discount, so it's going to
stay as float and the display going to
be as a percentage. Let's reduce the decimals and then let's say that the
default is going to be 0.15, so with that we're going to get 15%. So that's it
for the first one. We're going to do
exactly the same for the second one in order
to get the profit ratio. So let's create another
parameter and we're going to call it select profit ratio. Have the same stuff again, so we can have it as percentage,
reduce the decimals. Let's set it to 10%, so 0.1.
That's it for this one. Let's go and close it
and show it in our view. Show parameter and
show parameter. Now we have it on
the right side. Next, we have to create
now a separation in our view in order to show
how the data is split. In order to do that, we can
add two reference lines. Let's start with
the profit ratio. Right click on it and
add a reference line. Then the value is going to depend, of course, on our new parameter, Select Profit Ratio. And then let's go and
make the label empty. And then we can go and
change the format. Instead of having a solid line, let's have a dashed one, then let's make it black. And then increase the opacity. And that's it. Let's hit OK. And do the same as
well for the discount. Right click on the discount and add a reference line. We need our parameter, so we select Select Discount.
Remove the label. And as well do the same
stuff on the customization so we can have it as dashed and as well have it clear on our view. All right, now let's go
and hit, Okay. All right. Now as you can see,
we already have our quadrant chart, where we have split our data
into four different sections. Of course, we can
go now and move those dividing lines using
the parameters. Let's go to the profit
ratio and change it to 0.2. With that we move it
to 20%. Now of course, what is missing in our quadrant chart is the coloring
of those points. Each section should
have its own color. In order to do that, we
have to go and create another calculated field
to have those four values. Let's go and create one. Let's call it Quadrant Color. Now we have to go and identify the position of each data
point inside our quadrants. Let me just move it a
little bit over here. In order to do that, we
can use IF/ELSE statements. Let's start first by identifying the points on the upper right. All those points on
the upper right. How are we going to do
it? We say IF the profit ratio is higher than or equal to the parameter value that is selected by the user, so we're going to write Select
Profit Ratio. That means we are checking
whether the customer is in the upper section.
And now we have to check whether it's on
the left or the right. So now we're going to talk
about the discount: AND the average discount is as well higher than or equal to the
value selected from the parameter, so we're going to
write Select Discount. Now we are targeting all the customers on the upper right. So what happens if the
condition is fulfilled? We're going to say THEN 'Upper Right'. All right, so now we're
going to go and do the same stuff for all
other three sections. Let's go and just
copy it from here. Then we're going to say ELSEIF,
and let's go and paste it. Let me just make it a little bit bigger in order to see it. Now we're going to go and
target the upper left. In order to do that,
we have to go and change the discount to smaller. Now we are saying
if the discount is smaller than the selected
value in the middle, so that means we are
on the left side. What's going to happen? We will just go and flag it with the following value, 'Upper Left'. Then we have to do
the same stuff again. Let's say now we're going to go and target
the bottom right. Let's call it 'Bottom Right'. The discount
part is not correct yet, so let's change it like
this in order to have the right side. For the ratio, in order
to be in the bottom, this time it's going
to be smaller. With that, we are at the bottom right.
Now for the last section. In order to target it, we don't have to go and specify it. We would just simply say ELSE, because if none of those
conditions are fulfilled, we will end up in the last one. We're going to call it
'Bottom Left'. That's all. Let's go and END
our IF statement, and the calculation is valid. Let's go and hit
Ok. And with that we got our new calculated field. Let's go and drag and
drop it to the colors. Now as you can see, we
have a dedicated color for each section
inside our quadrants. And of course, if
the user goes over here and changes the
values of the parameters, the coloring will react as well, since we have the parameters
inside our calculated field. For example, instead of 0.15, let's have it as 0.25.
Now as you can see, the reference line goes
to the right side, to 25%, and as well, the coloring is adjusted. That's all. This is
how you can create a very nice dynamic
quadrant chart in Tableau.
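To double-check the quadrant logic outside Tableau, here is a minimal Python sketch of the same rule. The function names and the sample customer values are assumptions for illustration; the two default thresholds mirror the parameter defaults mentioned above (15% discount, 10% profit ratio).

```python
def profit_ratio(total_profit, total_sales):
    """Sum of profit divided by sum of sales, as in the calculated field."""
    return total_profit / total_sales


def quadrant_color(ratio, avg_discount,
                   select_profit_ratio=0.10, select_discount=0.15):
    """Return the quadrant label for one customer (one data point).

    The keyword arguments stand in for the two user-facing parameters.
    """
    if ratio >= select_profit_ratio and avg_discount >= select_discount:
        return "Upper Right"
    elif ratio >= select_profit_ratio and avg_discount < select_discount:
        return "Upper Left"
    elif ratio < select_profit_ratio and avg_discount >= select_discount:
        return "Bottom Right"
    else:
        # None of the conditions above matched, so only one section is left.
        return "Bottom Left"


# Hypothetical customer: profit 300 on sales 1500, average discount 5%
label = quadrant_color(profit_ratio(300, 1500), 0.05)
print(label)
```

Changing `select_discount` or `select_profit_ratio` moves the dividing lines, and the label of each point follows, which is the same behavior we see when the user changes the parameters in the view.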
183. Tableau | Box Plot: Now we're going to talk
about the box plot in Tableau, or sometimes we call it the box-and-whisker plot. This type of chart is going
to help you to understand the data distribution
of your datasets. This chart has a box and two whiskers, on the
top and on the bottom. And then in the middle
we have the median and the edges of the box, so that we will get five
different numbers describing how our data is distributed. Let's see how we're going
to build that in Tableau. It's really easy. Let's start
as usual with the sales. Let's drag and drop it to the rows and then
we're going to see how the subcategories are
distributed on those sales. Let's take the subcategory
to the details first, and then we have to change
the visual to circles. Let's go to the marks over
here and change it to circles. Now in order to have
different charts, I would like to add the category to the columns over here. And then let's go and make it a little bit bigger to
the middle over here. Now let's go and reduce those circles a little bit in order to
have it more clear. And with that, we have the
first part of the box plot, where we have circles. Next we have to
get those numbers, the shape of the
box and the whiskers. In order to do that, we have
to add a reference line. Let's go to the sales axis over
here, right click on it, and add a reference line. And here everything is prepared
by Tableau. Just go to the
Box Plot option over here, and let's click OK. And that's it,
actually. With that, we got a box plot in Tableau. Now if you go and hover with the mouse
over the chart, you will get the five
different values: the upper whisker, the lower whisker, the median, and so on. All right, so now
the question is how to read the boxplots? Well, there are a lot of
informations over here, but the first thing
that you can do is to compare the position of
the median of each box. If you have a look over
here, you can see that those two boxes are at
the same level, right? So they are very
similar categories. But if you check
Office Supplies, you can see that the median, or the box itself, is below those two other boxes. That indicates for us that
Furniture and Technology have a
similar distribution, but Office Supplies
has a different one. Another thing that you can check is the size of the box itself. If the box is tall, so the
length of the box is long, then that means the
subcategories inside this category are not really similar and they are far
away from each other. But if you check
Office Supplies, you can see that
the box is shorter, so the length of this box is smaller compared
to the other two. That's going to give us the
information, or the hint, that the subcategories
of this category, Office Supplies, have
similar sales. That means if we
have a shorter box, the members of this category are going to have
similar behavior. But if we have a
big or tall box, that suggests the members of this category are going to
have different sales, different behavior. And of course, this
type of chart is going to help us to find the outliers, especially on the upper
and on the lower whiskers. All right, so that's all about
the box plot in Tableau.
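The five numbers behind a box plot can also be computed by hand. The sketch below is an illustration in Python, under two assumptions that are not confirmed by the course: the whiskers reach the most extreme data points within 1.5 times the interquartile range, and the quartiles use the "inclusive" method of Python's `statistics` module, which may differ slightly from how Tableau computes them. The function name and the sample values are made up.

```python
import statistics


def box_plot_summary(values):
    """Return lower whisker, Q1, median, Q3, upper whisker and outliers."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return {
        # Whiskers stop at the last data point inside the limits.
        "lower_whisker": min(v for v in data if v >= lower_limit),
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_whisker": max(v for v in data if v <= upper_limit),
        # Anything beyond the limits is treated as an outlier.
        "outliers": [v for v in data if v < lower_limit or v > upper_limit],
    }


# Hypothetical sales per subcategory inside one category
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(summary)
```

A small difference between `q3` and `q1` corresponds to a short box, so similar members; a large difference corresponds to a tall box.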
184. Tableau | KPI: Okay, so now we're
going to talk about the KPI charts, Key
Performance Indicator. We usually use it in order to analyze the performance
of our business, whether it is
succeeding or failing. All right, so now
let's go and build a KPI in order to track the performance of our sales in our business. So
let's go and do that. As usual, we're
going to go and get the subcategories to the rows. Let's take the sales as
well to see the numbers. And then the next step,
let's say that we want to check the sum of sales
for each country. So let's go and grab the
country field to the columns. And then the next
step, we have to define the core of the KPI: the rule for when the sales are
going to be considered as a success and when
they're going to be considered as a failure,
or maybe in between. So what we have to do
now is to go and create a new calculated field in
order to define the KPI rule. Now let's go and
call it KPI Colors. Now by checking the
data, let's say that if the sum of sales is
higher than 50K, then it's going to be
considered as a success. Or if we are talking
about colors, it's going to be green. We're going to work
with IF/ELSE statements, so we're going to check
whether the sum of sales is higher than 50,000.
THEN what happens? We're going to say it's green. So now the next step, we have
to define the second rule. Let's say that if the sales
are between 10K and 50K, this can be medium, or let's say orange. Let's go and build
that using ELSEIF: the sum of sales is less than or equal to 50K AND the sum of sales, since we are making a range, is higher than 10K. Let me just make
it a little bit bigger. THEN what happens?
It's going to be orange. All right, then we
have the third rule. If it's not in between and
not higher than 50,000, then it has to be less than or equal to 10K. So what we're
going to do at the end, we're going to say ELSE,
it's going to be red. That's it, let's END it. This is our KPI rule in order to track the
performance of the sales. Let's go and hit OK.
got a dimension here on the left
side, the KPI Colors. Let's go and grab it and
put it on the colors. The next step,
let's go and assign the correct colors. The current color
assignment is almost correct. Let's edit the colors.
The orange is orange, red is red, but the green is blue. Let's go and switch that. And with that, we
can immediately track the performance
of the sales, where we can see immediately where we are performing well, so we can see those
green numbers, or where we are performing badly
by the red numbers. But if you have seen any
KPI dashboard, you will see that they are
using a lot of shapes. Now, instead of those numbers, let's go and get shapes
assigned to those three values. That means we're going
to go to the marks over here and switch it to shapes. Now, things are ugly currently, so let's go and take the sum
of sales to the Details. And then we're going
to take the KPI Colors to define the shape
of our visual. So with that, we've
got different shapes for each level of our KPI. But I would like to change it. So let's go to the
shapes over here, and then let's go to the Default palette and then
switch it to KPI. Now we have better
icons for our KPI. Let's go and assign them. So green is going
to be this icon, orange is going to be this one, and then the red is
going to be the red one. All right, so that's
it, let's go and hit OK. And now we can go over
here and make it Entire View and as well change
the size of our KPI. With that we've got a
nice KPI where we can see immediately where we are doing well and where
we are doing badly. This is how we can
create a KPI in Tableau.
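The three-level KPI rule is easy to test outside Tableau. Here is a minimal Python sketch, assuming the 50K and 10K thresholds from above; the function name `kpi_color` and the sample totals are assumptions for illustration.

```python
def kpi_color(total_sales, high=50_000, low=10_000):
    """Classify a sum of sales into the three KPI levels."""
    if total_sales > high:
        return "Green"   # success
    elif total_sales > low:
        # Reaching this branch already means total_sales <= high,
        # so this covers the range above 10K and up to 50K.
        return "Orange"
    else:
        return "Red"     # 10K or less


# Hypothetical totals per subcategory
totals = {"Phones": 72_000, "Chairs": 50_000, "Paper": 8_500}
levels = {name: kpi_color(value) for name, value in totals.items()}
print(levels)
```

Because the checks run top to bottom, the middle rule does not need to repeat the upper bound explicitly; spelling it out with AND, as in the calculated field, gives the same result.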
185. Tableau | KPI & Bars: All right, so now we're going
to learn how to combine a KPI together with any
other type of chart, like for example,
the bar chart. So now we're going
to go and build a view in order to compare two years. In order to do that we're
going to get the same stuff. So let's get the
subcategories to the rows. Then here we have
the sales of 2022. Move it to the
columns over here. With that, we've got
our bar chart, but I would like to switch it from Automatic to Bar in order to make everything stable so it doesn't break later in
our visualization. For the next step I would like to go and add the coloring as well. Let's take the sum of sales
2022 and put it on the colors. Now the next step,
let's take 2021 as a reference
inside our view. Let's move it to Details. And then let's go to
the axis, right click on it, and let's
add a reference line. Here we would like to
have the value of 2021 for each subcategory. So let's switch it to Per Cell
and then select the sales of 2021. And then let's go
and hide the labels. The rest is only customization. Then let's move it to a little bit heavier line and then increase
the opacity as well. Change it to orange. That's it. Let's go and hit OK. Now in order to see
the data better, let's switch it from
Standard to Entire View. And with that we've got a reference
from the previous year, and the bars are
the current year, so that you can quickly see the differences
between the two years. But we are not done yet. This is only the bar chart. Now we have to go and
add a KPI for it. So here we have to define
the rule of the KPI. And this time is
going to be easy. If the current year is less than the previous year,
then it's going to be red. If it is more or equal,
it's going to be green. Let's go and define
this rule as usual. We're going to go and create
a new calculated field. We can call it KPI colors. Now we're going to go
and define that rule. We're going to use
the IF statement as well. IF the sum of sales of 2022 is higher than or equal to the sum of sales of 2021, THEN we are safe. It's going to be green. Let me just make it a little bit bigger in order to
see everything. But if the condition is not fulfilled, what's
going to happen? We will have bad performance, so it's going to be
ELSE red, and then END. So this is our rule. Let's go and hit OK. Now for the KPI, we need
another chart inside this view. But since it is
like a dimension, if we bring it to the view, it will not split into
two different visuals. In order to generate
another chart, we will use the trick of
using the average of zero. So we have to create a
placeholder, the average of zero. And with that, as you can see, we will get a new chart on
the right side. With this measure, we will go and configure our KPI. Let's go and switch
to its marks. And now we're going to switch
it from bars to shapes. It's like we are
building any other KPI. I will go and get rid
of that information. And now we're going
to go and get our new calculated field, the KPI rule, and put
it on the shapes. Next we're going to go and
define the shapes of our KPI. Let's click on Shapes. Let's say if it's green, then it's going to go up. And if it's red, it's
going to go down. That's it for the shapes. Click OK. As well, we want to change the
coloring of those shapes. Let's take the KPI Colors, hold Control, and put
it on the colors. Let's go and assign
them: edit colors. Green is going to be green
and red is going to be red. That's it. Click OK. Now we have our KPI
on the right side. We can go and make
it a little bit bigger in order to
see the shapes. Now we have two
different charts. The next step we're going to
go and use the dual axis. That's because they
have different shapes. So let's go to the right
sides and have the dual axis. And as usual, we're
going to go and synchronize the axis
and remove one of them. Let's go to the
average as well and then go and disable Show Header. With that we hide it, and we've got the two charts on
top of each other. But still here we have an issue. As you can see, the icons of the KPIs are exactly on
the edge of the bars. And that's because everything
is starting from zero, and we have here the
average of zero. Now what we're going to do, we can move it a little bit to the left side using
negative values. Let's go to the average
of zero and switch it from zero to minus 10K. Now we can see our KPI is perfectly on
the left side of the bars. And we can see immediately
where we are doing badly. Here we can see
that almost all of the subcategories
are doing great. We have all those green icons, but only two, the envelopes and the machines, are doing badly. That's because the sales of the current year are
less than the sales of the previous year. With that,
we have learned how to combine the KPI chart
with any other chart. It doesn't have to be a bar chart; it could be an area
or a line chart.
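The two-year rule behind the arrows is a single comparison. As a small Python sketch, with a hypothetical function name and made-up sales figures (current year first, previous year second):

```python
def kpi_color(current_year_sales, previous_year_sales):
    """Green when the current year is at least as good as the previous one."""
    if current_year_sales >= previous_year_sales:
        return "Green"
    else:
        return "Red"


# Hypothetical (sales 2022, sales 2021) per subcategory
sales = {"Envelopes": (3_000, 3_500), "Phones": (90_000, 80_000)}
colors = {name: kpi_color(cur, prev) for name, (cur, prev) in sales.items()}
print(colors)
```

Equal sales count as green here, which matches the "more or equal" wording of the rule.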
186. Tableau | BANS: Okay, so now we're going to
create BANs in Tableau. They are those big numbers that you can see
usually in KPIs or in dashboards where
you're going to see the total of something,
like the total of sales, the total of profit, or how many customers we
have inside our datasets. So it's very common and you can see it in almost every dashboard. So let's go and create it. What we're going to do first, we have to go and switch our visual from
Automatic to Text. Since we are working with text, there are no charts
or any visuals. Let's take the sales
and put it on the Text. Now with that, we've
got one number. Without any charts,
only one big number, the total sales of our data. Now we can go and split it by
a dimension like the country. Let's take the country, put it on the columns, so now we can see the total
sales of each country. Now since we are
talking about BANs, those numbers should
be really big. In order to change that, let's
go to the text over here. Click on those three points, and then let's go to the
Sales and make it really big. We're going to go to
the size over here. Let's take, for example,
22 and make it bold. Then you can check
the size of those numbers by hitting Apply.
That looks good. Now let's go and hit OK. And let's make the
alignment correct. So let's have everything centered on the horizontal
and the vertical. Next, we can go and change
the format of those numbers. Let's go to the Sum of Sales
over here and go to Format. And then we can
go to the numbers over here in order to
change the format. Let's go for Custom. We want no decimal
places, so let's make it zero. And then let's say
we're going to display the units as thousands, as a K. Then we can add the dollar sign as
the prefix over here. So let's go and do that.
That's all about the format. Let's go and close it from here. Now with that, we have created really nice BANs
for our dashboards. We can go and make
it a little bit bigger to see those numbers. Now you might say,
you know what, I would like to have those texts beneath the numbers,
not on top of them. To do that, here's what
we're going to do. We're going to take the country again and let's put
it on the text, and we're going to get
the text below the number. But of course we have to
make it really small. Let's go to the text over here, then to the three points. And then let's go
to the country, remove the bold,
and let's make it, for example, 12. All right, now let's go and hit Apply in order to
check the format. Now as you can see, we've got that small text
beneath those numbers. But we can go and as well
reduce it to ten to make it really small beneath those
big numbers. Now let's hit OK, and with that, we've got really nice small text
below our numbers. But we still have an issue where we have the header information. In order to remove it, just go to any value like
Germany over here, right click on it and
disable Show Header. And with that we've got
really nice BANs where the text is below
the big numbers. So as you can see here, we
didn't use any type of chart, we just used the text in Tableau.
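The custom number format we set up (zero decimals, units in thousands with a K, dollar sign as prefix) can be mimicked in Python to see what a value will look like. This is only a sketch: the function name `format_ban` and the sample totals are assumptions, and the thousands separator is my own addition.

```python
def format_ban(value):
    """Format a total as a BAN label: dollar prefix, thousands, no decimals."""
    thousands = value / 1000
    # ',.0f' rounds to zero decimals and adds a thousands separator.
    return f"${thousands:,.0f}K"


# Hypothetical country totals
labels = {country: format_ban(total)
          for country, total in {"Germany": 1_234_567, "France": 45_000}.items()}
print(labels)
```

With these sample values, Germany would show as $1,235K and France as $45K, with the country name as the small text underneath.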
187. Tableau | Funnel Chart: Now we can learn how to build
a funnel chart in Tableau. Funnel charts are really
great in order to show the progress of your data
through different stages. Let's see how we can build that. Let's take the sales
and put it on the rows. Now we want to see
how the sales are progressing through the
different subcategories. Let's take the
subcategories from the products and put
it to the colors. Now the next step, we
would like to change the size of those blocks
based on the sum of sales. In order to do that,
let's take the sum of sales by holding Control
and put it on the size. Now let's go and switch
it from Standard to Entire View in order to see
the size of each block. Now we need to form the
shape of the funnel. In order to do that,
we're going to go and sort the data descending, so the biggest one
is going to be on top, and then we go down to the smallest. In order to do that, let's go to the subcategory over
here, right click on it, and let's go and sort it. And then we have to change
the Sort By to Field, then move it to descending. That's it. As you can
see in the background, we have now the
shape of the funnel. Now for the next and most important step in
the funnel chart: we want to show the percentage
of total for each block. In order to do that,
let's take as well the sum of sales and
put it to the text. With that we got the total
sales for each subcategory, but we don't want that. We want percent of total. In
order to do that, right click on it and let's go to Quick Table Calculation. Then let's pick
Percent of Total. Great, now we have
those percentages on the funnel,
which is very nice. And in the funnel chart,
let's go and add as well the text of
the subcategory. Let's take the subcategory
and put it to the labels. Now we can go and customize
our view a little bit. Where we say, okay,
let's put the text of the subcategory on top of
the sales, switch the order, then let's go and change
the labels and make the subcategory a little bit bigger and bold. Let's say OK. As well, we
can go and remove those grid lines, so right click
over here and go to Format. Let's go to the
lines, and then let's go to the zero lines over
here and make it None. All right, so that
is more clean. What we can do, we can add
the category to the filter. Let's go to the category, show it as a filter. And with that, we
can go and select specific category in
order to see the data. With that we get like
fewer blocks inside the funnel chart, or you can go and add all of them. That's it. This is how we can
create funnel charts in Tableau in order to track and check the
progress of your data.
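The two ingredients of the funnel, sorting the blocks descending and labeling each with its percent of total, can be sketched in Python like this. The function name `funnel_blocks` and the sample sales are assumptions for illustration only.

```python
def funnel_blocks(sales_by_subcategory):
    """Return (name, sales, share_of_total) tuples, biggest block first."""
    total = sum(sales_by_subcategory.values())
    ordered = sorted(sales_by_subcategory.items(),
                     key=lambda item: item[1], reverse=True)
    return [(name, value, value / total) for name, value in ordered]


# Hypothetical sales per subcategory
blocks = funnel_blocks({"Chairs": 300, "Phones": 500, "Paper": 200})
for name, value, share in blocks:
    # The label shows the subcategory on top of its percent of total.
    print(f"{name}: {share:.0%}")
```

Filtering to one category before calling the function corresponds to using the category filter in the view: fewer blocks, and the shares are recomputed against the smaller total.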
188. Tableau | Progressbar: In our KPI dashboards we can add stuff like a progress bar. Let's see how we can
build that in Tableau. Now let's go and get a dimension like the country to the rows. And then we're going
to go and track the progress of our
sales as a progress bar. In each progress bar,
you have two bars: the one in the background for the 100%, and then
your actual progress. That means we need
two bar charts. Let's start with the first
one and switch it to Bar. And as well, let's
show the text. But now instead of
the total sales, let's go and switch it
to a percent of total. Let's go and switch our sales to a table calculation, Percent of Total. Now the next step,
we're going to go and add the background bar. In order to do that, let's
go and add our placeholder. It's going to be the
average of one. Now we've got our background on the right side and
on the left side, we're going to get
the actual progress. Let's go and merge them
together using the dual axis. Right click on the right one and then move it to
dual axis, okay? As usual, we're going to go and synchronize those two axes. And let's go and
make it a little bit bigger in order to see the bars. Now we can see that the average, the background is in the front. In order to switch that, let's go to the axis
of the average. Right click on it
and then here we can say move marks to the back. All right, so now
the next step: in order to get the effect of the progress bar, we have to change the
coloring of the background. Let's go to the colors and edit. And then let's
select the average. Instead of the blue, let's select something lighter. So let's take a
light blue and apply. OK. All right, so with that we get the effect
of the progress bar. Let's go and hide
a few things, like for example the AVG
over here. As well, let's hide those numbers
on the background. So let's go to the
labels and hide them. All right, so that's it.
This is how we can create a really nice progress
bar in Tableau where you can put it
inside your dashboards.
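The idea of two bars, a full-length background for 100% and the actual share in front of it, can be imitated with plain text in Python. This is a rough sketch: the function names, the `#` and `-` characters, and the sample sales are all assumptions.

```python
def shares_of_total(sales_by_country):
    """Percent of total per country, as fractions between 0 and 1."""
    total = sum(sales_by_country.values())
    return {country: value / total
            for country, value in sales_by_country.items()}


def progress_bar(share_of_total, width=20):
    """Draw the actual progress ('#') over a background of fixed width ('-')."""
    share = min(max(share_of_total, 0.0), 1.0)  # keep it inside the background
    filled = round(share * width)
    return "#" * filled + "-" * (width - filled)


# Hypothetical sales per country
for country, share in shares_of_total({"Germany": 500, "France": 300, "Italy": 200}).items():
    print(f"{country:<8} {progress_bar(share)} {share:.0%}")
```

The fixed `width` plays the role of the average-of-one placeholder: it is always the full length, no matter how large the actual share is.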
189. Tableau | Choose The Right Chart: We learned how to build 63 charts in Tableau and
what are their use cases. But you might be still like overwhelmed with all
those options and all those charts in
Tableau and it's still not that clear how
to answer the question, how do we know which chart, which visualizations
that we have to pick. That's why we're going to
go now and summarize and group all those charts
under different categories. We have the change over
time, magnitude, part-to-whole, correlation, ranking, distribution, spatial, and flow. And each of those
categories is going to focus on a specific question, specific problem in order to answer it using
visualizations. So now let's go through
all those categories one by one in order
to understand them. All right, so now we're going
to start with the first one and the most basic
category we have, the change over time, or sometimes we call it
trends over time. This category is going
to show us the trends or the patterns over a
continuous period. And it usually
answers the question, how does the data change
over time? Or another one: are there any trends
or patterns that we can uncover from
the data over time? If you have this kind of question, then you
are talking about the category change over time, and the best
chart in the category is the line chart, because the line chart
focuses only on one thing: the changes over time,
the trends over time,
nothing else. As well, visually, it makes it
really easy to spot trends. As we learned before,
we have multiple charts that cover the topic
of change over time. Of course, all the line charts are usually about change over time, so we have the line chart
as the perfect one. Then we have as well
the sparkline chart. We can use it if
you want to have a compact chart for the trend
analysis over time. Or we can use the
slope chart to see how the ranks are
changing over time. Or as well, we can
use bar charts in order to analyze
the changes over time, and as well to go and compare different time periods together. Not only the bar charts,
we can use other chart types. For example, the
stacked area chart. Here we have
different use cases: one of them is the
change over time, and another is to go and compare different categories
together. As well, we can go and use the
calendar chart or the circle bubble timeline in order to visualize
the change over time. So as you can see, if you want to have only one use case inside your visualization, to show the change or the
trend over time, then go with the line chart. If you want to go and cover multiple use cases in one chart, then you can go and
use the area chart, bar chart, or the
circle timeline chart, because they don't focus
on only one use case. They can cover
multiple use cases and one of them is the
change over time. All right, so now we
have the magnitude. Sometimes we call
it the size category. And it uses the size in
order to compare values. We could use relative or absolute values
in this category. So for example, if you have the following task or question: find out the highest and the lowest sales
of the categories. Or we have to go and compare
the different categories by sales in one chart. If you have such
questions or tasks, then we are talking about
the category magnitude. And the best chart for this question is the
bar chart, because it makes it very easy and clean in visualizations
to compare values. You can compare the data very easily
by comparing the length of the bars
of each category. Under this category, we
can find multiple charts, and most of them are bar charts, so we can use the
row bar chart as the main one, or we can use
a column bar chart. As we learned
before, if you have a dimension with high cardinality,
you can go with rows. But if you have a dimension
with low cardinality, then go with columns. So those two charts only
cover one dimension. But if you have
multiple dimensions, then you can go with
the side by side bars, or the stacked part charts, or as well the full
stacked part charts. Then we have different
charts under this category, like the lollipop charts, pupple charts, and
the scatter plots. And you might ask
why scatter plot and Y pupple chart because the size of the Pubble can
be used in this analyzers. We can see immediately
that the technology and the furniture has
the highest cells from the size of the Pubble. The same thing goes
for the scatter here. Again, it's really depends
on how many questions you want to cover in
one visualizations. If it's only one use case
to go and compare the data, then go with the row part
chart or the columbar charts. But if the size
comparison is not only the use case that
you want to cover, you want to cover
multiple stuff like adding multiple
dimensions and measures. Then you can go with the other charts under this category. All right, now we have the
category part to whole. It shows how a whole or
value breaks down into its components and how each component contributes
to the whole, to the total. So if you have a
question like how does the value contribute
to the total, we are talking about
the part to whole category. And the best chart to visualize the answer is the pie chart, because visually it's very easy and as well very
effective to show how each slice of the pie contributes to the whole pie. In this category,
the part to whole, we have different chart types, like as we said, the main
one is the pie chart. But we can go and use
the donut charts, especially if you want to show the information of
the whole, the total. So you can present
it in the middle and around it you're
going to have the slices. Or we can go and use the
bar chart, for example, the full stacked bar
chart, or the area charts, the full stacked
area charts as well. You can go to the treemap
if you want to analyze not only the part to
whole, but as well
the hierarchical data. As well, we can go to the waterfall
in order to show part to whole and as well the
flow of the data. Here again, if you want
to focus only on the part to whole use case,
go with the pie chart. But if you want to
add more information and analyze different use cases, then you can go with the others. All right, now we're
going to talk about a very important category.
We have the correlation. It can show the
relationship between two or more measures
in one visualization. This category can
answer questions like, is there any relationship
between two measures? Or how strongly related are two variables
or two measures? If you have such questions,
then we are talking about the category correlation,
and the best chart in order to visualize the
correlation is the scatter plot. The scatter plot is
very effective in order to show the relationship
between two measures. And it covers a lot
of use cases like discovering the outliers.
It's very flexible. We can add a lot of information
to each data point. And as well, it can help
us to build clusters. If the question is to show the relationship
between two measures, the best chart is
the scatter plot. And underneath this category, we can find different
types of charts, not only the scatter plot, but the scatter plot is
the favorite one. We have the quadrant charts. We can use them as well to analyze two measures
and as well to cluster our data or to
split it into four sections. Or we can go and use
the dual line chart if you want to see as
well changes over time. Not only the correlation, but you can see the trends as well. So we can go and use
two lines in order to analyze the correlation
between two measures. Or we can go and use one line and one bar chart for the correlation. And as well, we
can go and compare the sizes of each bar. Moving on to another chart,
which is very beautiful. In order to go and
compare two measures, we can use the butterfly
or tornado charts. And the last one,
you can use as well the histogram in order to find the correlation between
two measures and as well to show the
distribution of your data. Again, if you want only to
focus on the correlation, nothing else, you can go
and use the scatter plots. But if you want to go and
add different use cases, like the change over time or the distribution or
comparing the sizes, then you can go and
use the other ones. Moving on, we have another
category called ranking. So we use this category if
the most important thing to show is the position of
the item in a sorted list. So for example, if
you want to show the ranking of customers, the top ten customers by the sales or the lowest
ten products by the sales, then we can use the
ranking category in order to solve those tasks. And the best chart in this
category is the bar chart, because bar charts are really
amazing in order to build a list and as well to go and compare different
ranks together. All right, so in order to show the ranking we have
different types of charts. The basic one, as we saw, is the bar chart, whether
it's rows or columns. And then we have different
charts if you want to add more information or more
use cases in one chart. For example, the lollipop
charts, where you can go and put one extra piece of information inside the circles. Or you can
use the slope charts. Here, not only are we seeing
the ranks between countries, but we can see how they
are changing over time. And we have other charts like the funnel chart or the
bump charts as well. Here we can show the ranks and how they are changing
over time. The last one, we can use as
well the butterfly in order to show the ranking of the
categories, for example, here, and as well the correlation
between two measures. Again, as usual, if you want to focus only on ranking,
only on this, you can go and use
the bar charts. But if you want to cover multiple
use cases in one visual, then you can go and
use the other charts. All right, so now we have
the distribution category. We can use it in order
to show the values of a dataset and the frequency
of their occurrence. So if you have the
following question, like what is the distribution
of customers' ages? Or if the question is, what is the busiest time
in the work day? So if you have such
types of questions, then we are talking about the distribution category,
and the best chart to visualize those questions and the answers is
the histogram. Histograms are
an amazing way to show the patterns using bins. And it's going to
make it very easy to understand the
distribution of the data. Under the distribution category, we can find different
types of charts, the main one going
to be the histogram. And we can go and
use different types of plots, like the box plots, in order to see the
distribution of data, as well as the dot plot
over time. As well, we can go and use the scatter
plots or the quadrant charts in order to see the
distribution of our data, and as well to show the
correlation between two measures. We can go and use as
well the barcode charts. For example, here we can
see the distribution of each product in each
subcategory. As well, the paper chart is considered
to be a distribution chart. Again, if you want only to
focus on the distribution, then go and use the histogram. But if you want to cover
multiple use cases in one view, you can go and use
the other charts. Moving on, we have
the spatial category. Use it when the
geospatial pattern of your data is the most important thing that
you want to show. If you have questions
or tasks that involve information about the
location, like countries, cities, states, for example, you want to show which city
has the highest sales, then we're going to go with this category, the
spatial category. Of course, here the chart
that you're going to use in this type of
visualization is the map. And in this course we have
built four different maps. The first one is the filled map, or we call it the choropleth map. So as you can see, the states
are filled with colors. Or we can go and use symbols,
like here we are using the star in order to show
the sales for each state. And then we have learned
how to customize the maps. For example, here we have
created the night vision map. All right, so now we're
going to talk about another type of category. We have the flow. We're going to use it
in order to visualize the movement or the
flow of our data. So if you have a question
like how the data is moving from one
point to another point, then we are talking about
the category of a flow. And one very common
chart in order to show the flow of the data or
the process of the data, we can go and use the
waterfall charts. With this chart, you can
see the movement of data or the flow of the process
of your data. As well, we can analyze here
the part to whole. All right, so
we have covered the eight different
categories, and we mapped the different charts that we have learned in this course
to those categories. As you can see, the
process is really simple. In order to understand
which chart or visualization you
need in your projects, first you have to understand the questions that
should be answered. So once you understood the
task or the business question, you can go and map it to one
of those eight categories. And after that, you're
going to go and choose the best charts within each category in order
to answer the question. And with that, you have
learned the process of choosing the right
visualization, the right chart
for the question. And make sure to check
the description. I left there a link for the visualization cheat
sheets. As well, you will find the Tableau
file where I have sorted all those charts under
the eight categories. All right, so with that,
we have learned how to choose the right chart
for your requirements. And with that we have completed the Tableau Chart section. And now in the next
section in our plan, we can learn how to create and design our dashboards
in Tableau.
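To make the process above easier to reuse, here is a minimal sketch in Python of the mapping from the eight categories to their charts. It is only an illustration of the idea, not something that comes from Tableau: the names `CHART_CATEGORIES` and `choose_chart` are made up for this example, and the chart lists simply repeat the charts named in this section.

```python
# Illustrative lookup table: category -> main chart and the other charts
# mentioned for that category. All names here are hypothetical helpers,
# not part of Tableau.
CHART_CATEGORIES = {
    "change over time": {
        "primary": "line chart",
        "alternatives": ["sparkline", "slope chart", "bar chart",
                         "stacked area chart", "calendar chart",
                         "circle bubble timeline"],
    },
    "magnitude": {
        "primary": "bar chart",
        "alternatives": ["side-by-side bars", "stacked bar chart",
                         "full stacked bar chart", "lollipop chart",
                         "bubble chart", "scatter plot"],
    },
    "part to whole": {
        "primary": "pie chart",
        "alternatives": ["donut chart", "full stacked bar chart",
                         "full stacked area chart", "treemap",
                         "waterfall chart"],
    },
    "correlation": {
        "primary": "scatter plot",
        "alternatives": ["quadrant chart", "dual line chart",
                         "line and bar chart", "butterfly chart",
                         "histogram"],
    },
    "ranking": {
        "primary": "bar chart",
        "alternatives": ["lollipop chart", "slope chart", "funnel chart",
                         "bump chart", "butterfly chart"],
    },
    "distribution": {
        "primary": "histogram",
        "alternatives": ["box plot", "dot plot", "scatter plot",
                         "quadrant chart", "barcode chart"],
    },
    "spatial": {
        "primary": "map",
        "alternatives": ["filled map", "symbol map"],
    },
    "flow": {
        "primary": "waterfall chart",
        "alternatives": [],
    },
}


def choose_chart(category, single_use_case=True):
    """Return candidate charts for a question category.

    With one use case we return only the main chart of the category.
    With several use cases we return the other charts of the category,
    falling back to the main chart when no others are listed.
    """
    entry = CHART_CATEGORIES[category.lower()]
    if single_use_case or not entry["alternatives"]:
        return [entry["primary"]]
    return list(entry["alternatives"])


if __name__ == "__main__":
    # "Top ten customers by sales" is a ranking question with one use case.
    print(choose_chart("ranking"))
    # Part to whole plus hierarchy: look at the other charts in the category.
    print(choose_chart("part to whole", single_use_case=False))
```

So first you map the business question to one category, and only then you decide between the main chart and the other charts, depending on how many use cases the visual has to cover.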
190. Tableau | Section: Tableau Dashboard: Tableau dashboard. Now we can
learn the basic principles about how to structure our chart inside
dashboards in Tableau. And we can focus
on the containers in order to structure
our dashboard. So once we build all
those beautiful charts, we can go and group them in one place using Tableau
dashboard. So let's go. Okay. So if you create
a new dashboard, you will get different
options on how to customize and design
your dashboards. So for example, we usually
go and start changing the size of our dashboard
of this white space. In order to do that, if you go to the left side, we have here three
different options: Fixed size, Automatic, and Range. What I usually do, I
go to the fixed size. Here we can go and customize
the width and the height. For example, let's set the width to 1,200 and the height to 800. And then beneath that,
we have a list of all worksheets that we have
inside our dashboards. And then here, what is
really important is the objects that
we have in Tableau. So here we have a list of different objects
like containers, text, extensions, images,
blanks, and so on. Those objects, you
can use them in order to build up your
dashboards in Tableau. And the very important objects here are the containers in Tableau, and they are
really confusing if you are new to this tool. We will be focusing
on how to work with the containers in order to build the structure
of our dashboards. The first question
is, what are containers? Containers in Tableau
allow you to group up different Tableau objects
together in one place. The object could be
anything like worksheets, blanks, text, images, or
even another container. Once you have all those
different objects in one place, you
can do many things. Like, for example,
moving them all together using the container from one position
to another one. Let's have a quick example. Let's take one of
those containers. Let's take the
horizontal container and drop it to the middle. And here's the first
thing to notice, and that's the
coloring in Tableau. As you can see, we have now a dark blue
border around this space. The blue border indicates
that this is a container. Now, we can go and drop
anything inside this container. It could be a worksheet, it could be a text, anything. Let's go with any
sheet. For example, I have one prepared, so drag and drop it exactly in the middle
of the container. Now you might notice that
we don't have the blue color, the blue border, anymore. We have now a gray border. That means in Tableau,
currently I'm selecting an object
that is not a container. So now we can go and grab anything like, for
example, a text. Let's take this object and drag and drop it on top
of this chart. Here, let's write anything like the sales dashboards and just make it a little bit
bigger. Okay. So now as you can see, we have another object that
contains only a text. And as well it has
a gray border. So that means we
have one object with gray border and another
one with gray border. So now the question
is how to select the container that has
those two objects. There are many ways
in order to do that. So for example, let's say
we are selecting the text. If you go over here to those two lines and double click on them, then once we do that,
as you can see, now we have again
this blue border. That means we are now
selecting the whole container. So that means by double clicking on this small icon over here, you are going back
to the container that's grouping
up those objects. And there's another way in
order to select the container. So now let's go again inside it and only click on the sheet. Over here again we
have this gray border. Now if you go to this
small arrow over here, we're going to get more options. And then here we
have the option of select container,
vertical container. Once we do that, we
will go back again to the container where we have
those objects inside it. This is another way to select the current container. All right, so now you might ask, you know what, why are we
selecting the container? Well, for the following reason. For example, if you are
just selecting this chart, you can go over here
and you will get different options
about the worksheet. For example, you can
show the titles, the filters, the highlights. You can configure only
this worksheet. Those options are only
related to this object. But now, if you want to go and configure the whole container, you have to go to the container. For example, let's go and select it. If you go to the
options over here, we will get a completely
different list of options. And anything that you are
selecting here is going to be reflected for all objects
inside this container. For example, in the
current container, there is still space left inside this container
in order to fill it. The whole space over here is not used, which is
not really good. As you can see, we have
the text object, which is way smaller than the worksheet
object, which is now fine. But what you can do in
Tableau is that you can go and split everything evenly. In the container's options,
you can see over here Distribute
Contents Evenly. If you select that, let's see
what happens. As you can see, Tableau is going to go and automatically split the size of the container
evenly for all objects. This is really
helpful if you have different charts
in one container. Tableau is going to go and split the space evenly
for all objects. As you can see, the options
of the containers can affect all the objects
inside the containers. One more thing to
notice in Tableau: Tableau creates
a unique container, always on the right side. This container is a special one where Tableau
puts all the filters, legends, highlighters, and
as well parameters, always beneath each other on the right side. So for example, in
the subcategories we have the filter
of the order date. And immediately
Tableau creates a special container
on the right side and places the
filter inside it. So for example, if you take any other chart that
contains this information, let's take this one over here
and put it in the bottom. You will see Tableau is
immediately going to go and add the filters of
this worksheet beneath the first one. Here
we have the filter of the categories that
comes from this chart. If we take the next one,
the customer distribution, as you can see, we
will get a lot of filters in Tableau
on the right side, and as well the legends. So here we have
the profit size. Here we have the country
colors and so on. All parameters, all legends, all filters are going to
be on the right side. And of course, if you
want to customize the container that Tableau
creates on the right side, you can go to any object
and then double click on it. And then you can go
and customize it. For example, I can go over here and split everything evenly. All right, moving on. About
the containers in Tableau, we have two different types, the horizontal container
and the vertical container. Let's start with the first
one, the horizontal container. If you use this type,
what happens? All objects inside your
horizontal container are going to be side by side,
next to each other. Let's try that. Let's take
the horizontal container, drag and drop it
to our dashboard. And then let's take one sheet, for example, the
subcategory over here. And then let's take another one. Once you select
it, as you can see, Tableau offers
you either to put it to the left or to the right. For example, let's go and
drop it to the right. With that we've got two
charts side by side, next to each other, using
the horizontal container. Of course, if you go and add anything, it's going
to be as well either to the left or to the
right, or in the middle. Once you drop it, you will
get it as well side by side. This is how the horizontal
containers work in Tableau. Okay, the next type we have is
the vertical container. What happens here? All objects inside this
container are going to be on top of each other,
like they are stacked. So let's have a quick example. Let's take the vertical
container and drop it in the dashboard. And then let's take any chart, and as well drop it over here. And now once we
select another one, we can put it, for
example, below it. And the third one, either below, in the middle,
or at the top. Let's drop it at the top. As you can see, with the
vertical containers, we are putting those objects
or those charts on top of each other, so that we are stacking the objects on
top of each other. And this is how the
vertical containers work. One more thing about
the type of containers, which is very confusing if
you are a beginner in Tableau, is that you can
decide on the type of container as you are
dropping the second object. So let me show you what I mean. Let's take, for example,
the horizontal container, drag and drop it
to our dashboards. So now we can go and drop different sheets next
to each other, right? So let's take the first one as usual, let's put it over here. And now we come to
the second sheet, and our expectation is that we can put it either to the left or to the right because we have
a horizontal container. Well, the second sheet or the second object
is a special one. You can use it in order to change the type
of the container. Let's take, for example,
this one over here. You can see we can put it left. We can put it right,
but as well we can put it on the top
or on the bottom. Once I drop it to the
bottom, what happens? Tableau is going to go and
convert the type of this container to a
vertical container. So now we cannot go
and change our mind. It's going to be fixed. This is going to be a
vertical container. So for example, if I
take the third one, I cannot change my mind by putting it to the
left or to the right. I can put it only
to the top or to the bottom, so it stays as a vertical container. And the third one
will not change the container type. I can drop it, for example, here at the
bottom. On the second sheet, we still have the
option to change our mind, to make it either
a horizontal or a vertical container, depending on how you
are dropping the sheet. But after that, for
the third sheet, you don't have any more of
those options. Where you can drop it depends only on
the container type. All right, now the
more things that we put inside our container, the
more complicated it gets to control the
structure of our dashboards. There will be a lot of
nested containers on top of each other, and you will
lose control over time with complex containers. For
that, Tableau provides a view of the current
structure of our dashboard. Now we are currently
at the dashboard tab. In order to go to the view, let's go to the layout. So let's switch to that. Then here at the bottom,
we have something called item hierarchy. Here we will see the
structure of our dashboard. It starts with Tiled, since we are using
this method. If you click on that, Tableau is going to go and select the current object.
In the hierarchy, this is the highest
container where we have everything in our
dashboard inside it. Let's go and expand
our hierarchy. You can see that it then splits into
horizontal container. As you can see it clearly, we have one container
for all those filters, legends and so on. And on the left side, we have a container for
all our worksheets. And you can see
that by just moving this slider over here. As you can see, the first
object is a horizontal container. And then inside the
horizontal container, we have two vertical containers. The first one is going to be this
container for the charts. And as you can see, things are stacked up on top of each other. So this is our first
vertical container. If you click on the second one, now we are selecting the
container on the right side. It's as well a
vertical container. As you can see, all those
filters and stuff are beneath each other. Then of course, we can go and expand those
containers to see the content. So as you can see, we
have here three sheets inside the first container. And in the second one
we have three filters. And then we have
those two legends. Having this item hierarchy can help us with
a lot of stuff. For example, it can help us to understand the structure
of our containers, how things are nested
in each other. And another use as well is to understand whether we
have made any errors by building the
containers. As you are dropping stuff
inside your dashboard, weird stuff might happen
in Tableau where you are creating way more
containers than you need. It can help us as
well to select stuff. For example, if I would like to select the horizontal container, it can be a little bit harder by double clicking on those
different objects. It's going to be
easier if I go into the item hierarchy and just click on the
horizontal container. As you can see, it's really
easy to go and select stuff inside the item
hierarchy. As well, here we can go and have options. For example, let's go to the subcategories over
here and right click on it. And with that we'll get all the options of the worksheet. Or if you go to the containers, you will get the
container options. The item hierarchy
is really important in order to structure
our dashboards. All right, moving on, we're
going to go and learn how to drop objects
inside the container. Now just to make things easier, I just went through
all the worksheets. I removed all the filters,
legends, and so on. Just to keep things
simple, for example, let's go and start with
the horizontal container. Drag and drop it
to the dashboard. Let's take an object
like the sheet and drag it to the view. Tableau is going to show you
different visuals to indicate what can
happen if you drop it. For now, everything
is gray and we have a clear border
of the container. That means now we are dropping the objects inside
the container. Once I release it over here, what can happen if
we go to the layout? You can see the
horizontal container contains the worksheets. That means with this action, we placed the objects
inside the container. Let's check another options. Let's go to the dashboard over here and take another sheet. Now if you drag it and as
you are moving your mouse, you'll find different
shapes and different stuff. For example, if you move your mouse a little
bit to the left, you can see that the gray line is on the left side
and the container, the blue container, is marked. That means if you
drop it, Tableau is going to add it inside the container
on the left side. If you move it to the right, the same thing is going to
happen, but on the right side. As long as Tableau is highlighting the dark blue
color for the border, it means we are dropping the objects inside the
container. But now check this. If you keep moving your
mouse to the right sides, you will see that
Tableau can change the color from dark
blue to light blue. That means now we are dropping the objects outside
the container. So let's go and
do that. I'm just going to drop it
to the right side. Now let's go to the layout in order to understand
what happens. As you can see, the first sheet is inside the
horizontal container, but the second sheet is completely outside
of the container. If you just minimize
it over here, you can see that it's not inside the horizontal container. That means you have to be
really careful how you are dropping the objects
inside dashboards. Tableau can react differently,
depending on the shapes. Now let's go and
drag a third one. Let's take the
customer distribution now. As we are dragging, here you can see that
Tableau is highlighting the container because the
mouse is inside the container. Here you can drop it either to the left, right, bottom, or top. But if I move my mouse
completely outside, Tableau will drop it outside
of the container. For example, I can put it
to the left, to the right, to the bottom, but all of those places are not
inside the container. Now let's go back
to our container. I will drop it to the bottom,
so let's go and do that. And of course, to
check what happened, we're going to go
to the layout in order to check the
item hierarchy. Now as you can see, Tableau
changes from horizontal to vertical container because
we have dropped it below. And you can see
that this object, this sheet is inside
the container. All right, so that's it. Be careful how you are
dragging and dropping stuff inside Tableau dashboards. Okay. Moving on to the
next one. In Tableau, we have two different
options on how to arrange our objects
inside the dashboards. We have Tiled
and Floating. As a default, Tableau is going to use the Tiled option for
all our objects, but you can go and switch it to Floating. What do those
options mean? Let's start with the first
one, the Tiled option. If you use the option Tiled,
Tableau is going to go and automatically arrange your
objects as a grid layout. That means, for
example, if you go and resize the dashboard,
Tableau is going to go and automatically change the size of all objects inside the
containers and dashboards. Let's take an example. Now
we are selecting Tiled. And if you take anything like the sheet over
here and place it inside our dashboard, Tableau is going to go and automatically
use the whole space. So that means the
worksheet is going to take the size of the dashboard, because Tableau is going to say, okay, we have a lot of space,
let's go and use everything. But as the other option
we have Floating. On the other hand, if you select it, here you
have the freedom, the flexibility on how to
customize the objects. The advantage of
floating is that we can go and do overlapping between the different objects. But the disadvantage
of floating is that it's time consuming and you have to do everything manually. So now let's check
how this works. Make sure to select
Floating, then let's take another sheet and just drop it
wherever you want. So as you can see, we have
now a gray box that indicates the place where we are
putting the chart. Let's drop it over
here. And now we have the full control where
to position the objects. For example, let's grab
this icon over here and just drop it on
top of the old one. So as you can see, we are
now just overlapping. Or we can change the
size as we want. So I just can make it like this. So as you can see, we are
having the full control of this chart of the objects
without any limitations. Now the question is, should
I use floating or tiled? Well, in Tableau projects you can end up using both of them, and we normally use floating for the big containers inside the dashboard layouts
and Tiled for all objects that we have
inside those containers. All right, so those
are the main options on how to work with the
containers in Tableau. But of course, the
best way to understand the containers in Tableau is
to have real projects. And that's why next
we're going to have a mini project in
order to understand how to design and build
the layout of our dashboards using
the containers. All right, so that
was the basics about Tableau dashboards and how
to deal with the containers. Next we're going to
build a simple dashboard and learn the dashboard
development process.
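If you want to think about the containers in a more structured way, here is a small Python sketch of the idea. It is a simplified model under stated assumptions and not Tableau's actual layout engine: the classes `Item` and `Container`, the functions `layout` and `hierarchy`, and the 1200 by 900 dashboard size are all made up for this example. A horizontal container puts its children side by side, a vertical container stacks them, and the space is always split evenly, like the Distribute Contents Evenly option.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Item:
    """A leaf object such as a worksheet, a text, or a filter."""
    name: str


@dataclass
class Container:
    """A container that arranges its children in one direction."""
    direction: str  # "horizontal" or "vertical"
    name: str
    children: List[Union["Container", Item]] = field(default_factory=list)


Box = Tuple[float, float, float, float]  # (x, y, width, height)


def layout(node, x, y, width, height, out=None) -> Dict[str, Box]:
    """Give every leaf a box, splitting each container's space evenly."""
    if out is None:
        out = {}
    if isinstance(node, Item):
        out[node.name] = (x, y, width, height)
        return out
    count = len(node.children)
    for index, child in enumerate(node.children):
        if node.direction == "horizontal":
            # Side by side: split the width, keep the full height.
            child_width = width / count
            layout(child, x + index * child_width, y, child_width, height, out)
        else:
            # Stacked: split the height, keep the full width.
            child_height = height / count
            layout(child, x, y + index * child_height, width, child_height, out)
    return out


def hierarchy(node, depth=0) -> List[str]:
    """Return an indented outline, similar in spirit to the item hierarchy."""
    if isinstance(node, Item):
        return ["  " * depth + node.name]
    lines = ["  " * depth + f"{node.direction} container: {node.name}"]
    for child in node.children:
        lines.extend(hierarchy(child, depth + 1))
    return lines


if __name__ == "__main__":
    # One horizontal container holding two vertical containers:
    # charts on the left, filters and legends on the right.
    charts = Container("vertical", "charts",
                       [Item("Sheet 1"), Item("Sheet 2"), Item("Sheet 3")])
    filters = Container("vertical", "filters",
                        [Item("Filter A"), Item("Legend B")])
    dashboard = Container("horizontal", "main", [charts, filters])

    print("\n".join(hierarchy(dashboard)))
    for name, box in layout(dashboard, 0, 0, 1200, 900).items():
        print(name, box)
```

In a real dashboard the container on the right side is much narrower than the charts, so the even split is only there to keep the sketch short. The useful part is the nesting: once you see the dashboard as containers inside containers, the item hierarchy is much easier to read.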
191. Tableau | Tableau Dashboard Project: All right, so the task
or the project is to create a dashboard
for the sales. And one of the first steps
that we usually do in order to plan our dashboard is to
first create a sketch. Here we're going to go and
draw a very simple sketch for the sales dashboard, where first, for example,
we have the title of the dashboard, like
Sales Performance, and then beneath it we can have three big numbers
or three BANs. So we have the total sales, the total profits, and
the total quantity. And then beneath that, we're going to have three
different charts. The first one, on the left, is going to be a
bar chart in order to show the ranking, the top
sales by category. And then on the right
side we have two charts. The first one is going to be a line chart where
we're going to go and compare the sales
with the performance. And below that we're going
to show the sales by category using pie charts.
Now that we have a sketch, we have a plan on how to visualize our information
inside the dashboard. Now in the next step,
we have to go and plan the structure of our dashboards in Tableau using Containers. If we're going to
go and translate this sketch to containers, we're going to have one
big vertical container that has three objects
on top of each other. We have the title, then the
BANs, and then the charts. Since they are on
top of each other, we're going to use the
vertical container. Now we're going to go into more detail on
each piece of information. So let's start with the
first one. We have the text. For the text, we don't have
any other information beneath it or side by side. That's why we will not
use any container here. Then moving on to the next
information, to the BANs. As you can see they
are side by side. That means we can go here and use the horizontal container. That means the
horizontal container is inside the
vertical container. Okay, moving on to the next
one, we have the charts. And here, it's going to
be a little bit tricky. First, if you check the sketch, we have like charts side
by side, left and right. That means we're going to go and use the horizontal container. Again, here, this
horizontal container going to be inside the
big vertical container. Now if you check the right side, you can see that on
the right side we have two charts on
top of each other's. So that means on the right
side we can go and use the vertical container in order to cover those two charts. So this vertical container
going to be inside the horizontal container
and both of them going to be inside one
big vertical container. So as you can see, everything
makes sense if you are organized and you
start sketching and planning your dashboards, so now we have a plant enough. Let's go to Tableau and start
creating this structure. All right, so now we're going
to start from scratch. We have one empty dashboard. And now let's go and
follow our plan, where first we're
going to have the main container, the
vertical container. So let's take the vertical container from Objects and drag and drop it onto the dashboard. And now as you can see, if
you don't select anything, it's still going to be a
white page. In order to have an identifier for
this container and make it easier to see
during the design, what I'm going to
do is go to the Layout tab over here. So select the container and then we can set
a border for it. So let's go to the border
over here and make it a line. And then let's make
it a little bit heavier and give it
the color orange. Now if I deselect,
you will see that we have one big container,
the orange one. And this indicates
for me that this is a vertical container. As well, what we can do is go to the item hierarchy over
here and give it a name. So let's go and give it a name.
Let's call it the
main vertical container. All right, so what
do we have inside this container?
Three pieces of information. The first one is going
to be a text, the title of the dashboard. Let's go to the Dashboard tab
over here, grab our Text object, and drop
it inside this container. Let's call it Sales Performance
and make it a little bit bigger. Let's make it 20, 22, bold. Okay, that is the
first piece of information. For the second one,
we're going to go and add a horizontal container
for the different BANs. Let's go to the
objects over here, grab the
horizontal container, and just put it
beneath the text. With that we've got a
horizontal container. And let's go and make
an identifier for that. Let's go to the
Layout tab and make a border. And now we're going to
give it the color blue. So now we can see that we have a blue container inside
the orange container. And we can go and
give it a name. Let's go to the hierarchy, and let's give it
the name BANs. And now what we're going to
do is add blanks inside this container in order to have placeholders for the actual
BANs. In our plan, we're going to have three
BANs. So what we're going to do is go
to the Dashboard tab and add three blanks. And as you can see now,
the first one is very small, since it's a blank. Let's make
it a little bit bigger. And let's go and add the
second one to the right side, and another one to the right side. Now what we can do is
go to the Layout tab and check
the structure over here. As you can see,
everything is fine. Those blanks are inside
the horizontal container. All right, that's all for the
container for the BANs. Now, for the next piece of information,
we're going to have the charts. Again, here we're going
to go and add, as in our plan, a horizontal container
beneath this one over here. As usual, we're going to
go to the Layout tab and give it a border with a color. As you can see, we
have one container beneath another container, and both of them are
horizontal containers. Let's go and give it a name. We're going to
call it Charts. Now we're going to
go and add the blanks, the placeholders for the charts. What we can do is
grab a blank over here. Again it's small, so let's make it bigger. Then the second one goes
to the right side, and with that we've got the
left and right. Now, as usual, go back to the Layout tab and check
whether everything is fine. So you can see those two blanks are beneath the
horizontal container. Now as you can see, I'm
always going back to the hierarchy in order to check whether
everything is fine. And this is exactly
my tip for you: always check, and don't
leave it until the end. So don't check the
item hierarchy only at the end, after you have dropped
in all the charts. I promise you will see stuff
there that you didn't plan. As you are dropping new
stuff onto the dashboard, go and check in the item hierarchy whether everything is fine. All right, now only
on the right side over here, we're going to have two charts on top
of each other. So that means we can have
a vertical container only on the right side. Let's go to the
Dashboard tab over here. And now remove the right blank, because instead of
that, we're going to have the vertical container. Let's click on this blank
over here and remove it. And then let's go and get
our vertical container and just put it on
the right side. Make sure it's placed
on the right side and that we are still inside the horizontal
container, then drop it. Now you can see we
have something on the right and
something on the left. Let's make it a little bit bigger, up to the middle over here. Let's go back to the Layout tab
and check that everything is fine. So you can see we
have the horizontal container, this main one, and then inside it, on
the left there is a blank, and on the right we have
the vertical container. Let's go to the right
side and give it a color. So it's going to be a border, and this time it's going
to be orange. In this container we're going
to have two charts. So I'm going to go with
the blanks again and put them here inside,
underneath each other. Now let's go back to the Layout tab. And as you can see, we have those two blanks
for the charts on the right side and one big
blank for the left one. Now the next step, what
we're going to do is go and
make sure that everything is distributed evenly. Let's start with the
container on the right side over here. Right-click on it and click on
Distribute Contents Evenly. Then let's go to the next one, the horizontal
container for the charts, right-click on it, and
distribute the contents evenly. And then we're going
to go to the next one, right-click on it, and
distribute things evenly as well. Now for the last one,
the main container, I will not do that, because things here have
different sizes. So the text can be
smaller than the BANs, and the charts are going to take
most of the space. All right, so with
that, as you can see, we have built the basis for our dashboard and we have
implemented our plan. So now, as the last step, we're
going to go and bring the content inside
our containers. So let's go to the
Dashboard tab over here. Let's start with the BANs. So let's take the BAN for sales, then the profits
and the quantity. And what we're going to do is
go and remove those blanks, since we
don't need them anymore. Now things here don't
look really nice, because here we have titles. So let's go and remove the titles from each
one of them. As well, we would like to have
everything in the center. In order to do that, click on the object and switch from
Standard to Entire View, or, for example, go over here to More Options, then Fit, and then Entire View. And for the quantity,
we're going to go and switch
it to Entire View as well. With that we have our
three BANs as planned. Next, we're going
to have the bar chart on the left side in order
to show some ranking. So let's go and grab
our bar chart. And what we can do is
go and remove the placeholder,
the blank. And then in the next step,
we're going to go and add the last two charts. So first we have the
line chart, which is going to be Sales versus
Profits over here. And as well, I'm going to
go and remove the blank. And the last one
is going to be the pie chart,
Sales by Category. Let's drop it over here
and remove its blank. Now in the next step we're going to go and make sure that everything is set to Entire View.
Same for the pie chart. All right, so as you can see, once we have a solid structure, everything else is
going to be easy. We just drag and drop
stuff and remove the blanks. Now with that, we
have everything. Let's go and remove
those borders. So let's go to the Layout tab. Go to the first
one and remove the border. Then go to the horizontal one and remove this as well, and so on until the borders of all
our containers are removed. All right, so with
that we have our dashboard, and of
course we can go and add a lot of design and
a lot of customization. For example, we can add a
border for all those BANs. Let's do it just quickly. We can add a gray border for each one of them in
order to separate them. With that, we have built a very organized and simple dashboard in Tableau using the
power of containers. So as you can see, it's
very easy once you organize your stuff and
do it step by step. If instead you rush
things and drop your charts immediately onto the dashboard without any plan, it's going to be really
hard to control. And as well, the
look and feel of your dashboard
is going to be really bad. Especially if you want to add more elements over time, it's going to be really hard
to extend your dashboard. Slow down, make a plan, and
then implement it using the containers in Tableau, and at the end bring in
your content. All right, so that's all
about Tableau dashboards. With that, we have a solid foundation in
Tableau dashboards. In the next section,
we're going to do a real Tableau
project where you're going to learn how to execute Tableau projects step by step.
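As a recap, the container plan from this section can also be written down as a small tree. This is only a minimal sketch in Python, not something Tableau provides: the `Item` class, the `build_plan` function, and the placeholder names are all made up for illustration. It just mirrors what the item hierarchy showed: one main vertical container holding the title, a horizontal container for the BANs, and a horizontal container for the charts with a vertical container on its right side.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """One entry in the dashboard's item hierarchy (hypothetical model)."""
    name: str
    kind: str  # "vertical", "horizontal", "text", or "blank"
    children: List["Item"] = field(default_factory=list)


def build_plan() -> Item:
    """Return the container plan sketched in this section."""
    # Horizontal container with three blanks as placeholders for the BANs.
    bans = Item("BANs", "horizontal",
                [Item(f"Blank {i}", "blank") for i in range(1, 4)])
    # Vertical container on the right side for the line and pie charts.
    right = Item("Right charts", "vertical",
                 [Item("Line chart placeholder", "blank"),
                  Item("Pie chart placeholder", "blank")])
    # Horizontal container: bar chart on the left, vertical container on the right.
    charts = Item("Charts", "horizontal",
                  [Item("Bar chart placeholder", "blank"), right])
    title = Item("Sales Performance", "text")
    return Item("Main vertical container", "vertical", [title, bans, charts])


def render(item: Item, depth: int = 0) -> List[str]:
    """Flatten the tree into indented lines, similar to an outline view."""
    lines = ["  " * depth + f"{item.name} [{item.kind}]"]
    for child in item.children:
        lines.extend(render(child, depth + 1))
    return lines


def count_kind(item: Item, kind: str) -> int:
    """Count how many items of a given kind exist in the tree."""
    own = 1 if item.kind == kind else 0
    return own + sum(count_kind(child, kind) for child in item.children)


if __name__ == "__main__":
    print("\n".join(render(build_plan())))
```

Printing the rendered lines gives an indented outline that you can compare with the item hierarchy in the Layout tab while you are building the structure.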
192. Tableau | Section: Tableau Project: Now we're going to work together in order to
implement a Tableau project. But what's special about
this project is that you will not only learn how
to work with Tableau, but you will also learn how I usually implement projects
in big companies. I'm currently leading
big data and business intelligence projects
at Mercedes-Benz. So that means I'm sharing
with you now knowledge and real-life skills on how we implement stuff
in real projects. It's not just another
online course. So I'm going to take you
through the project from the starting point,
the user requirements, and we're going to
end up with a wonderful Tableau dashboard. So as the first step,
we're going to go and analyze the user requirements. Then we're going to design and
draw the dashboard mockups. And then, as the first step
of the implementation, we're going to prepare
our data source. And after that,
we're going to start building the different charts. And once we have all the charts, we're going to start planning
our dashboard containers, and we're going
to start building and designing the dashboard. So let's start first by
understanding the phases, the steps of any Tableau
project. So now let's go.
193. Tableau | Tableau Project Steps: Tableau projects are like
any other projects. Take, for example, building a house. The first thing is that we
have to sit with the users and understand the
requirements and their wishes. That means we have to analyze
the user requirements. And then, before starting
to construct the house, the architect can go and
create a blueprint and the layout by defining the structure of the
house and the rooms. And then, after
everything is planned, the foundations of the
house are going to be created. And this is a very crucial
step in the construction. Now, once the foundation
is finally stable, the construction starts by building the floors, walls, roofs, and so on. The last phase is the finishing touches,
like adding doors, adding electricity,
choosing the paint colors, and the decorations. The
project phases of building a house are very
similar to Tableau projects. And I'm going to show you now the different phases that I
usually have in each Tableau project. In the first phase
of each Tableau project, we start with collecting and
analyzing the requirements. First, we have to understand
the user requirements. Then we have to go
and decide on which chart types we're going to
use for each requirement. And then, together
with the users, we're going to go and
draw the first mockup of our dashboard and as well decide on the colors. Once we have understood
the requirements, we can go and start
building stuff in Tableau. And we start with the first step, preparing the data source. And here we have the
following steps. First we have to
connect our data, then we have to
build a data model. And then, as the last
step, we're going to go and understand the data model and the data
inside our data source. Then, once we have a
solid data source, we can start
building our charts. And here we have
different steps. First, we have to check whether we have all the data inside the data source or whether we have to create new calculated fields. And then, once we create
those calculated fields, we have to go and test them first before we start
building any charts. And then after that, once we have all the
data that we need, we can start
building the charts. And then, once we have
the basic charts, we're going to go
and start formatting them by adding colors, removing grid lines, and editing
the axes and the headers. Now, once we have built all our charts using the worksheets, we're going to go to
the last phase, where we can start building
our dashboards. And now for this phase,
you have to slow down and start planning
everything step by step. Rushing in this phase
will not help you at all. So first we start planning the whole structure
of the dashboard by planning the containers. And once we have a plan, then we go to the
next step, where we start building
the foundations: we start building the
containers of the dashboard. And once we have a
solid structure, we're going to go
and start adding the content to the dashboard. And after that, we're
going to have the step where we take care of the filters and the interactivity
inside our dashboard. And then, as the last step
of building a dashboard, we're going to add the
final touches, like icons for the logo, icons for the filters, or icons for navigating
between dashboards. All right, so those
are the main phases of building a
dashboard in Tableau. And of course, my
recommendation is to take it step by step and
not rush things, otherwise you're going
to end up with chaos. And it can as
well be really hard to maintain the dashboard later. So don't rush building
the dashboards. Always take time for analyzing
the requirements, understanding the data, planning the structure, and
planning the mockups. And with that, I promise you're going to deliver
professional work.
194. Tableau | #1 Step - Requirements Analysis: All right, so I'm
going to start with the Tableau project from scratch, where I'm
going to show you step by step how I
usually implement projects using Tableau,
and we start right now. All right, so the first step in each project that
we do is that we're going to go and
sit with the users in order to understand the
requirements, their wishes. And we usually document the requirements in something
called a user story. So now we're going to go
through these requirements. I'm going to leave the
link in the description, and then we're going to
go and start choosing the right charts for
each requirement. So the user story of the project is about
sales performance. And here in the
introduction it says we have to go and build
two different dashboards using Tableau to
help the managers, the stakeholders,
analyze the sales performance and
as well the customers. So that means we're
going to go and build two dashboards inside Tableau. So let's start with the first
one, the Sales dashboard. The main purpose of this
dashboard is to provide an overview of the sales
metrics and trends. Here it says, in order to analyze year-over-year
sales performance. So that means here we are
comparing two years with each other. Let's check the key requirements
for this dashboard. So the first one is
to provide a KPI overview, where we have to display a summary
of total sales, profit, and quantity for the current year and
compare it with the previous year. So that means in the dashboard, we don't have to
present all the sales. We have to present
only the sales of the current year and as
well the previous year. And now let's go and decide which type of charts
that we have to present. For this requirement,
we can go with the BANs. BANs are very useful
in order to show the main metrics, like
the total sales, profit, and quantity,
as big numbers. For this requirement, we're
going to go and create BANs. Let's
go to the next one. We have the Sales Trends. Here we have to present
the data of each KPI, that means the total
sales, profit, and quantity, on a monthly basis. So here we are talking about
change over time, right, for both the current year and
the previous year. And as well here, they want
us to identify the months with the highest and
the lowest sales. So that means we
now have to choose a chart that presents
a change over time. And for this, you can of course discuss it with the users and show them different types of
charts, as we learned before. So for now I'm going to go with the line chart, and more precisely
we're going to go and use sparkline charts in order to highlight the
max and min values. All right, moving on to
the third requirement, we have the product
subcategory comparison. So here we have to
compare the sales of different subcategories
for the current year and as well the previous year. And it says as well that
we have to include the profits in the
comparison. So here we are comparing
multiple things. First, the subcategories
with each other. Then we have two measures: the
sales of the current year and the previous year, and
as well the profits. So here we can
understand that we are comparing the members
of the subcategories, and for that we can
use bar charts. And since we have two values, the current year and
the previous year, we can use, for example,
bar-in-bar charts. And then for the second point, in order to compare the
sales with the profit, we can present as well
another bar chart side by side with the sales in order to show the profit information. All right, so moving
on to the last one, we have the weekly trends for sales and profits
requirement. It says we have to present
the weekly sales and profit data for
the current year. So here we are talking about change over time, because we have the time aspect, and we have to display as well the
average weekly values. We have to highlight the
weeks that are above and below the average in order to understand the trends
in our charts. So here again, we are talking
about change over time, but on a weekly basis; we
had it before on a monthly basis. So here we can go as well
with the line chart in order to compare the
sales and profits. All right, so with that
we have covered the main requirements of
the sales dashboard. And as well, we have
a plan for which charts will be used for
which requirements. All right, now we're
going to move to another type of requirements. We have the interactivity
requirements. Here it says that the dashboard should allow the users to check the historical data by allowing them to select any desired year, not limited to just the current year or
the last year. So that means the dashboard
should be dynamic, where the users select
the year that they want to compare with
the previous year. So it should not always be
the latest, current year. And for that, we
can use parameters in order to solve this task. Then we have the
second requirement. It says that we have
to provide the users with the ability to navigate through the dashboards
very easily. And for that we
usually add buttons inside our dashboards in order to switch back and forth
between the dashboards. And the next one about interactivity: the
user should be able to filter the data using the charts, and for that we can
use dashboard filters. And now moving on
to the last one, it's about data filters. So we should allow the
users to filter the data by product information, like
category and subcategory, and as well by location, like region, state, and city. That means we have to provide all those filters inside
our dashboard as well. All right guys,
with that, we have covered the first
two steps inside our project, where we understood the user requirements and as well decided on and chose the right charts for
each requirement. Let's move to the third step, where we're going to build
a mockup for our dashboard. This is how I
usually draw a mockup for a dashboard in Tableau. As usual, it starts
with the title. It's going to be
Sales Dashboard. And we can put as
well in the title which year is
currently selected. So it can be, for example, the current year, 2023. Now below that, we can
have our BANs, right? We can have three sections, or three BANs, for
the total sales, total profit, and
total quantity. Now in each of those blocks, we're going to show the
following information. First, we have to show,
of course, the total. So we're going to show the
total sales as a big number. And then below it,
we're going to show the difference in percentage
to the previous year. Since we're talking about KPIs, we always have to show
a symbol in order to show the performance
of the current year. So it's going to be either up or down. With that we have covered
the first requirement. The second requirement is
to present the data on a monthly basis and compare the current year with
the previous year. And for that we're going to use the sparkline in order to show the curves and as well the
progress of each line. So we're going to
have two lines, one for the previous year and
one for the current year. And we're going to
show the max and the min values using
something like a circle that we can position
on the lines. With that we have covered as
well the second requirement. And we're going to do the
same stuff for each KPI, so we're going to do
the same stuff for the profit and as well
for the quantity. All right, moving on to
the third requirement, we have to present the
subcategory comparison. So we're going to
go and use the bar-in-bar chart in order
to compare the current and the previous year. So for
that we're going to have the background bar in order
to present the previous year, and the current year is going
to be the one in the front. And what is missing
here is the profit. So we can present
the profit side by side with the sales,
on the right side, as well using
a bar chart, and the profit could
be plus or minus. So the next info we can present in this chart is the profit,
side by side with the sales. And as well it's going to
be a bar chart, where it's going to have plus
and minus values. All right, moving on to
the last requirement, we're going to have
the weekly trends for sales and profits. And here as well, we can use line charts, since
it's change over time. And we can have two sections, one for the sales and
one for the profits. We will not bring
them together in one, because we want to
show the average line for each metric. So that means we can have a reference
line in order to show the average
for the sales and as well another one
for the profits. And then we have to go and
highlight, using colors, the data that is above
and below
the average line. All right, so with
that, we have covered all the charts inside our mockup. Of course we have to add
different stuff, like filters. So since we have a lot
of filters and there will be no space inside our dashboard, I'm
sure about that, we're going to go
and have an icon in order to show and
hide the filters. So that means we're
going to have a dedicated section
where we can put all our parameters
and filters, like the product filters and
the location filters. And the users can go
and hit the button in order to show or
hide this section. Now we come to a very
interesting part of the design of our
dashboard. We have to decide
on the coloring. And it's very
important to decide on the coloring at the start of the project so
that you don't have to adjust a lot of stuff later. So you have to decide on
the coloring as you are creating the mockups
together with the users. What I usually do is use a maximum of four colors
inside the dashboards. So the first two colors
are the basic colors, and they really depend on the
background color in Tableau. If you are using
white as the background inside
the dashboard, then I usually go with a very
dark gray and a light gray. So those two colors
are the basics that I usually use in each
dashboard that I create. And the other two colors really depend on the
users' preferences. You can let the users
decide on those two colors, or you can take them as well
from their logo. So as you can see, in the mockup we are not designing
only the chart types and the position of the
charts inside the dashboard, but also the coloring
of the dashboard. So now here, the final touch that we can add to our mockup: we can add a logo
for the dashboard. And as well, we can add the dynamic where
we can switch to another dashboard by using buttons, as the requirement says. We have two dashboards: we have the sales dashboard and
the customer dashboard. And we can introduce in the
header of the dashboard two buttons in order to switch between those
two dashboards. So if the user clicks
on Customers, it switches to the
customer dashboard. But if the user clicks
again on Sales, it switches back to
the sales dashboard. All right. We will not design
the customer dashboard now. I'm going to leave it for
you in order to practice. We are focusing only
on the first part of the requirements,
the sales dashboard. All right guys, so now we have a mockup, we have a blueprint. And if the users agree
on the blueprint, we can go and execute our plan. And we can start building
that in Tableau. And we will start by preparing
the Tableau data source.
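The numbers behind the mockup can be sketched in a few lines as well. This is a minimal Python sketch with made-up function names and inputs, not how Tableau computes them: it shows the percentage difference to the previous year with an up or down indicator for a BAN, the months with the lowest and highest value for the sparkline, and the above or below average label for the weekly trend.

```python
from statistics import mean
from typing import Dict, List, Tuple


def kpi_summary(current: float, previous: float) -> Tuple[float, float, str]:
    """Return the current total, the % difference to the previous year,
    and a direction ("up" or "down") for the indicator symbol."""
    if previous == 0:
        raise ValueError("previous-year total must not be zero")
    pct_diff = (current - previous) * 100 / previous
    direction = "up" if pct_diff >= 0 else "down"
    return current, pct_diff, direction


def min_max_months(monthly: Dict[str, float]) -> Tuple[str, str]:
    """Return the months with the lowest and the highest value."""
    lowest = min(monthly, key=monthly.get)
    highest = max(monthly, key=monthly.get)
    return lowest, highest


def above_below_average(weekly: List[float]) -> List[str]:
    """Label each week relative to the average weekly value.

    Assumption for this sketch: a week exactly on the average counts as "below".
    """
    average = mean(weekly)
    return ["above" if value > average else "below" for value in weekly]


if __name__ == "__main__":
    # All numbers below are invented sample values, not course data.
    print(kpi_summary(current=120.0, previous=100.0))
    print(min_max_months({"Jan": 10.0, "Feb": 30.0, "Mar": 5.0}))
    print(above_below_average([10.0, 20.0, 30.0, 40.0]))
```

In Tableau itself these would be built with calculated fields, a parameter for the selected year, and a reference line for the average; the sketch only makes the logic behind the mockup explicit.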
195. Tableau | #2 Step - Building Data Source: All right, so far we have understood the requirements, and as well we have a mockup
for our dashboard. The next step is that
we're going to go to Tableau and start
building stuff. All right guys,
so the first step is to prepare our data source. And as I promised you,
we start from scratch. That's why we're going to start Tableau Public as an empty workbook where we don't
have anything inside it. So now the first thing is,
of course, that we need our data. Go to the link in
the description and download the data that I
left there for the project. Then we're going to
go and connect it. In order to do that,
we're going to go to the left side over here, so make sure you are at the home page, the
starting page of Tableau. So let's go to Text File. And then here, previously
we worked with the big and small data sources. Now we're going to work with the Tableau Projects
Sales dashboard. Let's go inside it. And
here we've got files which have similar information to
the old data sources. So let's go and select something over here
and click Open. So now we are at the
data source page, and as you can see, we have now connected our
data to Tableau. All right, the next step is
that we're going to go and create our data model
inside the data source. So here we have to go
and understand our data. I'm just going to go and
remove this from here in order to have
everything from scratch. So we have to understand
the data inside those files to know what is a
dimension and what is a fact. Let's go to Customers
over here and click View Data. And as you can see here,
we have only two columns, Customer ID and Customer Name. This is a dimension; it
doesn't have any facts. That means the Customers
table is a dimension. Let's go and close it and
go to the next one. We have the Locations. Let's go inside and check the data. As you can see, we have city, country, region,
state, and so on. This information is
dimensional information as well. Because we don't have
any events inside it, it's not really a fact. Let's go and close it. Let's check the third one, the Orders. So now we can see over
here we have some IDs, like the customer ID,
order ID, and product ID. Then we have some dates, like for example here
the order date and the ship
date, and as well some numbers, like the sales,
quantity, profit, and so on. So this is an indicator that this table is a fact,
because we have a lot of measures, and as
well we have dates, which can indicate that
this table contains events. So once you see such a
setup, where you have IDs, dates, and measures, this is a big indicator that
this table is a fact. So the Orders are facts. Let's go to the last
one, the Products. So we can see that we
have the product ID, category, product
name, and so on. This information
is dimensional. So that means this table, the Products, is a
dimension table. All right, so we have
now an overview of our data and we can start
modeling in the Tableau data source. We can start by dragging and
dropping the facts. So that means we're
going to go and get the Orders and put it in
the data model over here. And then after that,
we start bringing all the other dimensions
into the data model. Let's take the
Customers, for example. Just drag and drop it over
here as a relationship. Now as you can see, Tableau
is going to create a relationship. It's very important to
check the relationship. So as you can see, we
have the customer ID equal to the customer
ID, which is correct. We will leave all the
other options over here under Performance Options
as the default, since we don't deal
with performance right now. First we have to build stuff
and then check whether the performance is bad
or good. At the start, leave everything as the default.
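The relationship check above, customer ID equal to customer ID, can also be sketched outside of Tableau. This is a minimal Python sketch under assumptions: the sample rows and the helper names `unmatched_keys` and `has_unique_key` are hypothetical and are not part of Tableau. It checks that every key used in the fact table exists in the dimension, and that the key is unique in the dimension.

```python
from typing import Dict, List, Set


def unmatched_keys(fact_rows: List[Dict[str, str]],
                   dim_rows: List[Dict[str, str]],
                   key: str) -> Set[str]:
    """Return key values used in the fact table but missing from the dimension."""
    dim_keys = {row[key] for row in dim_rows}
    return {row[key] for row in fact_rows if row[key] not in dim_keys}


def has_unique_key(dim_rows: List[Dict[str, str]], key: str) -> bool:
    """Return True if each key value appears only once in the dimension."""
    keys = [row[key] for row in dim_rows]
    return len(keys) == len(set(keys))


if __name__ == "__main__":
    # Invented sample rows standing in for the Orders and Customers files.
    orders = [
        {"Order ID": "O-1", "Customer ID": "C-1"},
        {"Order ID": "O-2", "Customer ID": "C-2"},
        {"Order ID": "O-3", "Customer ID": "C-9"},
    ]
    customers = [
        {"Customer ID": "C-1", "Customer Name": "Customer One"},
        {"Customer ID": "C-2", "Customer Name": "Customer Two"},
    ]
    print(unmatched_keys(orders, customers, "Customer ID"))
    print(has_unique_key(customers, "Customer ID"))
```

The same check works for the other dimensions by passing a different key name.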
Let's go to the next one. Get the Locations, and drag and
drop it as well over here. And we're going to check
the relationship as well. It's going to be the postal code equal to the postal
code as a key. And for the last one, we're going
to get the last dimension, the Products, and throw it
into the data model as well. We can check the relationship. So as you can see, we
have the product ID equal to the product ID. All right, so we have our
data model, where we have one fact and all the dimensions are connected to this fact. And now the next step is
that I'm going to go and start changing
the names around. So for example, let's go and rename our data source to
Sales Data Source. And then we're going to
go to the table names and remove the CSV.
Right-click on it and let's rename; let's
remove the extension. And the same for everything else, just to have a nice data model. So with that we have very
nice naming in the tables. All right, so this is
about the renaming. The next step is that
we're going to go and check the data types
of the fields, whether they are correct or not. Sometimes, if you have bad data
quality in the sources, you will get strange data
types, which can cause a lot of
issues later if you don't check the data quality
at the start. So let's do it quickly. We're going to go
to the Products. As you can see, everything
here contains characters and the
data type is string, so everything is fine
for the Products. Let's go to the Locations. And now we can see that
all this information is geographical information. And as you can see,
all the data types are correct besides the
region over here. So we can go and
switch it to a region. So let's click on that and
go to Geographic Role. And here we have the
type Country/Region. Let's go
and select that. And for the Customers, we can see that all
of the fields contain characters and they have the
data type string, so everything is fine
as well. Let's go to the Orders. And
here we have a lot of fields. What is very important to
focus on here are the date fields. So as you can see, the order
date and the ship date both have
the data type date, which is really perfect. In many situations I see a lot of information
that is actually dates, but the data type is
string, and that's because we have corrupt
data inside those fields. And now the next important thing to check inside our data: we have to go and
check our numbers. So let's make sure
that all our numbers have the data type number. So as you can see,
all our fields have the data type number. And this is really
important because we want those numbers to be continuous measures in
order to build the charts. So that means if you have any of this information as a string, what can happen is that Tableau is going to
think this is a dimension. And then you cannot use
it in your visuals to do aggregations like sum and average, because
it's a dimension. So that's why it's
really important to check that all your numbers have the data type number in order to have them as continuous measures. All right, so with that we have a very good and solid data source. The next step is that
I go and try to understand the data before I start building visualizations. So let me show you what I mean. Let's go to the worksheet
page and let's start to just randomly check the data
inside the data source. All I want now is to
get closer to the data, to the content of those tables. Because normally on projects
we have a lot of tables. If you don't understand
the content of the tables, it can be really hard to find your information and
build the correct charts. I know that you
have practiced with most of this
information before, but I wanted to show you the steps that I usually do inside projects in order to build really nice
visualizations. So now I go, for
example, and check, okay, what is category? Which values are inside it? And with that, I can see
that we have three values. That means we have low
cardinality inside the category. And then I check
another example. Let's say the
subcategory: drag and drop it. I can see that there's a hierarchy
between those two dimensions. And then I go and take something else, like the
segments over here. Now we can see
that we have a lot of duplicates inside the data, which means maybe
there's no relationship between those two dimensions
and the segments. If I drag it to the start,
there are still duplicates, so there's no relationship
between them. So I go and remove
those fields. I can see we have
three segments. Those are actually segments of the users and not
of the products. As you can see, step by step, we are learning the data
inside our data source. Then the next step,
which is interesting: do we have a lot of countries
inside our data source? So let's drag and
drop the country. As you can see, we
have only one country. This data is about the USA. Then, interestingly, which regions do we have inside the data? We have four regions, and states and so on. So as you can see, I'm
just browsing the data. This is a really important step in order to understand
the business and start discussions with the users of the dashboards
that you are creating: reading your data and
understanding your data before creating any charts
or any visualizations. All right, so now,
once we are done browsing and understanding
the content of our data, we can go to the next
step, where we're going to go and start
building our charts.
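The checks from this step (do the dates really parse as dates, are the numbers really numbers, how many distinct values does a dimension have) can also be sketched outside Tableau. This is only a minimal illustration in Python: the sample rows, the column names and the helper functions are made up for the example and are not part of the course data set.

```python
from datetime import datetime

# Hypothetical sample rows standing in for the Orders table.
# Column names and values are illustrative only.
orders = [
    {"order_date": "2023-01-15", "sales": "120.50", "category": "Furniture"},
    {"order_date": "2023-02-03", "sales": "80", "category": "Technology"},
    {"order_date": "not a date", "sales": "n/a", "category": "Furniture"},
]


def is_date(text):
    """Return True when text parses as a YYYY-MM-DD date."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def is_number(text):
    """Return True when text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def bad_rows(rows, column, check):
    """Indexes of the rows whose value in `column` fails `check`."""
    return [i for i, row in enumerate(rows) if not check(row[column])]


def cardinality(rows, column):
    """Number of distinct values in `column`."""
    return len({row[column] for row in rows})


bad_dates = bad_rows(orders, "order_date", is_date)
bad_numbers = bad_rows(orders, "sales", is_number)
category_count = cardinality(orders, "category")
```

One corrupt value like the third row is enough for a tool to fall back to a string type for the whole column, which is why it pays to look for such rows before building any chart.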
196. Tableau | #3 Step - Building Charts: All right, so now we're
going to start implementing the requirements by
creating the charts. And we're going to start
with the first chart, where we're going to
go and build the BANs. The requirement says:
display a summary of total sales, profits and quantity for the current
year and the previous year. Let's not forget the
requirement that says the dashboard should allow users to check historical data by offering them the
option to select the desired year as
the current year. Now let's start with the
first BAN, where we're going to focus on
the total sales. Now let's go to our data. Let's go to the orders and check the information that we
have inside the sales. Let's drag it to the text
over here. And with that we get the total sales inside our
data for all years. But the requirement
says we have to show the total sales for
the current year. So let's take, for example, the order date and put it
on the rows over here. So as you can see, now
we have the sales for all years and not only
for the current year. So that means I need a
field that shows only the sales for the
latest year, 2023. In order to do that,
we have to go and create a new calculated field. So let's go and do that. And we're going to call
it Current Year Sales. And then the function
can be really easy. We're going to check whether
the current year is 2023. If it's true, then we're
going to show the sales. Otherwise we will show nothing. And for that we're going
to use an IF condition. So let's go and use that. And then what we need is
the year of the order date. The condition is
based on the year. So if the year equals
2023, then what can happen? We will get the sales, right? Otherwise, if it's not 2023, I don't want anything, so
it's going to be null. So that's it. Let's
end it. Again, the logic is very easy. We are checking the year
of the order date. If it is 2023, then show the sales. If it's false, then don't show anything;
it's going to be null. So let's go and hit OK. And with that we've got
a new calculated field, the current year sales. Let's go and drag it to the view over here
to check the data. Now as you can see, this
field now is showing us only the sales for
the current year, 2023. This is the first field,
but the requirements
say we need as well to show the sales of
the previous year. That means we have to show the sales of 2022. In order to do that, we have
to create, again, a new calculated field to
fulfill this requirement. So let's go to the
current year sales and duplicate it in order to create
the new calculated field. So let's go and edit it. So now what we're going to
do is really simple. Instead of having 2023, we're going to go
and make it one year less. It can be 2022. All right, so let's go
and hit OK. With that, we have the previous
year of the sales. Now let's go and
check the values. I'm just going to
take it and put it here in between
those two values. And with that, as you can see, we have the previous
year of sales. So with that we have the sales of 2022. So now we have the two main calculations
for the project. We have the current year and the previous year for the sales. How do we make those
two fields dynamic? We can go and use the
parameters in Tableau. Now, before we create
the parameter, we have to create one
more calculated field in order to have the years of the order dates so that later we can use it
inside the parameter. So let me show you what I mean. Let's go and create a
new calculated field. Let's call it order
date years. Then what we're going
to say: we can use the function YEAR, and inside it we're going to
have the order date. This field is going
to always return the years of the
order date. That's it. Let's go and hit OK. Now we're going to go and
create our parameter. Right click over here
and create parameter. We have to go and
give it a name. It's going to be
Select Year, and the data type is going
to be integer since it's going to be
years, so there is no float. And now we have to define
what is allowed to be used as a value
inside this parameter. If you leave it as
All, then the users can go and insert
anything, which is not really good because then
the users have to go and guess how many years
we have inside our data. Instead of that,
we have to give them a predefined list of all the years that we
have inside our data. For that, we're going to go
and check List over here. And then the values inside
this parameter are going to come from the new calculated
field that we created,
the years of the order date. Let's go over here to
Add values from, then we're going to go and
pick our new calculated field. This is really good.
First, because it is automatic: you don't have to manually add all those years. And second, later maybe you get a new year
inside your data, and you don't have
to go and manually add it; it's going to be automatically
added to the list. We are almost done, but I'm not really happy
with the format. As you can see, we have
here the thousands separator. Let's go to the display
format, and what we can do is go to
the number custom over here. Let's remove all those
decimal places, and as well remove the
thousands separator. The display unit is going
to be None. All right, so that's all.
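What the parameter list is doing here can be pictured as: take the distinct years of the order date, sort them, and show them as plain integers. The Python below is only a sketch of that idea with made-up dates; it does not show how Tableau builds the list internally.

```python
from datetime import date

# Hypothetical order dates; in the course they come from the Orders table.
order_dates = [
    date(2021, 5, 1),
    date(2023, 3, 9),
    date(2022, 11, 20),
    date(2023, 7, 4),
]


def year_choices(dates):
    """Distinct years found in the data, sorted ascending."""
    return sorted({d.year for d in dates})


choices = year_choices(order_dates)

# Plain integer labels, i.e. no decimals and no thousands separator.
labels = [str(year) for year in choices]

# What the same year looks like when a thousands separator is applied.
with_separator = f"{choices[-1]:,}"
```

Because the list is derived from the data, a year that shows up later in the order dates appears in `choices` without anyone editing the list by hand.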
Let's click over here then. As you can see, we have
now the years without any separator thing
that we have to go and make the current
value as the last year. Let's go to the current value
over here and select 2023. That's all for this parameter. Let's go and hit or k. And as you can see we
have it on the left side. Now with the parameters, let's go and show it for the users. Or show parameter to the view. And now the users
can go over here and start selecting what
is the current year. As you can see, if I'm
selecting the years, nothing is changing
inside our view. And that's because
we haven't now link this parameter
inside the calculation. And this is exactly
our second step. Let's go and do that. Let's go to the current
year sales over here, and let's go and edit it. Now, instead of this
static value of the 2023, we're going to go and
add our barometer. Let's write the name of
the barometer it is. Select Year, and that's it. So what you are saying now. The year of order date equals to the selection
from the user. Then show the sales,
otherwise show nothing. Let's go, okay, let's
go and try that. So let's focus on the
current year sales and let's go and change
the value to 2022. And as you can see now the
current year for the sales, it is the 2022. And the same if you go over
here and make it 2021. So as you can see, everything is dynamic and the users now can go and select what is the
current year. Now the next. Yep. With that, we're
going to go and integrate it inside the previous year. Let's go to the
previous year, edit it.
And the same thing:
instead of 2022, we're going to say Select Year. But now, since we
are talking about the previous year, what
we're going to do is subtract one year. That's it. Let's hit OK, and let's go and test again. So 2023, everything is fine. Let's go and switch the
current year to 2022. So let's do that. Now we can see that both of those two values did
react to our selection. So now the previous year is 2021 and the current
year is 2022. So with that we have completed the first requirement
inside our user story, where the users can go and decide which year is going
to be the current year. And we made it completely
dynamic using the parameters. All right, so with that we
dynamic using the parameters. All right, so with that we
have our main calculations for this project where we have the current year and the
previous year of the sales. So now the next step, as
we decided in the Mocap, we want to show the differences between the current
and the previous year. And we're going to
have it as percentage in order to show the KPI. Let's go and create a
new calculated field, and we're going to call it
percent difference sales. The calculation can
be really easy, so we're going to
go and subtract the current year of sales from the previous
year of sales. But now, since we want to
present it as a percentage, we have to go and divide
it by the previous year. Let's add starting
and ending brackets divided by sum of previous year. With that, we will get the
percentage of the differences between the current year and the previous
year for the sales. Let's go and hit okay. And with that we got our
new calculated fields. And now what we're going to
do, we're going to go and change the format to percentage. Right click on that. And then let's go to Default properties, number formats, And now
let's go to the percentage. And let's have only one
decimal. Let's hit, okay. Now in order to show
those values year, let's go and remove the year. And now let's go and
check the value of the differences between the current and the previous year. And with that, as you
can see, the differences between the current year and the previous year is
around 29% So again, we can go and check
our parameter to see whether everything
is working fine. So let's go to 2023. As you can see, the difference
now is only 20% Alright. So with us we have almost
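The three calculations built so far follow one pattern: keep the sales only where the year of the order date matches the selected year (or the selected year minus one), then compare the two sums. In Tableau the first field would look roughly like `IF YEAR([Order Date]) = [Select Year] THEN [Sales] END`; the exact field names there are an assumption. The same logic as a small Python sketch, with made-up rows and hypothetical function names:

```python
# Hypothetical rows: one entry per order, already reduced to year and sales.
orders = [
    {"year": 2022, "sales": 100.0},
    {"year": 2022, "sales": 150.0},
    {"year": 2023, "sales": 200.0},
    {"year": 2023, "sales": 100.0},
]


def sales_for_year(rows, year):
    """Sum the sales of the rows whose year matches; other rows are ignored."""
    return sum(row["sales"] for row in rows if row["year"] == year)


def percent_difference(rows, selected_year):
    """(current - previous) / previous for the selected year.

    Returns None when there are no previous-year sales to divide by.
    This guard is a choice made for the sketch, not something the
    course shows.
    """
    current = sales_for_year(rows, selected_year)
    previous = sales_for_year(rows, selected_year - 1)
    if previous == 0:
        return None
    return (current - previous) / previous


change_2023 = percent_difference(orders, 2023)  # current 300.0, previous 250.0
change_2022 = percent_difference(orders, 2022)  # no 2021 rows in the sample
```

A positive result means the selected year is ahead of the previous one, a negative result means it is behind, which is what the KPI arrow is meant to show.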
everything that we need in order to
build our fares pain. So I'm going to call
this first sheet as a test in order just
to test the data. So let's go and create a
new worksheet, KPI Sales. And we can start building
our fares charts. So now if you check
our cap, our KPI has the first part going
to be the pants where we have the big numbers and the second part going to be the Spark line. Here
we have two options. Either we're going
to go and make a dedicated sheet
for each section, or we make everything
in one sheet, like the whole QBI in one sheet. And we're
going to do that. So what we're going to do in the title, it's
going to be the pan. So we're going to put
all the information of the pan inside the title
and then inside the view. We're going to go and
build our spark line. Let's start with the pans first. What we need for information is the current year of sales. Let's go and grab
it on the details. And then the second
information that we need is the difference of sales. So let's grab it as well
to the details over here. And that's it for
now. Let's go now to the title and start
building the BAN. Double-click on the title. And now in the first
line we're going to give the name of the measure. So it's going to be
Total Sales. And then the second information
is going to be the
current year of sales. So let's go to
Insert over here and add the sum of the
current year sales. And the third information
is going to be the difference. So, a new line. Let's go and add
our calculation, the difference of sales. Now let's go and hit Apply in order to see the information. As you can see, now
we have Total Sales, then
we have the total number of
sales for this year, and
at the end we have
the difference. So now we're going to go and
start formatting this BAN. So what we're going to
do is go over here to the Total Sales.
Let's make it the
font Tableau Book. Then let's go and reduce it
a little bit more, to 14. Now, next, we're
going to go to the total and make it really big.
Let's select that. Let's set the font to
bold, Tableau Bold. And then let's go and increase
the font to, for example, 2022, and make it bold as well. Here we
really have to make it big. Let's go and hit Apply
just to check the
numbers. As you can see,
Total Sales is small, then comes a big number,
which is really great. Now for the next one, we
can go and select it. Let's choose, for example, Tableau Semibold, and
then make the size 220. Then we're going to go and add the text "vs. previous year".
All right, let's
go and hit Apply. Now, everything looks fine, but this information is not
really relevant to show, and it's very bold. So let's go over here and change the font back to Tableau Book, and as well let's go and
change the coloring to
something like this,
a really light gray. As you can see,
everything looks fine. Now let's go and change
the coloring and the format of the text, because this is not really
relevant information. So we're going to
go over here and change it again to Tableau Book. And then let's go to
the coloring and make it a bit of a light gray. Let's go and hit OK.
Now you can see that our BAN looks really
nice. Let's go and hit
OK. What I'm going to do now is change the format of the
total sales. Right-click
on the current
year of sales, and then let's go to Format. Then instead of the Axis tab, let's go to the Pane tab over here and go to the format of numbers. Let's go to the number custom, remove the decimals, and set the display unit to thousands to make it easier
to read, and let's add the dollar sign
as the prefix. So now things look
more professional. So we have the
dollar sign, and as well the number is rounded to thousands. All right, so now the next thing: what is missing inside our KPI? If you look at the mockup, we have decided to
add the KPI symbol. We need an icon to
indicate whether the sales are going
up or going down. In order to do that,
we're going to go to the difference and
change the format. So let's go to the
difference, to Format.
And then let's go to the
format of numbers over here. And let's go to Custom. And then we're
going to go and add the following format in
order to indicate the KPI. I will leave this format
in the description as well in order for you
to copy and paste it. Here what we are saying is: if the percentage is
a positive number, it's going to be an up arrow. If it is a negative number,
it's going to be a down arrow. And of course, if
you want to add more decimals to the percentage, you can go over
here and add a zero. So as you can see,
once I add a zero, the format changes. But for now I
would like to have only one decimal. All
right, so that's all. As you can see, now we have
a really professional BAN where we have the total
sales of the current year. And as well, we have
the difference between the current year and the previous year using
a really nice KPI. Of course, we can
go and test it. Let's go and show the
parameter on the right side. Let's go, for example, to 2022. And as you can
see, everything is changing perfectly. Then 2021:
now you can see the
arrow is down, because the previous year was higher than the current year. Perfect. So with that, as
you can see, inside the title we have
created the BAN. Now the next step is
that we're going to go and create the spark line. All right, so now let's go
and build our spark line. It's going to be
based on the months. Don't forget the requirement:
it's to show the
current sales by
month, compared to the sales of
the previous year. So first let's go and switch
the parameter to 2023. And let's go and drag our
order date to the columns. And now what we're going to
do: instead of having years,
let's go and switch
it to months. And then we can go and
grab the first measure. It's going to be the current
year sales. Let's put it on the rows. And now instead of
having a discrete line, I would like to have
a continuous line. So let's go to the months
over here, right-click
on it and switch
it to continuous. So now what we're
going to do: we want to compare it to
the previous year. In order to do
that, let's go and get the previous year sales. And now since both of the
charts are going to be line charts and are going to
be on top of each other,
we're going to use the
measure names and values. So let's drop it on
the axis over here. Now you might notice that
we have broken our BAN. So we have here a range between the lowest value
and the highest value. We don't want that, but we will fix it later. Don't
worry about it. So now let's keep focusing on the sparkline. With that
we have our two lines. Now what is missing
is to highlight the highest value and the lowest value of
the current year. Now in order to get
those two circles on top of our view,
we have to go and add another measure. But
first we have to go and calculate it using
a calculated field. So let's go and create
a new calculated field, and we're going to call
it min max of the sales. So now we're going to
go and search for the highest and the lowest
values of the sales. In order to do that, we're
going to go and check a condition using
an IF/ELSE statement. So let's start with
the first one. We're going to say: IF the sum of the current year sales, and now
we're going to go and check whether this value is the highest among all
other current sales. So what we're going to do:
we can use the function WINDOW_MAX, since we are searching for
the highest value. And then inside it we are comparing all the
sums of current year sales. Now we are just checking whether you are
the highest value. If it's true, then what
can happen? Then
show the value of
current year sales. That means if you are
the highest value, then show yourself,
show the value. Otherwise, we're going to go and search for the lowest value: ELSEIF. We're going to take
the same stuff, sum of the current year sales equals. But now instead of WINDOW_MAX,
we're going to use WINDOW_MIN. I'm just going to go and
copy everything from here and replace
the max with min. Now what can happen if
you are the lowest value? We're going to do the
same: show yourself. So we're going to show as well the value of the current
year sales. Otherwise we don't
want to see any value. So what we're going to do,
we're going to go and say END.
That's it, the calculation is valid. Let's go and hit OK. We have it as a new field, but I would like to test
the values to see whether it's working instead of throwing
it now into the visual. Let's go to another sheet. Let's drag the order
date to the rows and switch to month. I just want to check whether
everything is fine. Let's drag the current
year sales to the view. Now with that, we have
the sales of each month. And now let's go and grab
the new calculated field, the min max, and
drop it over here. Now let's check the table. What is the lowest value? It's going to be February. So as you can see,
we have the min. And what is the highest value? It is November. So,
as you can see,
this calculation
is working. My recommendation
for you: if you are creating something
complicated, always go and test it in a
table in order to see the numbers before you switch it to circles or lines. With tables we can
validate better. Let's go back to our KPI Sales and let's
grab our new field,
min max sales, and
drop it on the rows. With that, we get a new chart, because we have a new
measure over here. We have as well in the Marks
a new tab for the min max. Now let's go to this
tab in order to configure the min max.
Instead of Automatic, we want to have circles, and we're going
to go and make them a little bit bigger in order to see
those circles. We have here
the min and the max. Now let's go to the first chart. So we're going to go
and switch over here and make sure that
instead of Automatic it's a line, because
next we're going to merge those
two charts into one. In order to do that,
we're going to go and use the dual axis. Right-click on the
min max over here and use Dual Axis. On the right side,
maybe just hide the axis
over here. As you can see, we
have now those circles on top of our line charts. And with that, we are
highlighting the highest and the lowest value
inside our sparkline. Now we have our sparkline,
but now let's go back
to our BAN and fix it. As you can see, we have a range.
And that's because
inside the view we are using the month as a continuous field, and Tableau is going to
make it a range. This is the disadvantage
of having everything in one chart: things are related to each other. What we can do is
fix it
by doing the following. We're going to use
a trick in order to make it fixed, so it does not react to the things that
we have inside our view. Let's go and double-click
on the first one.
And we're going to add
brackets: one at the end, and one as well at the start.
Let's go and hit OK.
Nothing has changed yet, because we have to go inside the
title and change things there,
but let's keep
changing the fields first. Let's go to the second one, double-click, and add a bracket at the end. Let's add one at the start as well.
So let's go and hit OK. So now the
next step is that
we're going to go inside the
title and start fixing it. Double-click on the title. As you can see, we have missing fields, because
for Tableau these are new fields. Side by side, I'm going to go and add the sum of the current
year sales, and then I'm going to go and
remove the missing field. The same thing for
the second one. We're going to go and
add the difference
and remove the missing
field as well. We have to go and change
the coloring again from red, because
it was a warning. Let's make it black,
and for the second one as well. All right, so let's go and hit OK. So now, as you can see, everything is back to normal
and we have again our BAN. All right, so with that,
we have built our chart. And the next step is that
we're going to go and format it in order to make it a
beautiful chart, right? And this includes a
lot of stuff, like removing the lines,
removing the grids, removing the headers and axes, adding coloring, simplifying
everything, right? So let's start with the easy
stuff where we're going to go and remove those
grids and those lines. So right-click here on the
empty space and go to Format.
And then we're going
to go to the left side over here. Let's
go to the lines.
Let's set the
zero lines to None. Let's go to the rows and remove the grid lines as well. As you can see, we don't have any lines
here in the middle. Let's go to the grid over here.
And let's go to the
sheet and start removing everything: any line should be
None. With that, we are removing everything inside
our grid. All right. As you can see, we have
cleaned up all those lines inside our chart and
everything looks really clean. The next step is that
we're going to go and work with the axes and headers. Let's go and remove
the axis over here. So right-click on it and
let's remove the header. Now you might ask why we are
removing a lot of stuff.
And that's because
in dashboards, if you add a lot of information, you're going to
distract the users, and they will not focus
on the important stuff, which is showing the
trends inside the view. So we have to reduce
a lot of information and only present the
relevant information. So here we really have to be very minimalist in the design. So now what is left is
the months over here. So right-click on it. Let's go to Edit Axis. We want to remove the
title from it, so let's go and remove that as well. We're going to go
and indicate that this information is months: right-click on it and go to Format.
Then let's go to the
dates over here and let's choose abbreviations. You can see now we have
abbreviations of each month. Let's go and close this. So now the goal is to
show the users that this sparkline is based on the months, and we don't want to show all
those labels. So it's enough to
show only a few values. So I would like now to show
only January and December and remove all other information. So once you see it's
January and December, you will immediately understand this is based on the months. So what we're going to do is
go and edit the axis again and change the ticks. Let's go to the tick marks over here and let's go to Fixed. Now next we're going to
go and change the ticks. So it's going to start from January, and it's going to show the value of December after
an interval of 11 values,
so it can show the last month. As you can see, now we
are showing January
and only December, and everything in between
is not shown. So that's it. Let's go
and close it. As well, we have those nulls.
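Those nulls come from the min max calculated field: every month that is neither the highest nor the lowest returns nothing, so only two marks are drawn. A rough Python sketch of that logic follows; the monthly figures are invented, and the function is just one way to mirror the IF / ELSEIF / END structure described earlier, not the Tableau formula itself.

```python
# Hypothetical current-year sales per month number.
monthly_sales = {1: 120.0, 2: 40.0, 3: 90.0, 11: 300.0}


def min_max_marks(sales_by_month):
    """Keep the value for the highest and lowest month, None for the rest.

    This mirrors comparing each month's sum with the window maximum
    and the window minimum over all months in the view.
    """
    highest = max(sales_by_month.values())
    lowest = min(sales_by_month.values())
    marks = {}
    for month, value in sales_by_month.items():
        if value == highest:
            marks[month] = value   # highest month: show the value
        elif value == lowest:
            marks[month] = value   # lowest month: show the value
        else:
            marks[month] = None    # every other month stays empty
    return marks


marks = min_max_marks(monthly_sales)
null_count = sum(1 for value in marks.values() if value is None)
```

With twelve months in the view, ten of them end up empty, which is what the null indicator in the corner of the chart is counting.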
Let's go and remove them. So right click and
hide indicators. Now as you can see, we
have everything cleaned up and we have only
the line charts, and here we are indicating
that it's based on the month. Now what is left is
coloring of our charts. So as I said, I'm following
here only four colors. So here we have
our basic colors. But now let's go and
change them. So now what we're going to do is go and
change the lines. Let's go to the lines over here and start working
on the coloring.
Let's edit the colors now. We'd like to have the current year of sales
be a very dark gray. And the previous year
is going to be in the background as a light gray. In order to do that, let's go and double-click on
the first value. So now what we're going
to do: we can add our colors to the
custom colors over here,
in order to configure
them only once and keep using them in
all other charts.
Let's start configuring
the colors. Let's click on the
first cell over here. So make sure you
are selecting it. Then let's make it
something like this,
a very dark gray. And then next,
we're going to go and add to custom colors. So let's click on
that. So with that, as you can see, we have
defined the first color. And let's go and hit OK.
Let's go to the
previous year sales and as well make a new color. So let's go to the cell
over here beneath it. And let's make it
something like this.
It's going to be the light gray. And let's make it even
lighter. All right, something like this. Let's
add to custom colors and hit OK. All right. So
now let's go and hit OK. And with that,
as you can see, the current year
is going to be the black one, or the very dark gray. And in the background we have
the previous year of sales. So now next we're
going to go and change the coloring of
those two circles. So let's go to the min max
in the Marks over here.
And let's grab the
min max sales by holding Control and
put it on the colors. All right, so now let's go
to the colors and edit the colors. Now, instead of automatic,
let's go and switch it to
custom over here, the last one. And then we're going to change the steps to only two steps. So now we're going to
start with the right color, where we're going to
define the max value. So let's go inside. And now we can define
our third color. So let's click on an
empty cell over here. And let's add the code of our
third color, the turquoise. All right, then let's go and add to custom colors over here. So as you can see, we
have our third color. Let's click OK. And now we have to define
the left color. It's going to be the min value. So click on it, and we're going to
define our fourth color. Click on the empty
cell over here. Let's add the code
for the orange, and then let's go and
add it to custom colors. And with that,
we've got our four colors that we can use in all our charts inside
this project. That's it. Let's hit OK, and hit
OK. Now as you can see, we've got our two circles, the highest value and the min
value, using our coloring. Now the last touch that I'm
going to add to this chart is to reduce the opacity
of those two circles. Let's go to the colors over
here and reduce it from 100% to something
like 70%. That's it. All right, so now the next step after formatting our chart: what we're going to
do is go and work on the tooltip. If you mouse over
anywhere on the lines, you can see that
we have a tooltip, and it's not really nice. As you can see, it looks like calculations and
is not human-readable. What we're going to
do now is go and edit this information. In order to do that,
let's go to the tooltip over here in the Marks, and then we're going to get this box. We can see that this window
is very similar to
editing a title
or any text in Tableau. Here you have two
different types of text. The one that is not highlighted is going to be static, and the one that is highlighted with this light gray background
is going to come
from the chart. What we're going to do is
go and remove all this information and
start creating our tooltip. Let's start with the first one:
"Sales", and then we're
going to have "of".
And then we're going to
go and add the month. We're going to go over
here to Insert,
and then let's insert
the month of the order date. And here we're going to go
and add the current year. We could go and use, for example, the parameter for
the selected year, but we're going to have a
problem when we want to
show the sales of the
previous year. So,
in order to show the years
inside the tooltip, we're going to go and create
some calculated fields. Let's just close this, and we're going to go
back to it later. Now just check the tooltip. As you can see, we
are going to get "Sales of March",
"April", and so on. So we don't have a
lot of information yet. But now let's go and
create the calculated fields. Now we're going to call
it the current year, so it's going to
be really simple. It's going to be the value that the user selected
from the parameter. That's select year. That's it, okay? As you can see, we have the current
year on the database. Let's go and create another
one for the previous year. Previous year. And it's
going to be as well. Select year, but this time we're going to
subtract one year from it. So that's, let's go and hit. Okay. But now I would like
to go and change them to dimensions because
they are not measures. Right click on the current year and let's change
it to dimension, the same for the previous year. Let's go and convert both
of them to dimensions. All right, so now
we're going to go and grab all the information that we need in the tool tip to this box over
here to the tooltip. Well, the previous
year just drag and drop it on top
of this box year. Let's go and show
the informations about the current sales and the previous sales and
the differences between them. All right, so now we
have all the information that we need for the tool tip. Let's go inside the Tooltip
and start configuring it. Let's go over here now. After the month, what we can do, we're going to have a coma. And then let's mention the year. So it's going to be
the current year. This one over here. All right, after that, let's
have double points. Let's go and insert
the Current Sales. Insert. And now make sure to select the current
year of sales. This one over here. And not the fixed one.
That one is fixed.
But here we would like
to show in the tooltip the sales of
the current month. In order to do that, we're
going to select the sum of the current year
sales without any fixed calculation.
So let's go and select that. Now we're going to
do the same for the previous year. "Sales of", then we're going to
have the month again, so let's go and grab the month. Comma, and then
we're going to add the previous year, so it's going to
be this one over here, Previous Year.
Colon. And then let's go and get the sales of the previous year. OK, now the next information. The next line is going to be
the sales difference. Let's say "Difference",
then a colon.
And now let's go and add
that difference here. Again, make sure not to use the fixed one that we
have inside the title. Let's go and get
the variable one, the one that we
added from the data pane, this one. All right, the last information that
we're going to show inside our tooltip is the
min and max values. "Highest/Lowest
Sales", colon. Let's go and grab our measure. It's going to be the Min Max Sales.
Let's go and select that. All right, so that's all
the information that we want to add
inside our tooltip. Let's go and hit OK.
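
As a rough illustration outside Tableau, the tooltip assembled above can be thought of as a template that mixes static labels with inserted fields. The sketch below is plain Python, not Tableau syntax. The function name, the argument names and the sample numbers are assumptions made only for illustration, and the difference is assumed to be a simple subtraction, which the video does not spell out.

```python
def build_tooltip(month, selected_year, current_sales, previous_sales, min_max_sales):
    """Assemble tooltip text from static labels and dynamic values.

    Hypothetical helper: all names and values here are illustrative.
    """
    current_year = selected_year        # like the "Current Year" calculated field
    previous_year = selected_year - 1   # like "Previous Year": parameter minus one
    difference = current_sales - previous_sales  # assumption: plain subtraction
    return "\n".join([
        f"Sales of {month}, {current_year}: {current_sales}",
        f"Sales of {month}, {previous_year}: {previous_sales}",
        f"Difference: {difference}",
        f"Highest/Lowest Sales: {min_max_sales}",
    ])


print(build_tooltip("November", 2023, 1200, 1000, 1200))
```

The static parts ("Sales of", the comma, the colon) correspond to the non-highlighted text in the tooltip editor, and the values in braces correspond to the highlighted fields that come from the chart.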
And check the results. For example, let's go to
the point over here. So now we can see
the sales of the current year for
the month of November, and it has this value. And as well, it can
be compared with the sales of the previous
year for the same month. And then we can see
the sales difference and what the highest
and lowest values are. As you can see, as we are moving to different months, the values inside the
tooltip change. But as you can see,
the format and the design of our tooltip are
not really nice, right? For example, the years show
a thousands separator, and everything is bold, so it's not really
easy to read. As well, the alignment of that
information is not really nice. So now we can go and format it. All right, so
let's start first with formatting the current
and the previous year. Let's go to Current
Year and open the
Default Properties
and then Number Format. We're going to
set it to Custom. Let's reduce the decimal
places, and as well
uncheck Include
thousands separators. All right, now let's
go and hit OK. And let's just test.
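
To see why the separator matters for a year, here is a small Python sketch (Python formatting, not Tableau itself). Python's `,` format option groups thousands with a comma, whereas the workbook in the video appears to show a dot, but the effect on a year value is the same. The variable names are made up for illustration.

```python
selected_year = 2023

with_separator = f"{selected_year:,}"     # thousands separator included
without_separator = f"{selected_year:d}"  # plain integer, no separator

print(with_separator)     # 2,023
print(without_separator)  # 2023
```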
Now as you can see, 2023 doesn't have any dot. Let's go and do the same
for the previous year. Let's go to the
Default Properties and then Number Format as well. Let's go to Number (Custom), reduce the decimals, and
remove the thousands separator. Now the next thing
we're going to do is adjust
the format of the numbers. As you can see, the
current value has a different format than
the previous one. In order to fix
that, let's go to the previous sales over
here and right click on it. Let's go again to the Default Properties,
Number Format. And we're going to go again
to Number (Custom). Let's remove the decimals. The display units are
going to be thousands. And we're going to
add the dollar sign as a prefix. Let's go and add it, and then hit OK. Now let's check again. We can see that both of the numbers now have the
same format. Let's check the max and min. You can see that the max and min have the same problem as well. Let's go to the Min Max value and again to the Default
Properties, Number Format. And then let's go to
Custom, remove the decimals, add the dollar sign, and don't forget
to set the units to thousands. Let's go and hit
OK. All right. So now all our numbers have exactly the same format, and
now we can go and format the text. Let's go back to the
tooltip over here. All right, now we're going to work with two colors, the light gray and the dark gray. Let's select the first part, where we have text
and not a value. This is going to
get the light gray. Let's pick this
value over here. Let's remove the bold
as well. All right. Now let's do the same
for all the other text. We select
it, give it the light gray, and remove the bold. The same for the next lines. All right. Now the values. As you can see, they
have exactly the color that we need, and they are bold. Make sure that every value has the dark gray and
is bold as well. Everything so far is fine. Let's go and hit OK and test. Let's hover over here. Now as you can see, it's really easy to
read, because we have different coloring for
the text and the values. All right, so now the last thing that we're
going to do inside the tooltip is to change the alignment
of the numbers. As you can see,
all those numbers start from different positions. Now let's go and
change the alignment. In order to do that, let's
go again to the tooltip. What we can do is
add a tab exactly after each colon and make sure there are
no white spaces. We're going to go over
here to the first one. Let's add a tab. Let's go to the second one. I believe we have
an empty space here. Let's just remove
it and add a tab. All right, for the next one, I believe I have a space as well. Let's remove it and add a tab. And for the last
one, the same thing. Remove the space and add a tab. The tab is going to automatically align all those
numbers. That's it, we have all the
tabs, so let's go and hit OK. Now let's go and test. As you can see,
all the numbers start from the same position. Let's go to the point
over here as well. As you can see, everything
looks really nice. All right, so with that we are done, and we added a very nice and readable tooltip
inside our charts. Let's do a quick
summary for the steps. First, we create our
calculated fields that
197. Tableau | #4 Step - Building Sales Dashboard: All right, so we're
going to start talking about building the dashboards. The first step is that
we have to plan the structure and the
containers of our dashboard. All right, so let's start sketching the
container structure. The first one is, as
usual, going to be the main container, and it's going to be a
vertical container. And then we're going to
go from top to bottom. So first we have a
title and two buttons. For that we can include a horizontal container where we have the title
and the buttons. Moving on, below that, we have the information of the KPIs. So we have side by
side objects here. Again, we're going to
use another container, another horizontal container, in order to have all
those KPIs side by side. Then moving down below that, we have the charts, right? It's again two
charts side by side, and we will use a third
horizontal container for them. These are the main
objects that we have inside the main
vertical container. But of course in
our dashboard we have a lot of filters as well. What we're going to do is
build a vertical container where we put all the filters for the dashboard. But this container
is going to be outside of the main vertical container.
For that, we're going to
use the floating option,
and as well the ability
to hide it or show it. I would say we will
go with this plan, and of course it can change. That means as we are
building the dashboard, sometimes we add an extra container
to organize stuff. So we will not cover
everything in the plan 100%, but we will
cover the main stuff. All right, so now
with that we have a plan for our dashboard. Let's go and implement
it in Tableau. All right, now let's go
and create a new dashboard, and we'll call it Sales Dashboard. The first step that I usually do is fixing the size. Let's go on the left
side to Size, change it from Range
to Fixed size, and then let's go to the width. I usually go with 1200, and for the height
let's go for 800. OK, so with that, we have enough white space
for our dashboard. I usually start with
the main container. But since we have a
container for the filters which is going to be hidden and
shown, I'm going to start
with that first. Now, in order to create
this vertical container, I have a quick way
to create it. What we're going
to do is take any worksheet. Let's, for example, go
with the KPI Sales. Let's drag and drop
it to the middle. As you can see, Tableau
automatically creates a vertical container on the right side where it
puts everything:
the parameters, filters,
legends and so on. And this is the container that we can use for our filters. Now what we're going to do is convert it to a floating element,
a floating container. In order to do that, hold Shift and then click
on this icon over here. And then just move it.
As you can see, now it's
freed and we can
drop it anywhere. Now let's just move
it here to the end. What we're going to do now is
remove this chart, because we have to go and build
the main container. Let's go and just remove it. And as you can see, we still have the container here
on the right side. Now what we can do is
color the container. So make sure to select
the container over here. Let's go to Layout. And then let's go to the
border and make it a line. And then let's choose any color,
for example, the
purple one. As well, let's go and put a
background on it, maybe purple as well. With that we can see that we
have a container here, a floating container
on the right side. The next step, we're going
to give it a name. So we have it here in
the item hierarchy. Let's go to the
vertical container. Click on it, and then let's
give it the name Filter. All right, now we
have our first container. Let's go back and build the main container
for the dashboard. So let's go back to
the Dashboard tab and let's grab a vertical
container for the main one. Let's drop it
here in the middle. And now we're going to
add the coloring for it. So let's go to Layout. Let's go to the border, and let's make it
orange. As well, I would like to add a
background color for that, so let's take orange as well. With that we have our main
container. On the left side, you can see we have Tiled and then the vertical container. Let's go and rename it. I'm just going to make
it a little bigger over here, so we're going to say you
are the main container. All right, so now
the next step is that
we're going to
add blanks in order to have a placeholder for the elements inside
this container. Let's just go and add one. And then let's go with the first container inside the main one. We have the horizontal
container for the title. Let's take a
horizontal container. Just drag and drop
it here, below. Make sure that it is inside
the main container. Do that carefully. All right, so we have our
horizontal container. Let's go and put
some coloring on it. Layout, border, let's make it blue. As
well, for the background,
let's make it
blue too. Of course, let's
go and check
the hierarchy over here. We have the vertical container, we have our blank on top, and then we have the
horizontal container. Let's go and rename it. You are the container
for the title. All right, now let's go inside
it and put in some content. What we have is a text, so let's drag and drop it inside the
horizontal container. Let's say you are
the sales dashboard. We will format everything later. That's it, let's go and hit OK. Now as you can see, our container is very small. Let's make it a
little bit bigger. And now we have to
add the two buttons. Let's go with the Navigation object. Make sure to add it
inside, on the right side,
because it is a
horizontal container. Let's go and drop it. And we need another one. Let's go and drop it as well, on the right side or in the
middle, it doesn't matter. Now let's go quickly
and check the layout to make sure that everything
is fine. Inside the title, we have a text and then
two buttons. Great. Now let's go to
the next content. We're going to have another
container for the KPIs. Let's go again to the
Dashboard tab and take a Horizontal Container
and make sure to put it beneath the
first container. Let's drop it over here. And now make sure to select it. And let's go and add
the coloring to it. So the border is going to be a
line, blue as well, and the background is
going to be blue as well.
All right, so now
the next step we're going to add
a name for it again. So let's go inside. You are the container
for the KPIs. OK, now let's go and add some content inside
it using the blanks. So the first blank,
make sure to drop it inside the second horizontal container. Now we have it very small, so let's go and extend it. Then let's grab another one. Make sure to put it on the right side, so now
we have two blanks. And let's go and grab the
third one and put it on the right side, so that we have our three
placeholders for the KPIs. Again, I always go back to the layout to check that
everything is fine. As you can see, those
three blanks are inside the KPI container,
so everything is clean. Let's go back now to
the Dashboard tab and add the last container
for the charts. We're going to grab a horizontal container again. Drop it below the middle one. Let's go and add
some coloring to it. So let's go to Layout. We add a blue border and as well a
background for that. Now let's go and give it a name. You are the container
for the charts. OK, now let's go
and add some blanks in order to have some
content inside it. So the first blank goes inside it, and now we have it very small, so let's extend it, and the second blank goes on the right side. Now we have two places
for our charts. Let's go to the
layout and check. As you can see, we have the two blanks underneath the charts container. All right, with that, we have the three containers
for our content. Let's go and remove
the first blank, since we don't need it anymore. We have it over here. Let's go and remove it. With that, we have built the foundation, the structure of our dashboard. So we have the container
for the title, we have the three KPIs, and then a place for the two
charts. As well, we have here on the right side our floating container
for the filters. All right, so as you can
see, it's really easy. Just do it slowly, step by
step, check everything, and give everything a name. Don't rush it. All right, so that's
all for this step. Now finally, let's go to the step where
we're going to put everything together and put the content inside
our dashboard. OK, so now let's go and put all our content inside
our dashboard. Don't worry about the filters.
We're going to do them at the end. So let's start with
the KPIs, right? We're going to
take the first one, the KPI of sales. Make sure to put it
near the blanks. And then let's go and grab
the second one next to it, and the quantity as
well next to it. Let's go to the layout
to check everything. As you can see, we have
this container for the KPIs, and inside it we
have our three KPIs. Now we don't need
the blanks anymore, so
let's go and start
deleting them. All right, so now
let's keep going and put the other charts
inside our dashboard. Let's take the Subcategory chart and make sure it ends up inside the
third horizontal container, so let's drop it over here. And then the last chart is
going to be the Weekly Trends. Let's drop it side
by side over here. Let's go to the layout
and check. You can see that the horizontal container
for the charts has our two charts
and the two blanks. Let's go and remove the blanks. Great. Now you can check
our structure again in the item hierarchy to see that everything
looks like this. We have the main container,
and inside it we have
three horizontal containers. The title should have the
title and the two buttons. Then the KPI container should
have the three KPIs. The charts container should
have the two charts. If you have it like this,
that means everything so far is clean and
we are on a good path. All right guys, that's
it for this step. We have the main content inside our dashboard, and it
was very easy and fast. Now in the next step,
things are going to get interesting, where we
can start formatting, coloring, and positioning the stuff in order to have a clean
and professional dashboard. OK, now let's start
formatting our dashboard. The first step is that we're
going to make sure that our content is distributed
evenly in each container. Let's go to the KPI container over here. Make
sure to select it. And let's go to the small arrow
and click on
Distribute Contents Evenly. All right, so let's
move to the next one. As you can see, those two charts are not distributed evenly. Let's select the
container, go to More Options, and
distribute the contents evenly. With that, we get an even alignment for all charts. We will not do that for the
first container, because the title should be bigger
than the navigation buttons. Let's start from top to bottom. Let's start with the title. Let's go inside the title over here and start
formatting it. So we're going to call
it Sales Dashboard. And then let's have a pipe. And then let's have the year, the current year that
the user selects. What we're going to do is
go to Insert
and add our parameter. Now let's go and change
the font size. Let's select everything and
make it, for example, 24. Now let's go and
change the coloring. So let's go to the colors and
pick our coloring, right? Let's go and pick the
dark one. For the year, let's have it as Tableau Medium
and pick the other
color that we use. All right, so we have our title. Let's hit OK and
check how it looks. Yeah, I think it looks fine. Let's make it a
little bit smaller. That's all for those
two containers. Now let's go and
check the buttons. We have to make sure
that those buttons have exactly the same size, which is really
hard to configure by hand. So what we're going to do is
grab a mini horizontal
container, put those two buttons inside
it, and distribute it evenly, so that we
get a perfect sizing. Let's go to the Dashboard tab and let's get a
horizontal container. Make sure to drop it on the right side. Now that we
have a small container, let's make it a little
bit bigger to see it. I'm just going to
move stuff now. We're going to move
those buttons inside it. Let's drop the first one inside it. Then we pick the second one and
put it on the right side. Of course, let's go quickly and check that everything is fine. Let me close
all this stuff. Under the title container, we have our title, and then we have the mini
horizontal container. Inside it, we have the two
buttons. All right, great. Now let's go and make
everything distributed evenly. Let's go to the
horizontal container. Let me just quickly
give it a name. You are the horizontal
container for the buttons. OK, perfect. And let's go and distribute this
container evenly. So make sure to select
the horizontal container. Let's go to the options and
choose Distribute Contents Evenly. Now, as you can see, those
two buttons get exactly the same size. As I'm reducing it or
making it bigger, both of them keep
exactly the same size. Let's just make it a
little bit smaller. Now let's go and change the
design of those buttons. Click on the first one and let's edit the button. OK. Let's say the first button is going to be for the
sales dashboard, so let's go and select it. It's going to be the
Sales Dashboard. Now let's go and give
it a title or a name. It's going to be
Sales Dashboard. Now let's go and
format the font. It's going to be white,
so everything is fine. Let's go to the background. Let's pick our colors. So let's go to More Colors and pick our blue. What else? Let's go again to the font
and, instead of 12, let's make it 10. All right, so that's
it. Let's go and hit OK. Now with that, we have
configured the first button, so let's go to the second one. Let's go and edit the button. Now, since we still don't
have the customer dashboard, we cannot go and select it. But I still want to format it. Let's go to the font, make it 10, and this time
I'm going to make it black. And let's give it a title. It's going to be Customer
Dashboard. For the background, it's going to be white, and let's go and add a border for it, so
it's going to be a line, something like this
maybe, and then gray. OK. Now let's add a tooltip. It's going to be "Go to
Customer Dashboard". OK. Let's check that. As you can see, the second button is gray because we haven't
selected any dashboard. Once we have a dashboard,
it's going to be white. Now let's go and make
it a little bit bigger. Select the container and
just make it a little bit bigger.
OK, that's it. We will revisit it later once we have the customer dashboard. All right, so
that's all for now for the first container.
What I'm going to do now is
remove the background coloring
of the container. Let's select the title container. Let's remove the border and as well the
background color. Let's set it to
None. All right. Now, let's move to the next one. We have our KPIs. The first
thing that I'm going to do is
just make
the container a little bit bigger, maybe something like this. Then we're going to
add the background color. As you can see, we
have white color here, but here we don't have any
coloring for the title. In order to fix that, let's
click on each one of them, go to
the background, and make it white. Then on to the next one,
and the third one is going to be white as well. OK, so now we have
a big card for each KPI,
holding all the information
for each one of them. All right, so now the next step is that we're going to remove the coloring
of this container. Let's remove the border and remove
the background as well. All right, now let's go to the next
container over here. What I'm going to do is
add a background color for those two charts as well,
and it's going to be white. Now, before we configure anything else, we still have this container which is really bothering me. Let's go and select
the whole container. Let's move it to
the top over here. And then let's go
to More Options. And we're going to
select this one, Add Show/Hide Button.
Let's click on that. Once you do that, you
get a small icon that shows and hides
the whole container. What we're going to do is
hide it. Click again on the
options and hide it. Now the whole container
is inside this icon. I will just place it over here in order to work on our charts. All right, so now the
next step is that I would like to go into each
chart and make sure that it fits
the entire view. Let's go to the first one. You can check it
from here, and you can see it is Entire View. The next one as well. For the third one, as you can
see, it's Standard. So let's go and switch
it to Entire View. And the same thing for
the weekly trends: it is Entire View. With that, we make sure that Tableau is using the whole space. We can make this one a little bit bigger as well, because we still
have a little bit of space. So let's go to the middle
over here and make the KPIs a little bit bigger in order to use the
whole white space. All right, so with
that we have a perfect positioning
for each chart. I'm really happy with
that. All right, so now the next step is that
we're going to add some nice legends to our charts. For the first chart, we have to give the following
information to the users: the dark gray is going to be the current year, and the background color
is the previous year. So now I'm going to
customize the legend. I will not
use the one that comes from Tableau, because I
want to customize it. For that we're
going to quickly create a sheet for the legend. Let's create a new sheet, and all we need is the text
of the current year and the previous year. We
have them as calculated fields. Let's move the Current
Year to Text, and as well the Previous
Year to Text. Now let's go and customize
that information. OK, we're going to
start on the left side, so let's set the
alignment to the left. I'm going to start
with the first piece of information, the current year. We're going to say the
current year "Sales". Let's make it bigger, and let's go and
change the font to something like
Tableau Medium as well. The coloring should follow
the pattern of the chart. The current year of sales was the dark one, so let's go and pick our dark
color. For the previous year, it was the light
color. Let's do that. Let's make the
current year bold. OK, let's go and test it. Let's go and apply.
Now Tableau is going to show it as hashes, because the
size is really small. So let's go and hit OK. And we can go to
Standard and change it to
Entire View. Now we
can see it over here: 2023 Sales versus 2022 Sales. As you can see,
it's the current year versus the previous year. OK, there is one thing that I'm
not really happy about.
Let's go inside it
and remove the bold. OK, let's give it a name. This can be the legend
of the subcategory chart. That's it. Now let's go back to the dashboard in order to use it. Now I would like to
have the legend between the title and the chart.
We cannot do that with the current setup. Instead, we're
going to make an extra container for
those three pieces of information: we have a title, a legend, and
then the chart. As I said, we cannot
plan everything at the start. As you are building
the dashboard, you will understand the needs
and you will adjust stuff. So instead
of having just this chart, we're going to have
a vertical container inside the horizontal container. Now let's grab a
vertical container and put
it here in the middle. Then
we grab the chart and
put it inside this container, so make sure to drop it
inside this container. And of course, let's
go quickly and check the layout to see that
everything is fine.
It's inside
Tiled, the main container, and the charts container.
Now, instead of
the first chart, we have a vertical container. Let's go and give
it a name quickly. You are the container
of, let's say, chart one. Inside it, you can see
we have our chart. Now, our vertical container is going
to start with a title. Let's go and grab a
text object and put it on top. And now we're going
to give it the name
Sales and Profits
by Subcategory. Now let's go and format it. It's going to be
Tableau Medium as a font, the size is going to be 14, and the color is the dark one. So let's go and select
that. OK, so that's it. All right, so
that means we don't need the title of
our chart, right? Click on it and hide the title. Great, so now finally we can
go and grab the legends. But now in this chart,
I would like to have a legend on the
right side for the profit as well. So that means we
have a legend on the left and a legend
on the right. In order to do that, we're going to have another container, in order to put those two
legends side by side. We cannot do it
currently, because we have a vertical container. So let's go and grab a horizontal container and just put it in the
middle over here. Just resize it,
make sure to select the container, and let's put
the first legend inside it. OK, so now we have a
title for the small legend. Let's go and hide it. Great. Now let's go and
make everything smaller. All right, so with that, we
have a really nice legend where we are telling the users that we are comparing the
sales of 2023 with 2022. All right, so now let's go and configure the right legend. We have to tell
the users that this is profit information and that the blue color
indicates profit, while the orange
indicates loss. For this legend, I'm just
going to use a text object. So let's drag the text
and make sure to put it inside this mini
container, on the right side. First let's indicate
the current year. Let's go to Insert and
add the parameter, because here we have the profit only for the current year. Next we're going to add a circle, and this is
going to be "Profit". And another circle, and this
is going to be "Loss". OK. Now let's go and
make sure that the font is Tableau Medium
and the size is 9. And let's make sure that the color that is
used is the dark one. But now let's go and change
the coloring of the circles. The first one is going to be blue and the loss is orange,
our orange. OK. So now let's go and hit OK
and test it. All right. As you can see,
it's really big. Let's go and make it
smaller. All right. So with this legend,
the users can see immediately that
we are talking about 2023. The blue is the profit and the
orange is the loss. All right, I'm really happy
with the first chart. Of course we still have the
coloring of the background. Let's go to the layout and
make sure that everything is correct in the containers.
Let's go to chart one. As you can see, we
have a vertical container, we have a text, and then we have a horizontal container
for both of the legends. Inside it, you can see
we have the sheet for the first legend and
the text of the second. Then below that we
have our chart. If you have it like this, you
are following me correctly. Now what we're going to do is
give a background color to the whole container of
the first chart. Let's go to the background over here and make it white. With that, the users
get the feeling that everything is one
unit, one chart. All right, so this is
for the first chart. Let's go and do the same
for the right one. In order to do that,
let's go and grab a container. Let's drop it
in the middle over here. So now with that, we
have our container. Let's go and grab our chart
and put it in the container, the new one that
we have created. Now that we have our chart
inside the new container, let's go and check
the layout to make sure that everything is fine.
Let's go to the charts. We have chart one, and the new one is going to be
for chart two. Let's go and rename it. You are the container
for chart two. OK, inside it we have
our chart, so perfect. Next we're going to
grab a text object and drop it on top of our chart
inside the new container. Let's call it Sales and
Profit Trends Over Time. Now we're going to
start formatting it. Let's go and pick
Tableau Medium as well. It's going to be 14. Let's
go and pick our color. It's going to be the dark
one, so that we get exactly the same title as the left one. OK,
the next step. Let's go and hide the old
title from the chart. Next we're going to add our legend, and it's going to be
a text object. Let's put it in the middle between the title
and the chart. We're going to write
in the legend. Let's insert a parameter in
order to show the year. And after that we're
going to have a circle. And we're going to say
this is "Above". And another one, and it's
going to be "Below". With that, we indicate whether the line is above the average or
below the average. We are using the coloring.
The "Above" is going to be the blue one. Let's go and choose it. And "Below" is going to be the orange,
our orange color. Now we
can make sure that we are following the same font. So it's going to be
Tableau Medium, and the size is 9. All right, so that's
all. Let's go and hit OK. I think we missed
the coloring of the 2023. Let's go inside it
and make sure to choose the dark color for it. All right, let's hit OK. So now we've got a
quick explanation about the coloring inside our
chart on the right side. Now what we're going to
do, we're going to go and select the whole container. And we're going to change the background color to white in order to have this one unit
feeling in the charts. So let's go to Layout, and let's go to the background and choose the white color. All right, so with that
we are done with the containers of the charts,
and what we can do now, we're going to go and
select the whole container and remove the border and as
well the background color. Okay, so now by looking at our charts inside
our dashboard, we are still missing some
information about the KPIs. We have to present here
legends explaining those two points and as well the coloring
of those two lines. So we will have
something very similar to the legends where
we're going to say 2023 versus 2022 in order
to explain those two lines, and then we can explain
those two circles. In order to create the legends,
what we're going to do, we're going to go to the
legend of Subcategory. And let's go and duplicate it. Let's give it a name. It's
going to be the legend of KPI. Let's just move the dashboard to the end in order to have all
the sheets on the left side. Let's go to the legend of
KPI and start formatting it. Now, since we have different
KPIs, not only the sales, I'm going to go and remove
the word sales from our text. Let's go to the text,
to the three dots. And then let's go and
remove the sales. And let's have only the years. And then let's go
and add our circle. And we're going to
say highest month. And another circle
for the lowest month. Now as usual, we're
going to go and start formatting
this information. It's going to be Tableau Medium and nine, so
everything is fine. Let's go and change the
color of those circles. The highest is going to be the blue and the lowest is going
to be the orange. Let's go and hit OK and check the results.
Looks nice, right? But I think here I
have an extra space. Let's go to the text again. Let's have only one
space. All right. Let's go and hit
OK. Now let's go and use it inside
our dashboard. So what are we going
to do? We're going to go to the
dashboard over here. Let's grab the KPI,
the legend KPI. And let's drop it
just below the title. We can have it between
two horizontal containers. Let's drop it first. Next we're going to go and
remove the title. So let's go and
hide it. Now, it's really small between
those two containers. What I'm going to do
in order to select it, let's go to the item hierarchy. And now we can check and see we have the container
for the title, the container for the KPIs, and in the middle
we have our charts. All right, now maybe let's
go and make the title just a little bit
smaller like this. Let's go to the legend KPI and drag it a little bit below. All right, so now it
looks fine and we have an explanation for
the three KPIs. All right, so with that,
we have everything ready inside our main container. What is missing, of course, is the hidden container
where we have the filters. But I will leave
that until the end. Now what we're going to
do, we're going to go to the main container,
it's selected, and remove the border and as well the background.
So let's have none. All right, so now
the final touch, the last step of formatting
this dashboard. We're going to go and add spaces in this dashboard
between the charts. Adding spaces between
the charts is going to have a huge effect on the user
experience of your dashboards. And as you can see,
those two charts are really close to each other, like they are not able
to breathe, right? So adding space between
those two charts will not only add balance
between the items, but it's also going
to make the dashboard easier to read for the users.
So now let's go and do that. The first thing that we're going to do
is that we're going to change the background color
of the whole dashboard. In order to do that, let's go to the main menu over
here to the dashboard. And then let's go to
the format option here. The default is going to be white. Let's go and change it to the lightest gray.
Let's select that. Now with that, we are separating the charts from the background, and we can see immediately the spacing between the charts. Now if you look to
the three KPIs, you can see we have a
minimum space between them. But between those two charts, there is no space at all. Now let's go and fix the
spacing from top to bottom. First, I would like to
have the background color of this legend to be a gray. In order to do that,
let's go to the sheets. So I'm just going to switch
to, let's go to the format. But if you don't
have it open, just right click on that white space. Go to format, and
let's go to shading. So now we can go and color the background of
the worksheets. So let's go and say none. All right, so now let's
go back to our dashboard. And as you can see for
the legend over here, we don't have a coloring. We need a background color of
white only for the charts. All right, so now
let's start working on those three KPIs in order to increase the spaces
between them. In order to do that, let's
go and select the first one. Let's close the formats and
let's stay at the layout. Now here, if you go
to those two options, we have the outer padding
and the inner padding. The outer is the space
between the objects and the inner is the space
inside the chart itself. So now what do we need? We need to increase the spacing between those three KPIs and as well the spacing between the
KPI and the charts. All right, so now let's
go and start with the outer padding.
Let's click on it. Now here as you are
increasing the numbers, as you can see the padding, the space between
this chart and the neighboring charts,
is going to increase. And as you can see, it's
going to increase for the top, right, bottom, and left. So as you can see, everything
is connected together. If you change
something here, it's going to change for all values, and that's because the option all
sides equal is enabled. And here, it's very important to understand that you
have to make a decision about the spacing between
your charts and you have to commit to your decision
for the whole dashboard. This is really
important, otherwise the dashboard is going to be ugly. So now we're going
to go with the value 20 for all the
charts inside this dashboard. Now let me show you
how we can do that. Let's go and set
everything to ten. Now with that,
this chart is taking a ten on the
left, right, top, and bottom, and our goal is to have a 20. If this chart is taking a ten on the right
side and the neighbor KPI is taking a ten on the left
side as well, then we will have a 20. That means in order to have
a 20 between all our charts, each one of them
should have a ten. But now I care only
for the spaces between the charts and
not the legend over here. What we can do, we're going to go to the outer
padding over here. And then let's
disable all sides equal, and for the top,
I really don't care. Let's make it as a zero. Our chart is not taking
any spaces to the top, taking only space to the
right, bottom, and left. Now let's go and do exactly
the same for each KPI. Let's go to the profit,
go to the padding. We have to set it here to ten. Now let's go and disable
all sides equal, because we don't need any
space at the top. All right, so let's
move to the next one. The same stuff make it ten, and let's remove the top. Now we can see clearly
there is a space between all those three KPIs and
this space is equal to 20. Now let's go and add spaces
to the two charts over here. So make sure to select the whole container.
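The padding arithmetic used for the KPIs can be sketched in a few lines of Python. This is only an illustration of the rule described here (two neighbors with an outer padding of ten each give a visible gap of 20), not anything Tableau exposes as an API. The `OuterPadding` class and the two gap functions are hypothetical names, and the sketch assumes that facing paddings simply add up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OuterPadding:
    """Outer padding of one dashboard object, in pixels per side (hypothetical model)."""
    left: int = 0
    top: int = 0
    right: int = 0
    bottom: int = 0


def horizontal_gap(left_obj: OuterPadding, right_obj: OuterPadding) -> int:
    """Visible space between two side-by-side objects: the facing paddings add up."""
    return left_obj.right + right_obj.left


def vertical_gap(upper: OuterPadding, lower: OuterPadding) -> int:
    """Visible space between an object and the one directly below it."""
    return upper.bottom + lower.top


# KPI: "all sides equal" disabled, ten on left/right/bottom, zero on top.
kpi = OuterPadding(left=10, top=0, right=10, bottom=10)
# Chart container: ten on every side, including the top.
chart = OuterPadding(left=10, top=10, right=10, bottom=10)

print(horizontal_gap(kpi, kpi))   # gap between two neighboring KPIs
print(vertical_gap(kpi, chart))   # gap between a KPI and the chart below it
```

Both calls give 20, which is the spacing chosen for the whole dashboard. Setting the KPI's top padding to zero only removes the extra space toward the legend above it.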
Now the same thing. We're going to go to
the padding over here, and now we're going
to make it a ten. This time we care about the
top being ten in order to have a 20 between these
charts and the KPIs above. All right, so that's
all for this chart. Let's go to the next
one and do the same. So make sure to select
the whole container and let's move it to ten. Alright, perfect,
let's go and deselect. So as you can see, the
whole look and feel of our dashboard looks more
professional and is easier to read. And this is exactly why we add spacing between our charts. Okay guys, now not
only the spacing between the charts is important, but as well the inner spacing. The inner padding,
the space between the content and the border
of the object, is important as well. Adding spacing inside
the container or the content is going to make things look
better. For example, let's go to this KPI over here. You can see the total of sales is very close to the
border right now. What we're going to do, we're going to
go to the inner padding. Now let's go and increase the size a little bit and
see how things look. Let's make it maybe seven. Now as you can see, as I'm
increasing those numbers, the content is getting pressed and moves away
from the border. If you increase it, for
example, to 20, as you can see now
we have a lot of space between the title and the
border of the content. Now let's go and
move it to seven. We will go and do the
same for all other KPIs. Let's go to the right one and we're going
to make it seven. And to the third one. Let's
go and make it seven. So as you can see,
moving the content away from the border
a little bit is going to make everything
breathe better. Let's go and do the same
for all other charts. So I'm going to go over here
to the whole container. Let's add seven as well
over here and add a seven. All right, so that's
all. With that, we are done formatting
our dashboard. For the next step,
we're going to go and start working on the filters
and the interactivity. Now let's quickly check what the requirements were. We have
to allow the users to filter the data by the
product information, like category and subcategory, and as well by the
location information, like the region,
state, and city. And we have another requirement about interactivity
and filtering. It says we have to
allow the users to use the chart and the
visuals as a filter. All right, now let's go and
add the requested filters. We didn't add any filters
inside our worksheets, so let's go to any
of those worksheets, for example the KPI sales, and let's start
adding the filters. So the first one is about the product information. So let's go and get the
category and show filter. Then let's go to the
location information. Let's add the country. All right, so those
are the filters that are requested
by the users. In the next step,
we're going to go and apply them to all worksheets, since all those filters are relevant for all our charts. So let's go to the
first one, right-click on it, and go to
Apply to Worksheets. And then let's say All Using This Data Source. Let's
go and select that. And as yo
198. Tableau | #5 Step - Building Customer Dashboard: All right, so now I hope you are done building the
customer dashboard. Now I'm going to
show you my version, how I implemented it. So now let's have a quick
overview of the requirements. Let's start with the key
requirements. We have here the same stuff. It says that
we have to show KPIs, where the KPIs should display the total
number of customers, sales per customer, and
as well the total number of orders for the current
year and the previous year. And the next requirement
is about the trend. We have to present the data on a monthly basis where we have to compare the current
and previous years, and that's where we
have to identify or to highlight the highest
and lowest values. So those two requirements are exactly like the
sales requirements, but with different measures. So for the chart type
here, we're going to go exactly like the sales
dashboard, where we have BANs and as well sparklines with small circles. All right, moving on to
the third requirement, we have the customer distribution
by number of orders. So here we have to present
the distribution of customers based on
the number of orders. So here we are talking
about data distribution, and for that we have
a perfect chart. We have the histogram. Okay, so now for the
last requirement, we have to show the top
ten customers by profit. So here we have to show
the top ten customers with the highest profit. As well, they need a lot of
information like the rank, number of orders, current sales, current profit, and
the last order date. In this requirement,
we have to present a lot of details about
the top ten customers. And for this, I have
decided to go with a simple table where we
can have rows and columns. All right, so this
is about analyzing the requirements and
deciding on the chart type. For the next step,
we're going to talk about the mock up
and the coloring. We're going to use
exactly the same stuff as in the sales dashboard. And that's because
the two dashboards are in the same
project and it makes no sense to create a new mockup each time for a new
dashboard. So here we have to
follow one mock up for all our
dashboards in order to have the same
look and feel across our dashboards
inside this project. As you can see, things go easier for the next dashboards. Now we can go and start implementing the charts in
Tableau. All right, so now for the first charts we
have the three KPIs: Customers, Sales per
Customer, and Orders. They are the usual
stuff like before. It's just copy, paste, and
switching the measures. Of course, if you are interested
in how I implemented it, I'm going to leave the file as well in the project, or you can go to my public profile and
download it from there. Maybe one interesting
thing to show you is how I calculated the
sales per customer. So let's go over here. Since now we have
a lot to filter, we can go and
search for customer in order to check the
calculated fields. So first we have to decide
which customers ordered in the current
year and which ones ordered in the previous year. So it's really simple.
If we go over here to the current year customers
and let's go and edit, you can see over here we
have the same condition. If the year is equal to the selected
year from the parameter, then show the customer ID, otherwise it's null.
With the previous year, we're going to have
exactly the same, but subtracting one year. So this is the first step. Then the next step, we're
going to go and calculate the current year
sales per customer. We have it over here. Let's
go and check inside it. For that, we have the
following calculation. We divide the current
year sales by the count of the distinct
values of the current year customers. And with that, you're
going to get the average sales per customer. So we will do the same stuff as well for the previous year. And the rest is going
to be as usual: finding the differences and finding the min and max values. So that's it for the
sales per customers. Now let's go and start
implementing the first chart using the histogram in order to show the data distributions
for the customers. So let's go and create
a new sheet and we can call it
customer distribution. All right, so now since we are talking about two measures, the count of customers
and the count of orders, we have to go and use
the LOD expressions in order to generate the bins. And I explained
that in detail in the LOD expressions
using FIXED. So make sure to check
that in order to understand the LOD expression that we're going to use now. And for that we're going to
go and convert the number of orders into bins
using a calculated field. In order to do that,
let's go and create, let me just remove the search, a new calculated field. So here we want to find
for each customer how many orders they placed, and of course we are talking
for the current year. For that we're
going to go and use the function FIXED from
the LOD expressions. Then we have to
define the dimension. It's going to be the current
year customers. So here we have
all the customers that ordered in
the current year. Then after that we have
to do the aggregation, and it's going to be the
number of orders. So we're going to go and
count distinct the current year orders. The current year
orders is like the customers: all the orders that are
placed in this year. All right, so that's all. Let's go and close the
FIXED over here. All right. So again, what we
are doing over here: for each customer
we are going to find the number of orders that are placed in the current year. All right, so now let's
go and hit OK. And now we have it over
here as a continuous measure. Let's go and change
it to a dimension. So right-click on it
and make it a dimension, because bins in histograms are usually discrete values. So now what we're going to do, we're going to go
and test the values. Let's drag and drop
it to the view. Okay, so we got our
bins for the histogram, but I would like to go
and test this data. In order to do that, let's
go and create a new sheet, let's call it test histogram. So what we can do, we're going to go and check our customers. Pick the customer name. And now as well, let's go and grab the order ID over here. Let's show all the
values as well. We need the date, so let's
go and pick the order date. It is over here in
order to see the year. And then what we're going to
do, we're going to go and check our new calculated field. Let's drop it over here. Then let's go and
switch to a measure. And all right, I will go
and drop it on the labels. All right, so now let's go and check one of those customers. Let's focus on Adam
Hart. Right-click and let's say Keep Only. Now we can go and check all orders of Adam. And as you can see,
we have a lot of orders in the history and none of them is counted
inside our calculated field, because we are focusing
only on the current year. As you can see, we start
counting from 2023. And in 2023 we have
five orders: 1, 2, 3, 4, 5. As you can see, the measure
is returning a correct value. We can go and test
the other years. For example, let's go
and show the parameter. Let's go and switch to 2022. As you can see, in 2022
we have only three orders. Let's go and switch it to 2021. And we have here only one order. So that means our
calculated field is working as intended and we can use
it now for the histogram. So this is what I
usually do once I create a new calculated field, especially if it is
LOD, I go and test it. So I create a simple table in order to see the data
and focus, for example, on this one customer instead of testing directly
in the histogram, because it's really hard
in visuals to test the data. All right, now let's go back to our customer distribution
and let's get our bars. In order to do that, we're going to go over here to the rows. Let's say count distinct. And now we're going
to go and count the customers for
the current year. So the current year customers. Then we have to go and change
the visual to bars, since histograms are bars. And with that we got our
histogram. That's it. Now next we're going
to go and start formatting our histogram. So the first thing, as usual, we're going to go and
remove the lines. So let's go and format. Let's go to lines, let's go
to rows and remove the grid. All right, that's
all for the lines. Next we're going to go over
here and remove the headers. Let's take those bins and
make them more readable. So let's go to format. Maybe I'm going to make it
bold and change the color. All right, so now we have the name of the
dimension over here. We can go and hide it. Okay, now let's go and
start with the coloring. Let's hold control and drag
the customer to the colors. Of course, we're going to
go and use our coloring. Let's go and edit it. Let's pick one. All right, so
that's it. Let's hit OK. Next we can go and add some borders to those bars. So let's go to the colors, to the borders, and make it
something like this. All right, so now
next I'm going to go and add some labels. So let's get the
customers to the labels. And I think with that we
are done with the histogram. We can go and test it by
adding the parameter. Let's select another
year like 2023, and as you can see,
everything is reacting. And that's it for
this requirement. Now we are showing for the
users the distribution of customers by the
number of orders. Let's go now for the next requirement
where we're going to show the top ten
customers by the profit. All right, now let's go and
create a new worksheet. Let's call it Top Customers. So now we need our customers on the rows, and we're
going to show only the top ten by the profit for
the current year. Let's go and get our measure. It is the current
year for the profit. Let's drop it on
the text over here. Now next we're going to
go and make the filter in order to show only the
top ten customers. Hold control, drag and drop the customer name
to the filters. And now here we're going
to go to the Top tab. And then let's switch
it to By field, so we have top ten
by the profit, and the aggregation is
going to be the sum. So this is exactly
what we need. Let's go and hit OK. And with that, we're going
to get a very simple list of the top ten customers
by the profit. Let's go and change the format in order to see
the whole number. So let's go to format,
where I'm going to go and remove the unit and
remove the decimals. Let's have the dollar
sign at the start. All right, so now we can
see the whole number. Let's go and sort the
list by the profit. So in order to do that,
go to the customer name, then let's go to Sort, and
we're going to sort by field. In order to have a
ranking, we're going to switch the sort order to descending and make sure that we have the field name,
current year profit. All right, so that's all. Let's close it, and
as you can see, the first customer on top is going to be
the top customer. And now in the next step
we're going to go and add the
rank to this list. In order to do that we're going to use the function INDEX. Let's go to the rows
over here and just write INDEX(). And that's it. And then let's go
and switch it to discrete and just
put it at the front. And with that we have a
ranking from 1 to 10. All right, so now we're going to go and add additional information for each customer, like the sales
for the current year. So let's go to our data pane and let's grab the
current year for sales. Drag and drop it on
top of those numbers so that we can see as well the sales for
the current year. Let's just make it a
little bit bigger. And now the next information
that we're going to go and add is the number of orders for the current year that are placed by
the customers. In order to do that, let's go to the Measure Values over
here, double-click on the empty space, and write down count distinct in order
to count the orders. So we're going to go and type the current year orders.
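To make the logic of this table easier to follow outside Tableau, here is a small Python sketch of what the worksheet computes so far: keep only the current year, sum sales and profit per customer, count distinct order IDs, keep the top N by profit, and number the rows from 1. The order lines, the customer names, and the function `top_customers` are made up for illustration. They are not the course dataset or a Tableau function.

```python
from collections import defaultdict

# Hypothetical order lines: (customer, order_id, year, sales, profit).
# One order can span several lines, which is why orders are counted distinctly.
ORDER_LINES = [
    ("Adam", "O-1", 2023, 100.0, 30.0),
    ("Adam", "O-1", 2023, 50.0, 10.0),   # second line of the same order
    ("Adam", "O-2", 2023, 20.0, 5.0),
    ("Bea", "O-3", 2023, 500.0, 80.0),
    ("Bea", "O-4", 2022, 900.0, 300.0),  # previous year, ignored for 2023
    ("Cy", "O-5", 2023, 40.0, -5.0),
]


def top_customers(rows, current_year, n=10):
    """Return the top-n customers by current-year profit, with a 1-based rank."""
    stats = defaultdict(lambda: {"sales": 0.0, "profit": 0.0, "orders": set()})
    for customer, order_id, year, sales, profit in rows:
        if year != current_year:
            continue  # same idea as the current-year calculated fields
        entry = stats[customer]
        entry["sales"] += sales
        entry["profit"] += profit
        entry["orders"].add(order_id)  # a set gives the distinct count

    # Sort descending by profit and keep the first n (the Top N filter).
    ranked = sorted(stats.items(), key=lambda kv: kv[1]["profit"], reverse=True)[:n]

    # enumerate(start=1) plays the role of the row index used as the rank.
    return [
        {
            "rank": rank,
            "customer": customer,
            "profit": entry["profit"],
            "sales": entry["sales"],
            "orders": len(entry["orders"]),
        }
        for rank, (customer, entry) in enumerate(ranked, start=1)
    ]


result = top_customers(ORDER_LINES, current_year=2023, n=2)
for row in result:
    print(row)
```

With this sample data, the 2022 order is excluded, the two lines of order O-1 count as a single order, and the customer with the highest 2023 profit gets rank 1.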
All right, so let's hit OK, and now we're going to see the number of orders that each customer placed in
the current year. All right. So now the next information
that we're going to add is the date of the last order that
the customer placed. So now we need the
last order date. In order to do that,
right-click on it and let's go to the measures
and get the maximum. So with that we can see when the customer last ordered from our
business. All right. So with that, we've got
all the information that we need inside our chart. In the next step, we're going to go and start formatting it. First we're going to
start with the lines and the grids as usual. So right click on it
and go to format. Now I would like to get
rid of this line in the middle between the
measures and the dimensions. So let's go to the grids. And let's go as well to the
column divider and remove it. With that, we don't have
the line in between. Now the next step
we're going to go and get rid of the gray
background color. Let's go to the shading, and then here we're going to
go to the row banding and reduce the size to the
minimum. With that, as you can see, the background color
did disappear. All right, so that's all
for the lines and the grid. Let's go and start formatting the fonts and the
colors of our fonts. First, I would like to
format the index over here. Let's go to it and
format. Let's go and make sure that you are
selecting the correct field. Yeah, we are selecting
it. Let's go to Pane. Now, let's go to the
numbers over here.
Let's remove the decimals with the Number (Custom) format
and add the prefix of hash in order to have a ranking. That's it. What
else we can add to this ranking is that we can go and add the background
color for it. Go to the shading over here
and make it very light gray. All right, that's
all for the ranking. Let's go to the
next one and start changing the font color. Format, then let's go to the font.
We can leave it as Tableau Book and we can go and change the color
to something like black. That's it. Let's go
to the next one, format, and we're going to
go over here and make it black. All right, so I'm moving
on to the measures. Let's go and remove the
unit from the sales. So let's go to the
sales over here, format, and then we're
going to go and format it as usual with the
Number (Custom): remove the decimals and add a dollar sign. All right, and for
the number of orders, we're going to
leave it as it is. All right, so that's it. Let's
just keep it very simple. And with that we have a
really nice detailed table to show the top ten customers
with additional information. All right, so with that we are done building all the charts. The next step we're
going to go and start building the
dashboard. Okay? So now in order to create
the customer dashboard, we will not create
everything from scratch. We're going to go and
duplicate the sales dashboard in order to have the structure. Let's go to the
sales dashboard, right-click on it, and duplicate. With that, we've got two
identical dashboards. Let's go to the second one
and start formatting it. First we're going to
start with the naming. So it's going to be the
Customer dashboard. Now let's start
from top to bottom. We're going to start
with that title. Let's go over here,
change it from sales dashboard to
customer dashboard. As you can see, creating the
second dashboard can be very easy once you have a
really solid structure. All right, so now next what we have, we have
the three charts. We're going to go and replace
them all with the new ones. The first one is going
to be the KPI customers. Let's just drop it at the start. Of course, Tableau is going to go and start adding stuff to
our new container. Don't worry about it. We're going to go
and delete it later. Let's go and get the
next KPIs, Sales per Customer and the
Orders. Okay, all right. Now let's go and
hide this container. So right-click on the icon
and let's go and hide it. All right, so now
we can go and drop those old KPIs from
this dashboard. Let's just remove them. With that, we've
got our three KPIs. Let's keep moving
and add our charts. It's going to be the histogram, so let's drag and drop it
below the legend over here. And we can go and
remove the old stuff. So the old chart. And as well, we don't need the legends. Let's go and drop
the whole container for both of the legends. And let's go and
change the title to customer distribution
by number of orders. Let's hit OK, and let's remove the
title from the chart. As you can see, this
container keeps popping up because we have new
legends and new stuff. Let's go and hide it again. Let's work on the right chart. It's going to be the detailed
list for the top customers. Let's throw it over
here. And we're going to go and
remove the old one. Now we're going to
move on to check that everything fits the entire view. Let's go check one
by one, entire view. Entire view, this one as
well. Everything looks fine. Let's check the last
table. It's standard. Let's go and switch it to entire view to use the whole space. All right, so now
we put everything together in one dashboard. In the next step,
we're going to go and format this dashboard. And it will not be that hard because we have
almost everything. Let's start with
the first chart. Let's make everything
with a white background. Let's go to Layout and change it to white, and as
well for the next KPIs, just to make sure that we
have done it for every one. All right, with that, we've got like a card for each KPI. The next step, I would say,
let's go immediately and start working with the
spacing between those charts. Let's click on the first one. If you remember in
the sales dashboard, we agreed to have a
20 between the charts. Let's go to the outer
padding and make everything a ten.
But only on the top, we don't need this extra space. Let's disable all sides equal and make it zero only
for the top. As well, we said the inner padding is
always going to be seven. Let's have it like this
and do it for the others. Outer is ten, on top is zero, and the inner padding is
going to be seven as well. For the last one, outer is ten, remove it for the top, and the inner is going
to be seven as well. Let's do it like
this. All right, so with that we are done
formatting the three KPIs. Let's move on to the charts now. Let's go and select
the whole container. And as you can see, we have
everything done as before. The outer padding is ten and
the inner padding is seven. Great, let's go and
check the right one. We're going to have it as
well. Correct. As you can see, things get really fast
as you are building the second dashboard
using a solid structure. All right, so now we're
going to do one more thing about the top ten
Customers by Profits. As you can see, those
headers, the field names,
are not really nice. Now we're going to go and remove this information and we're going to build our own
custom field names. So let me show you
how we're going to do that. Let's go to Dashboard. And let's grab a
horizontal container on top of our table. And here we're going
to go and put inside this container, the field names. Let's just make it a
little bit smaller. Let's start adding texts. So this is the first text. The first information is
going to be the rank. Let's type Rank. Let's change the
font to Tableau Medium, set the size to ten, and make the color a little bit
lighter. All right, let's go with this. Let's hit OK. Let's go and add another one for
the next field. So make sure to drop it on
the right side. It's Customers, and we're going to
do the same stuff. It's going to be Medium, ten, and this color we can go and
copy for the next one. Let's hit OK. Now let's go and keep adding our fields. So the next one is going to
be the last order date. Let's paste the old one
and we're going to call it Last Order. That's
it. Let's hit OK. Then we have the current profit. Let's grab a text. Instead
of the current profit, I'm going to go and
add the parameter and then the word Profit. Let's go and make sure that everything has the same format. So it's going to
be Tableau Medium, ten, and the same coloring. Let's copy it for the next one. We're going to add another
text for the sales. Paste, and let's have Sales.
And the last one is going to be the
number of orders. Let's write it like this:
paste, remove the year, we don't need it here. As you
can see, we got our titles. What we're going to do next,
we're going to go and remove the titles from
the original table. Let's hide the field
labels and as well, let's hide the header. All right, next we're going to start working on the alignment between the titles
and the detail list. So we're going to start
moving stuff around. First I'm going to go and
make it a little bit bigger, and then we're going
to start moving those boxes, the information, until everything
matches. The last order date goes a little bit to the right side. Maybe make this field
a little bit smaller. And then let's go and push
the sales a little bit to the right side, and
the profit as well. Now we're going to go and push this a little bit
to the right side. You can see we don't have any
more space for the number of orders. Let's go and just call
it Orders. All right. And we're going
to go and move it again a little bit to the top. Okay, I'm happy with that. Everything is perfect.
And now we have formatted all the charts that we have inside the customer dashboard. Next we're going to
go and start cleaning up the filter information. Let's go and show the filters
and see what is happening here. Okay, now what
we're going to do, we're going to go and remove all the additional information that Tableau added to
our new container. We don't need all
that information. Let's go and remove
the items one by one. And with that, we have exactly the same container as before. And of course, you
can go and start testing your dashboard again. We can go and switch it, for example, to 2022. And as you can see,
everything changed, even we have a new
top ten customers. We can go and add, for example, different subcategories and
everything is reacting, so everything is perfect. Let's go and put
everything back to 2023. And with that, we have
fixed our filter. Let's go and close
it, let's hide it. All right, so now the next
step is that we're going to go and add interactivity
to those charts. So make sure to
select the histogram and use it as a filter. With that, the
users can go anywhere and start selecting stuff,
for example, those two. And as you
can see, the dashboard is reacting. Let's deselect. All right. So now let's do the same
stuff for our top lists. Let's go and make
it as a filter. And now we can go and
select our top customer. And we're going to
have a quick analysis only for this customer,
which is really nice. So let's go and deselect that. And with that, we are done
with the interactivity inside our dashboard. Now moving on to the last step where we're going to
work with the icons in order to make navigating
our two dashboards very easy. Okay, so now let's go and
fix this icon over here. So double click on it. And now finally we can see it's going to navigate to
customer dashboards. And now since we are at
the Customer dashboard, we're going to show an icon
that is like an active icon. In order to do that, let's
go and choose the icons. So as you can see,
this one is going to be the active icon if the user selects the
Customer dashboard. So let's go and select that. So now everything looks
good, let's go and hit OK. And with that, you
can see we have a new icon that indicates we are now
at the Customer dashboard. All right. So now next
we're going to go and fix the sales dashboards
icons over here. So let's go inside it and navigate to the
customer dashboards. And let's choose the
one that is not active. So we're going to go
and select this icon. All right, so that's all. Okay, so now let's go to the
Sales dashboard over here and change its icon
to an active icon. We're going to choose
this one over here, Sales Dashboard Active. So select that and
let's hit OK. All right. So that's it. With that, we have
fixed the icons. So the Sales dashboard icon
is going to be active here. If you go to the
Customer dashboard, it's going to be
exactly the other way around. All right. So with that, we are done with the second dashboard
in our project. Let's go and test everything. So let's go into the
presentation mode over here and let's
check the data. All right, so now we are
at the Customer Dashboard. Let's go and click on
this container over here. As you can see, everything
is working nice. So now let's go and switch
back to the Sales dashboard. So let's click on this icon. And now, as you can see, we are back at the Sales dashboard. So with that, the user
does not have to go to the tabs to switch between
those two dashboards. The users can just
go and click on those icons in order to switch between those
two dashboards. And with that, I'm
really happy to announce our project is completed and we have fulfilled all
the requirements. I will leave this project on Tableau Public, or you can get
it from the download link. All right, so with
that, we have completed our Tableau project
and we walked through all the phases that I
usually follow in order to implement any Tableau
project from scratch, from the requirements until the delivery of the dashboards. And here again, my
recommendation is to not rush the
project by immediately starting to
build charts and dashboards without having
a clear and organized plan. So do it step by step in
order to deliver clean work.
199. HR Project | Introduction: Friends, so today,
we're going to go and implement an amazing
Tableau project, where we're going
to go and build an HR dashboard using Tableau. And what's special about
this project is that, you will not only
learn how to use Tableau in order to
create visualizations, but also you can
learn how I usually implement professional
Tableau projects at my work. If you are new here, welcome. My name is Baraa, and I lead big data and BI projects
at Mercedes-Benz. I'm here to share
everything that I know about working with data. So make sure to subscribe
so you don't miss anything. In this Tableau project, I'm going to guide you step by step, starting from the
user requirements. Then we're going to go and draw the concepts and the
mockups of the dashboards, and at the end,
we're going to have a fantastic dynamic
dashboard using Tableau. That means by the
end of the project, I'm going to leave
you with a Tableau dashboard and, as well, real-life skills on how to implement Tableau
projects. My friends, before we jump to the project, I would like to take a moment
and say the following. Everything in this
project is free. And as well, I highly
recommend that you follow along with this
project, step by step, because just sitting
and watching will not really help. You have to get
your hands dirty. And, hey, this is your project, so feel free to share it
on any platform you want, like LinkedIn or
Tableau Public, as a portfolio. So that's all for now, let's jump in and get started
with the project. Now, my friends, at the
start of each project, first, I decide on the coloring. The first decision that I
make is whether we want to have a dark or light
theme in the dashboard. And since the last sales
project was a light theme, this time we're going to
go with the dark theme. After that, we have to
decide on four colors, not more, and we divide
them into two categories. The first category is
the basic category, and here we have two colors, black and white. Usually, I
go with gray tones, so we have a dark gray
and a very light gray. Now, the second category is the custom category, and here we have the two
colors of our own style. So for this project,
I'm going to go with green and pink. But wait, wait, here
we have an issue. My wife said this is not green. This is Persian green, and the other one is not pink. This is royal fuchsia. So sorry. All right. So those
are the colors that I've decided on
for this dashboard. Of course, you can go
and add your own style. You don't have to
follow my coloring. All right, friends,
Tableau projects have mainly three phases. The first one is preparing
our data, where we go and connect our data to Tableau
using a data source. We always have to
do this step before building any charts
or doing any analysis. In the second phase, we're
going to go and build many different charts and visualizations based on
the user requirements. And in the last phase,
we're going to go and put all the charts in one single
consolidated dashboard. This phase includes
a lot of formatting and refining in order to make the dashboard
user-friendly and effective. So let's start with
the first phase, where we're going
to go and build the Tableau data source
for our project.
200. HR Project | Build Data Source: All right, friends, now
we're going to go and build the data source
for our project, and here is what we're going to do. First step, we need data. We're going to go and download
the data for the project, and then we're going to
go and connect the data with Tableau using
a data source. After that, we're
going to go and check the quality of the data
and the data types. And the last step, we have
to go and understand and explore our data before
building any visualizations. Okay. As the first step of building a data
source in Tableau, we have to go and get data. And to be honest, I've checked a lot
of projects and datasets, and I didn't find anything that is suitable
for this project. That's why I have decided
to generate my own data. Of course, I have a
personal assistant in order to help
me with this task, and that is ChatGPT. I have asked
ChatGPT to generate Python code in order
to generate a dataset. After a long chat
and some tweaking around, I finally got really
nice code in Python using the Faker library in
order to generate data. If you want the
Python code that I've used and the prompts
for ChatGPT, you can find everything
in the project link. Friends, as you
can see, ChatGPT here helped me to generate
a dataset for practicing. Now let's go and get the data. In the video description,
you can find a link to this page where I've collected everything that
you need for this project. As you can see here,
we have a Zip folder where you have all the
files for this project, and if you scroll
down over here, we have the user story
for this project. Here we're going to go and
build a Tableau dashboard for human resources based
on those user requirements. Let's go and download the
Zip folder, it's over here. Let's click on it, and you will have it in the
Downloads folder. Now, as the next step, we can
right-click on it, choose Extract All, and then Extract.
We have it over here. Now what I usually do,
I move this folder somewhere else
because I tend to clean up the Downloads folder,
and if you lose the connection between
Tableau and the data, you will get a lot of errors. Let's go and do
that. I will just copy it and put it
somewhere like here. Now let's go inside it and
check what we have. Over here,
we have icons and images. You can find all
the stuff that we need later for the dashboard. As well, you can find
the Tableau project file, and of course, you
can also download it from Tableau Public. And here we have our data,
the human resources CSV file. This is the data
of our project, and you can also find the
dashboard mockups that I've created using
draw.io. All right. So with that, we have our
data for this project, and as the next step,
we're going to go and connect Tableau to our data. All right. So the
first step of that, we're going to go and
start Tableau Public. We are now at
the landing page. Let's go and connect to our
file using the Text file option. Then we're going to go and
open the downloaded data, the human resources CSV file. Let's go and open it. Now, usually, the next step
is that we go and build a data
model from the files. But for this project,
we have only one file. That means we don't
have to worry about relationships, joins,
unions, and so on. Our data model has
only one table, one file for the whole project. Now, as the next step,
we're going to go and check the quality of the data
inside this table. The first thing is,
of course, if you are using text files as a data source, the column names
should be correct. We can see over here that
everything looks fine, right? We have employee ID, first name, last name, gender, state, and so on. So the
names look okay. And if you don't
have it like this, you have to go and check the
properties of the file. So in order to do that,
right-click on the table. Usually in text or CSV files, the first row should contain the field names, or
the column names. So make sure this option is checked, and then we're going
to go to this option, Text File Properties.
Let's go inside it. And here, it's very
important that you have the same setup
that I'm showing now. So the field separator
should be the semicolon. And if for any reason Tableau selected
something else, make sure to select Semicolon. The third option
is important as well, it is the encoding of the file. It should be UTF-8. So if you have those
options like this, you should be safe, so
let's go and close it. That means Tableau is
reading the file correctly and the column
names are correct. Now the next step is that we're going to go and
check for each field whether Tableau assigned
the correct data type. Let's have a look. The first column, the employee ID,
is a string, and that is correct
because here we have a character
between the numbers, so we cannot have
it as a number. First name, last name, gender,
all that information has characters inside, so
of course, it is a string. Let's move to the right side. Now we can see we have two
columns about the locations. As you can see,
Tableau assigned them correctly to
a geographic role. If you don't have it like
this, it's very simple. Go over here to this icon, and then we have here the Geographic Role option, and make sure that we assign it to
the correct information. Now, let's keep
moving. We have here the education level, which
is correct. It is a string. Then after that,
and this is very important, we have several dates. We have the birth
date, the hire date, and the termination date,
and all of them have the correct data type. Now let's keep moving to the right side. And as you see, we have
department and job title, both of them strings,
and we have salaries. So the salary is the
only field inside our dataset that has
the data type number. The last one is the
performance rating. It is a string, which is correct. As you can see, Tableau
did a wonderful job mapping the correct data
types to the columns, and having the
correct data types is very important in your
project in order to do the calculations correctly and to have good data
quality inside your dashboard. So it's good that we have
built our data source and everything
looks really great. Now the next step is that before I start
building anything, any charts, I would like to understand the data,
to explore the data. What I usually do is I go and
create a sheet over here, and then I start
dropping information onto the sheet in order
to explore the data. For example, which departments do we have inside the data? As you can see, we have
seven departments: customer service,
finance, HR, and so on. Then what is interesting is,
for example, the job titles.
Let's drop them over here. And now we can see
all those job titles, but we can see as well that there is a relationship between the departments and
the job titles, right? So what we can do over here, if you have a relationship between columns like that, is go
and create a hierarchy. Let's go and do that.
It's very simple. Let's take the job title and drag and drop it on top of
the department like this. And then you have to
assign a name for it. I'm just going to leave it
like this. Let's go and click OK. Now on the left side,
we have a hierarchy, which starts
with the department and ends with the job title, so the order of the hierarchy
is correct as well. Let's keep exploring.
Let's go and get the education level,
for example, over here, and we can see there is
not really a relationship between the education level
and the job titles and departments. I'll go and drop
it in order to see. In our data, we have
four education levels: bachelor, high
school, master, and PhD. As you can see, we are just browsing and exploring the data. Now my recommendation
is that you pause the video and go
through all the fields. Only after we understand
the content of the data are we going to proceed
with the next steps. I hope that we now have a better understanding
of the project data, and with that we have
a solid data source in order to start building
charts in Tableau.
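The same checks can be repeated outside Tableau as a quick sanity test. The sketch below is only an illustration in Python: it reads semicolon-separated text with the header in the first row, converts the date and salary columns, and groups job titles under departments the way the hierarchy does. The three sample rows, the column names, and the date format are assumptions made up for this example, not the real project file.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the layout described above: semicolon separated,
# field names in the first row. Names and values are invented for illustration.
SAMPLE = """Employee_ID;First Name;Department;Job Title;Hiredate;Termdate;Salary
00-123;Ana;Finance;Accountant;2019-03-01;;52000
00-124;Ben;HR;Recruiter;2020-07-15;2022-01-31;48000
00-125;Cara;Finance;Controller;2018-11-20;;71000
"""


def load_rows(text):
    """Parse the text with ';' as field separator and the first row as header."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return list(reader)


def parse_date(value):
    """Return a date, or None for an empty cell (shown as Null in Tableau)."""
    return datetime.strptime(value, "%Y-%m-%d").date() if value else None


def build_hierarchy(rows):
    """Map each department to the sorted job titles found under it."""
    hierarchy = {}
    for row in rows:
        hierarchy.setdefault(row["Department"], set()).add(row["Job Title"])
    return {dept: sorted(titles) for dept, titles in hierarchy.items()}


rows = load_rows(SAMPLE)
for row in rows:
    row["Hiredate"] = parse_date(row["Hiredate"])
    row["Termdate"] = parse_date(row["Termdate"])
    row["Salary"] = int(row["Salary"])  # the only numeric field

hierarchy = build_hierarchy(rows)
print(hierarchy)
```

For a real file, the same reader would be fed from `open(path, encoding="utf-8", newline="")` instead of the in-memory sample, which mirrors the UTF-8 and semicolon settings chosen in the text file properties.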
201. HR Project | Build Charts - Part1: All right. So now we're
going to go and build the charts for the
first dashboard, the summary dashboard, and
here is what we're going to do. First, we have to
analyze and understand the requirements in order
to decide on the charts. After that, only one time, we're going to go and do
some initial steps by formatting a worksheet in order
to use it as a template. After that, we have to
make sure that we have all the dimensions and measures in order to build the charts, and if not, we have to go and
create calculated fields, and only after that can we
go and build our charts. As the last step, we have to
take care of the formatting. So now let's go and start with the first step,
where we have to analyze and understand
the requirements and decide on the charts. Okay. So the first step before
building anything is that we have to go and understand
the requirements. So let's have a look
at the user story. So what do we have over here? We have to go and
build a dashboard for the HR managers in order to analyze the human
resources data. And we have to provide
them with two views: a summary view for
high-level insights and a detailed view in
order to show a list of employee records for
in-depth analysis. So that means we might end up building two dashboards,
but we will see. Let's start now by focusing on the first section,
the summary view. So the summary view
should be divided into three main sections. This
is about the dashboard. We should have an
overview section, demographics, and
the income analysis. The first requirement for
the first chart is going to be: display the total
number of hired, active, and terminated employees. It sounds like we have different
statuses of the employees. We have active and terminated. Now in the next
step, we're going to go and decide on
the chart type. Since we are talking about the
total number of employees, it's like a big number that we should present in
the dashboard, so we can go and use BANs. BANs are a great way
to highlight the big numbers, the big measures inside our
data, in the dashboard. Back to Tableau, but now
before we start implementing any requirement, before we
build any sheets or charts, we have to do an initial step, and that is formatting
the first sheet to be used as a template for all other requirements
and all other sheets. That means we're going to
go and define the background, the colors, the fonts, everything, to be prepared. That's of course better than creating the sheets
from scratch each time. Now, with the first preparation
we're going to do, we're going to go to
Format in the menu over here, and then let's go
to Workbook. Now we're going to go
and define the font for the whole project. Let's go over here to All and then let's go
to the drop-down list. For this project, I've
decided to go with Trebuchet MS. Let's
go and select it. Now everything that I'm creating in dashboards and sheets is going to be using this font. All right. Now the next step is that we're going to go and start adding the colors that we have
defined for this project. Let's go to the Marks card over
here and select Color. Let's go to More Colors. So now we're going to go
and add our four colors. Let's go and start
with the first cell over here, click on it, and then add the color code, and with that, we have the
green color over here. Let's go and click on
Add to Custom Colors. This, of course,
helps us to have quick access to the colors that we defined
for the project. Now let's go and add
the second color. Again, the same
steps: let's select the cell below it
and add the code, and with that, we
have the pink color. Let's go and click on
Add to Custom Colors. Now the next two
colors are going to be our basic colors.
Select the next cell, add the code, and with that we have our gray, and then Add
to Custom Colors. Now let's go and
add the last one. The fourth one is going
to be the light gray, and as well Add
to Custom Colors. With that we have
our custom colors to be used in the
whole project, those four colors.
Let's go and hit OK. Now what we're going to
do is define the default font color
for the whole project. Again, we're going to go
to the font over here, and then let's go
to More Colors, and let's pick the
gray, and then select it. So that's all for the
colors and for the fonts. Now, the next step is that
we're going to go and define the color
of the background. As we decided at the start, this project is going
to have a dark theme. Let's go again to Format
and then to Shading, and then we're going to
go to Worksheet over here and let's pick
the first dark color. Now let's move to the next step. We want to go and change how the sheet
fits the view. For dashboarding, it's always good to have it as Entire View. By default, Tableau
shows it as Standard, so let's go and change
it to Entire View. Let's click on that. With that, the chart can always take the whole space that is
available in the view. Now maybe one more thing,
and that's about the title. We don't want to show any
titles in our dashboards. We're going to go and
create our own style. So right-click on
it and Hide Title. All right, so with that we have
done the initial steps, and we now have a template to be used for all other sheets. Now I would say let's
go and save our work, and this is a really amazing
new feature from Tableau. We are now allowed in Tableau
Public to store and save our work locally on
our PC without publishing. Let's go and do that.
This saves a lot of time. Let's go to File over
here and Save As, and then we're going
to go to the file types over here and make sure that we are selecting Tableau
Packaged Workbook, TWBX. Now we can see over here,
we have a second option called Tableau Workbook, TWB. I have as well a dedicated video explaining the
differences between them, but we will go with the packaged one because I would like
to have everything, the data, the data
source, and the visuals. If you go with the second option,
you will not save the data. You'll be saving only
your work, and it's going to be really hard if you lose the
connection to the data. Let's store everything
in one file and choose the Tableau
Packaged Workbook, and let's give it a name, HR Dashboard. Let's save it. And
with that we are done. Let's start implementing
the first requirement. All right. So now, the first
step is that we're going to go and ask ourselves: do we have all the
data in order to build our visual?
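Before building anything in Tableau, the three numbers this requirement asks for can be pinned down in a few lines of Python. This is only a sketch with made-up records, and it assumes, as this walkthrough does, that a missing termination date means the employee is still active. The helper name `count_non_null` is hypothetical; it just imitates a count that ignores nulls.

```python
# Hypothetical records: (employee_id, termination_date or None).
employees = [
    ("00-101", None),
    ("00-102", "2021-05-31"),
    ("00-103", None),
    ("00-104", "2023-02-10"),
    ("00-105", None),
]


def count_non_null(values):
    """Imitate a COUNT aggregate: None values are ignored."""
    return sum(1 for value in values if value is not None)


# Total Hired: count every employee ID.
total_hired = count_non_null(emp_id for emp_id, _ in employees)

# Total Terminated: keep the ID only when a termination date exists.
total_terminated = count_non_null(
    emp_id if term_date is not None else None
    for emp_id, term_date in employees
)

# Total Active: the same logic the other way around.
total_active = count_non_null(
    emp_id if term_date is None else None
    for emp_id, term_date in employees
)

print(total_hired, total_terminated, total_active)
```

Because every record falls into exactly one of the two branches, hired always equals terminated plus active. With the figures quoted later in this lesson (8,950 hired and 966 terminated), the active count would come out at 7,984, provided every record has an employee ID.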
So what do we need? We need the total
hired employees, total active employees,
and total terminated. So now if you check
our data over here, we don't have any information about the status of
the employee, right? So that means we have
to go now and create calculated fields in order to derive and generate
that information. So the first one is
the total hired employees, which is all the records available
in this dataset. We have this as a
default over here, but I would like to go
and create a new one. Let's go ahead and create a
new calculated field. Let's give it the name
Total Hired, and this is going
to be very easy: it's going to be the
COUNT function on the employee IDs. So that's it. Let's go ahead and click OK. Now the next one: we want the total number of employees
that are terminated. Now we have to take a
look at our data in order to choose a column
to build this logic on. We have here the
termination date. The logic can be very simple: if we have a termination
date for the employee, then this employee
is terminated. Otherwise, the
employee is active. Let's go and create this logic. So let's call it
Total Terminated, and now we're going to
have the following logic. Since it's logic, we're going to go and use the IF function: IF NOT ISNULL on the term date. So we are saying, if the
termination date is not null, so we have a value inside
it, what should happen? THEN show the employee ID. And that's it, so
let's add an END. That means if it is null, so we have a null
value inside it, we will get a null as well. Let's go and test the logic. I'm going to just click OK. And of course, in
order to test stuff, I'm going to have
a test worksheet to check the data. So I need the records
of the employees. Let's get the employee ID, and yes, add all members. Now let's take the termination
date as well over here, and add our new field Total Terminated as well
to the output. So now as you can see over here, we have all the employee IDs. This is normal, and then we
have the termination date. So you can see if it is null, then our new field is going
to have a null as well. So since we don't
have termination dates for those employees, they are active,
so we have nulls here. Only if we have a date is our new field
going to show the ID. We are doing that because
we want to go and count how many IDs we have
inside this new column. That means our logic is working. What we're going to do now is
go and edit the calculation again,
and wrap a COUNT around it over here. So we are counting how many employee IDs are shown
after this logic. That's it. This is
the total terminated. And to get the total
active employees, the ones that are hired
and not terminated, we're going to use
exactly the same logic but the other way around. Let's go and copy everything
from here and click OK. So of course, we're going
to get a red pill because Tableau used to have it as a dimension and it's
not working anymore. So let's go and drop it. One more thing: as
you can see here, we have it as a blue pill,
the Total Terminated. Let's go and convert it to continuous because it is
a measure, not a dimension. Now let's go and
create our third one, so it's going to be
the Total Active. And let's have the same logic. But before we start counting, I'll just remove
that stuff, because I would like to test the logic. So IF ISNULL: if the termination
date is empty, THEN show the employee ID. Let's go and test
it. So I'm going to click OK. And the same thing, let's go and drop it onto the
view over here. Now as you can see here,
we have the exact opposite. If the termination date is empty, then show the employee ID. And if we have a value, like
here for this employee, then don't show any value. Now, the same thing:
we're going to go and summarize all those values. So let's go and edit it
again and add a COUNT, like this, and hit OK. Again, it will not work over here, and we have to change it as well from a blue pill to a
green one, to continuous. With that we have our
three new measures that we're going to
use inside our BANs. Let's go back to our
template over here. Since the BAN is
only one number, we don't need any
dimensions in the view. Let's go and drop
the education level. The first one is going
to be the Total Hired. Let's go and drop
it on Text. Of course, I would not
leave the mark type as Automatic. I'm going to make sure
it's always Text, and our number is here
on the right side. Let's go and change the setup. Let's go first to
Text, to the three dots, and now we're going to go
and change the font size to 18 and the color as well,
to our light gray. Let's go and hit OK, and OK as well. Now we still have it
on the right side, but it's way bigger than before. Let's go to the alignment and set everything to the
center and to the middle. That's it. This is the first big
number from our dataset, so the total number of employees inside our
dataset is 8,950. Let's give it a name as well. It's going to be
the BAN of hired. So we are done with
the first one. Let's go to the second one. We want to have
the total active. Instead of creating a
new sheet from scratch, we're going to go
and duplicate it. So right-click on
it and Duplicate. What we have to do is to
take the Total Active, drop it on Text over here, remove the old one, and let's go inside in order to make sure that
everything is fine. So we have here a new
line at the start, let's remove it, and hit OK. That's it. Let's go
and give it a name. It's going to be the BAN of active. Now, let's go and
create the last one. Let's go and duplicate it again. It's going to be the BAN of terminated. Let's go and drag the Total Terminated to
Text over here, drop the old one, and as well
remove the new line. That means the total number of
terminated employees inside our data is
966. All right. So those are the
three big numbers, the three BANs for
the first requirement, the hired, active, and terminated
employees. All right. Moving on to the next
requirement, and it says: visualize the total number of employees hired and
terminated over the years. We have to display
how the number of employees is developing
over time, and the best type of chart for this type of analysis
is the line chart. You can go as well
with the bar chart, but the line chart is
the best in order to visualize the
trend over time. So back to Tableau, let's go
and create our line chart. What we're going to
do at the start is duplicate one of those sheets in order
to have the same style, and then let's go and rename it. It's going to be Hired by Year. Let's go and remove
the measure over here, and now we have
an empty chart. Since it's over time,
we need a date field, and this is going to
be the hire date. Let's drag and drop it onto
Columns over here, and then next we need a measure, and it's going
to be the Total Hired. Let's drop it onto Rows. Of course, our chart
is a line chart. Let's go to the Marks card over
here and make it a Line. Now by looking at the chart, we have a lot of
unnecessary information over here that we don't need. Let's go and edit this axis. Let's uncheck Include zero, like this. Now the data looks way better. Now, as the next step,
we're going to go and edit the design
of this chart. First, let's go to Color over here and pick our color, so More Colors, and
let's pick the green. As the next step, I
would like to go and highlight all the
area below the line. Let's go and get an
area chart below it. It's just for the design. In order to do that, we're
going to go to our measure, hold Control, and just duplicate
it as a second measure. With that, we have, of
course, two charts. One is going to stay as a line, but the second one is going
to be an area chart. Let's go to the
second one over here and change the type
to an area chart. Now, as the next step,
we're going to go and merge those two charts into
one using the dual axis. Let's go to the right
measure over here and let's use Dual Axis. Of course, now things
are not matching because we
have removed the zero. Let's go to the right one, right-click on it,
and Synchronize Axis. Now the line chart is exactly
matching the area chart. Now we can go and get rid of
all those lines and stuff, so let's go and remove the
headers from the left side, and as well from the years. And we want to get rid
of all those grid lines. So right-click over
here and go to Format. And now we go to Lines
and let's go to Rows. To remove the grid lines, let's set them to None. But
now, looking at the chart, there is like a white box around our chart. What
are we going to do? We're going to go to Borders
over here and then go to Sheet, and let's remove
everything from here. So remove the row divider and
the column divider as well. With that, it
looks really clean, but it still does not look
like a line chart. It looks like an area chart.
Let's go and change that. Let's go to the area chart
and let's go to Color, and let's go and reduce the
opacity to 15%, like this. One more thing, we can go and
reduce the size of the line. Let's go to the
line over here and make it a little
bit thinner. I'm happy with that.
It looks nice. With that we got the total
hired employee over the time. Now we need the same chart, but not for the hired
for that terminated. What we can do were going
to go and duplicate this, and let's give it the name. It's going to be
terminated by year. And of course, we have to go and change all those affirmations. Now we have to go and replace the higher date with
a terminate date. So let's go and replace it. You can do it on top of it
in order to replace it. Now we have the termination date instead of the higher date, and now we have to go and
replace the measures as well. We need the total
terminated on top of the first one and the same thing
on top of the second one. Looking at the data,
we have nulls here because we have employees
without any terminations. We don't need that.
Let's go and hide it, right click on it
and click hide. We don't need to remove
any zeros because the first value is one and it's very close. We are
fine with that. Let's go and hide all those
information left and right, and as well from here,
or remove the headers. Now let's go and change as
well the color of this. Instead of green, we can have
a pink for the terminated. Let's stay at All, and then let's go to colors and to
more colors and pick our second color over here
and click OK. With that we are applying the same color on both charts, the
line and the area. All right. We are almost there, but there's a white
dotted line over here. Let's go and remove it.
Let's go to format, and I believe it is a line, and it is the zero line. Let's go to the sheet and remove the zero lines,
and let's set it to None. Perfect. With that we are done; we now have the total
terminated employees over time by year. With that, the
requirement is solved. Let's move to the next
task and it says, present a breakdown of total employees by
department and job titles. This means we have to go
and analyze and compare the values between different
categories, the departments. That means we are talking
about category magnitude, and the best chart in this category is
the bar chart. Now, my friends, if you need deeper knowledge on how to
choose the correct chart, I have made a dedicated
tutorial about this topic, explaining the different
types of chart categories, when to use which category, and what is the best
chart for each category. So now let's go and
build a bar chart for this requirement.
Let's go and build it. We're going to
duplicate as usual, and let's give it a name. It's going to be
the departments. And as well what
we're going to do, were going to go and
remove everything, all those dimensions
and measures. Now, it's very
simple. Let's go and get the departments to the rows, and we need the total
hired to the columns. Of course, we have
to go and change the marks to bars. Now, of course, because
of the previous chart, we go and change the
opacity to 100%, and as well, let's go and pick the green color for this chart. Now since we are
using the bar chart, it would be nice if we
go and sort the data. Let's go to the axis over
here and click on sort. With that it is descending: we have the department
with the most employees first, down to the last
one, which has the fewest. Now since we are
using a bar chart, it looks like a ranking. We are ranking the
departments by the employees. We can go now and add
like a nice index, a nice rank number near
those departments. In order to do that, let's go to the rows over here,
to the empty space, double-click on
it, and now we can go and use the function INDEX. We can use it
for ranking. So let's go and hit OK, and of course, it breaks everything because
it's a measure. Let's go and convert
it to discrete. Now as you can see, we have a nice rank
for those departments, so we have 1, 2, 3, and so on. We can go and move it to the left side of the
names of the departments, and it's like a quick
indicator for the ranks. With that, let's go and format the chart by removing all
those unnecessary stuff. We're going to go to the
axis, remove the header. Let's go to this
department over here, right click on it and
hide field label. Of course, we're going to go
and remove all those lines. Let's go to format, and now let's go to the left
side to the lines. Let's go to columns and remove
the grid lines by setting them to None. All right. So that's
it. Now we can see the total number of
employees by department, and we have a nice rank for it. Okay. Moving on to
the next requirement, it says compare the
total employees between HQ and the branches. And here as an info,
New York is the HQ. It's like the previous
analysis where we have to compare the values between
different categories, the HQ, and the branches, and the bar chart here is the best type of chart
for this analysis. Now let's go and
create it as usual, we're going to create
a new sheet by duplicating any of
the previous ones. Let's call it location. And of course, the
first question is, do we have the information
in the dataset? We don't have any fields about
the HQ and the branches. About the locations, we have only two pieces of information, the city and the state. But in the requirement,
we have a hint where it says the state New
York is the HQ. That means all the other
states are branches. So again, we have to go
and create this logic. So let's go back to
our test over here, and let's go and get
the states to the list. And now we're going to
create very simple logic where we are checking
the value of the state. If it is New York, then it's HQ. Otherwise, it is a branch. So let's go and create
a new calculated field. Let's give it the name Location. And now since we are evaluating
a value from a column, we're going to go and use the logical function,
the CASE statement. So we're going to say CASE. And then what are we evaluating? We are evaluating
the state, right? Let's write State. Now let's
evaluate the first value, which is New York, right? Make sure to write it exactly like we have it in the dataset. So the first letter is capital,
as well here. What happens if the
state is New York? Then it is the HQ,
right? It's like this. Now if the state is not New
York, then it is a branch. So we're going to go and
use the default, ELSE, like this, and it is
going to be the branch. So that's it, and don't forget
to add an END like this. So let's go and hit OK. Now with that, we got a
new field called Location. Let's go and test it, of course, on the right
side over here. Now we can see in this
field, we have Branch and HQ. Now, in order to see all
the values of the states, I don't want to see
all the employees, so let's go and remove
all that information,
and now we can see
very nicely how the states are mapped
to the location. So only New York is HQ; all other states are branches. Now we have the field that we need for the
requirement, right? Let's go back to the
locations over here, and let's get rid of
those dimensions. We don't need them. We're going to stay with the total hired, but now we need our new
calculated field on the rows. Now, I would like
to go and swap this chart, where we have
the locations on the rows. So go and click on this.
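As a side note, the HQ-versus-branch rule behind the Location field can be sketched outside Tableau. This is only a minimal Python illustration of the same logic, not the calculated field itself; the function name `location` and the sample state values are assumptions made for the example.

```python
def location(state: str) -> str:
    """Return "HQ" for New York and "Branch" for every other state.

    The comparison is an exact match, which is why the state name has to be
    written exactly as it appears in the dataset.
    """
    return "HQ" if state == "New York" else "Branch"


# Hypothetical sample values, only to show the mapping.
sample_states = ["New York", "California", "Texas", "new york"]
mapping = {state: location(state) for state in sample_states}
print(mapping)
```

Note how the lowercase spelling falls through to the default branch in this sketch, which is the mistake the exact spelling is meant to avoid.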
And they are switched. That's it, as you can see,
we can now go and compare the total employees between
the HQ and the branches. As you can see in
the HQ, we have way more employees than
the other branches. Of course, now, the
next step with that, we're going to go and change
the designs over here. Let's take the
location and put it to the colors by holding
control, of course. Then let's go to the
colors and edit colors. Now, let's go to the HQ, double-click
in order to get our green, and as well to the branches, double-click, and
let's get the gray for the branches. I would like to sort the
data the other way around. I would like to have the
HQ first, then the branch. Let's go to the location,
right-click on it. Then go to sort, and we're going to go
and sort it manually. I would like always to have
the HQ on the left side, so HQ on top and
then the branches. Now let's go and
remove some header information from here. Of course, as usual,
we're going to go and get rid of those white lines. Let's go to format,
and then let's go to the lines and then
here, the axis rulers. Let's go and select None. As well, I'm going to go
to the next one, axis ticks, and let's have None as well. Now on the right side over here, you can see we have a legend; we're going to go and
hide it since we want in the dashboard to
design our own legends. Let's go over here to this
small arrow and hide card. So that's it for
this requirement. Okay, let's go to the next
requirement, and it says, show the distribution of
employees by city and state. Now since we are talking about
location information like the states and the cities, here we are talking about
spatial analysis. And of course, maps are the best visual for this
type of analysis. All right. So now let's go and
create a map in Tableau. We're going to go and
duplicate the sheets in order to have the same design.
Let's give it a name. Map states. Let's go and remove everything in
order to start from zero. Now in order to plot
a map in Tableau, we have to go and get
those two pieces of information, the longitude to the columns and the latitude to the rows. With that, Tableau is going to plot
the world map in the view. Now what do we need?
We need the locations. Let's go and get the state
first to the details. Let's drop it over here. And now, depending on your location,
you're going to get
different results. For me, since I'm
now in Germany, it's going to say
you now have eight unknowns. How are we
going to solve it? We're going to go to the
map in the menu over here, and then we're going to go to
this option, Edit Locations. Let's go there. Now it's
currently set to Germany; I'm going to go and
change it to USA. Let's search for
USA and that's it. Now as you can see, we have
everything mapped correctly between my locations and the
information from Tableau. If you hit OK over here, the unknown stuff
will disappear. Let's go and do that. Now as you can see,
Tableau understood the information
and zoomed into the USA. But here we have
very funny bars on the map. It's not correct. Let's go to the marks over
here and switch it to a map. Now as you can see
Tableau is highlighting the states from our data
with a green color. So now I would like to go and change the design of this map. Let's go to the
menu and then map, and then we're going to go to this option, background layers. Since the style of our
dashboard is going to be dark, I'm going to go and change
the style from light to dark, and I would like to
go and get rid of all the information
that I don't need. Let's go and deselect
everything from the layers. So we don't need anything. With that I'm happy; we got a very clean map with only the states and
the information that we need. Now let's go and add
the stuff that we want. The first thing
I would like to add again is the name of the states. So hold Control, drag and
drop the state to the labels. Now with that, we got
only the states from our data highlighted in the map. As the next step, I'm
going to go and change as
well the color based on
the hired employees. Let's close this
over here and get the hired employees to the colors. Now Tableau is using other
colors than we want; let's go to the
colors, edit colors. Now instead of having automatic, we're going to have our
custom coloring, right? So let's go to the blue
over here, click on it, and we're going to have our
green again. That's it. With that we got our coloring. Now it's really bright;
what I'm going to do, I'm going to go to
the colors again, and let's go and
reduce the opacity. Let's just reduce
it and maybe more. Let's go and reduce more
to maybe 30. All right. What else we can do?
We can just highlight the borders of the cards.
It looks really nice. Let's go to border and
choose this color over here, and with that we have nice
borders between the states. That's it; we now have the total employees for each state, but now we have to have it
as well for the city, right? Let's go to the city
over here and add it as a new layer on top of our map.
So let's drop it over here. Now we don't have enough points. What we're going to
do: we can add as well the states to the details. Now with this, Tableau is able to map all the cities
to the states, and as you can see, we
have those small circles. Now let's go and
add, for example, the total hired to the size. If the circle is bigger, that means we have
more employees, but I would like to increase it a little bit more, like this, maybe. As well, let's go
and add the coloring. Maybe we're going to go with
the location information. Let's go and get the
locations to the colors. That means the gray
dots are the branches, and only the green
one is the HQ. Now, let's go and
change the design of those circles a little bit. Let's go to the colors. Now let's go and add
the border for it. Using our colors, it's
going to be the green one. Then let's go and
reduce the opacity, maybe something like this, way back to around maybe
30. All right. I'm happy with that.
On the right side, as you can see we
have those legends. Let's go and remove them. So hide and as well hide. So far, I'm happy
with this design. We got the total employees
by the states and as well by the cities, and we
fulfilled the requirement.
202. HR Project | Build Charts - Part2: With that, we have covered
all the requirements of the overview section. Now let's move to the next one. We have the demographics. The first requirement in
the demographic section is present the gender
ratio in the company. We have to analyze the gender
proportions in our data, and we call this type of analysis
part-to-whole analysis. And the pie chart is a wonderful chart in order
to do this type of analysis. Okay, let's create
a pie chart in Tableau. We can go to the
locations over here and duplicate it in order
to use the same setup. Move it to the right
side, and let's give it the name Gender, like this. Let's get rid of all
that information to start from zero. Of course, the question is, do
we have the data? Well, yes, we have the gender
information in our data, so we don't have to go and
create a new calculated field. Let's start with
the marks. I would change it from bar to pie. Now in order to create
a pie chart in Tableau, we have to go and
do some tricks. Let's go to the columns,
double-click on it, and let's enter the
average of zero. It is a placeholder for a visual or chart in Tableau.
Now for the pie chart, I have a full detailed video on how to create it step by step. Now we have to do it
a little bit quickly. For the pie chart, we
need two circles, one for the inner circle and another one for
the outer circle. That means we
need two visuals, and that's why I'm going to
have two placeholders for it. So hold control and
duplicate it. With that, we have
two circles and now let's go and have a dual
axis for both of them and make sure to synchronize the axis and as well to hide
it, and from below as well. Now we have two circles
on top of each other. Now let's go and configure
that information. Let's go to All
first, to the size, and make it a little
bit bigger like this. Here we have two
marks. The first one is for the outer circle, and the second is for
the inner circle. In order to see the coloring, we're going to go and change the inner circle to something dark. As well, what we're going to do, we're going to go
to the size over here and reduce it
in order to see. As you can see, we have
already a pie chart, right? Now, usually in the pie chart, we show the total
aggregation in the middle, and that is the total hired. Take the total hired and put
it to the labels over here. Now as you can see, we have a very nice number in the middle. Now let's go and configure
the outer circle right. Let's go to the first
chart over here. Of course, we want to divide
the chart by the gender. Let's go and take the gender
and put it to the colors. Now let's go and edit the
colors, edit the colors. Now, of course, I
will not go with pink and green because
the pink means in our dashboard
terminated employees and we cannot use it over here. We're going to stay
with the green. Let's go to male over here. Let's go and get the green, but this time I'm
going to make it a little bit darker like this. And then hit OK. Now
let's go to the female. We're going to take
it as well as green, but make it lighter. Maybe something like
this, way lighter. As you can see, the circle
is split into two parts. Now we need as well some information on top
of this circle. Let's go and get the gender,
or let's copy it from here, hold Control and put
it to the labels. As well, we need the
percentage of the employees. Let's go and get the total
hired to the label over here. But we don't need it
as an absolute number. We would like it
as a percentage. Right-click on the measure, and let's go and have a
quick table calculation. With that we got a percentage
for male and female. I would like to
round those numbers. Again, let's go to our
measure and format it. Then let's go to the left side over here instead of automatic, let's go to percentage and
reduce the decimal places. With that, we are
rounding the percentage. So as you can see in
the chart we have for the male 54 and for
the female, 46. It looks really nice and
let's go and close it. Now this calculation,
I think we're going to need it later
in other charts. I would like to have
it in the data source, so that I don't have
to go each time and format and create this
table calculation. Let's go and drag and drop
it to our data source. Now as you can see
on the left side, we have a new measure. Old calculation one.
Let's go and name it, so let's give the
percentage total hid. This is really nice in order to reuse the stuff that we
have already created, and it is a new
calculated field. In order to check the
formula for that, let's go and edit the
field, and you can see. It's very simple, the total hid divided by the total total hid. All right. That's it
for this requirement. Now, we have a really nice
pie chart in order to see the distribution of
employees between genders. Wait, wait. Sorry, when
we think we have to remove the allegiance,
so we are not done yet. So let's go and hide it.
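To make the percentage formula concrete, here is a minimal Python sketch of the same idea: each value divided by the total of all values. The function name `percent_of_total` and the head counts are assumptions for illustration; the counts were simply chosen so that they reproduce the 54 and 46 split seen in the chart.

```python
def percent_of_total(counts: dict) -> dict:
    """Divide each category's count by the sum over all categories."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Hypothetical head counts per gender, not taken from the course dataset.
hired_by_gender = {"Male": 540, "Female": 460}
shares = percent_of_total(hired_by_gender)

for gender, share in shares.items():
    # Format as a percentage without decimals, like the rounded labels.
    print(f"{gender}: {share:.0%}")
```

The shares always add up to 100% across the categories the total is computed over, which is what makes this a part-to-whole measure.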
All right. That's it. Moving on to the next
requirement and it says, display the distribution
of employees across age groups and
education levels. Now we have to show
the relationship, the correlation between
two categories, two dimensions, the age groups
and the education levels. One of the best
chart for this type of analysis is the heat map in order to show the
relationship and correlations between
two dimensions. Okay, let's go and
build the heat map. As usual, we're going to go and duplicate stuff.
Let's give it a name. It'll be age versus education. Now let's go and get rid
of everything like this. Now, the first question
is, do we have all the information in
the data source? Well, we have something
about the education level, so we are safe with this,
but we don't have ages. Of course, we can
go and calculate the age from the birth date; here we have the
birth date information, and we can use it in order
to generate the ages. We have to go to
our test again in order to see whether
everything is working fine. Let's go and add again
an employee ID in order to have the
level of employees, and let's go and get the
birth date to the view. Now let's go and create
the logic of the age. We're going to go and create
a new calculated field, and let's call it Age. Now of course, how do
we calculate the age? It is the number of years
between the birth date and today. Let's go and do that. We have to go and subtract
the birth date from today, and we can go and use
the DATEDIFF function. Of course, the age is based
on the number of years. We have to specify
here the date part. So it's going to be year. What is the starting date? It is the birth date. And what is the end date? It's going to be
the function TODAY. The TODAY function is
a Tableau function that generates the current date
as we are speaking now. That's it. It's
very simple, right? Let's go and hit OK. With that, we've got a
measure, a continuous measure, because of course, it's ages. So let's drop it to the output in order
to see the results. Now we
have it as a measure. I would like to
have it as a dimension,
so let's convert it
to dimension and as well to discrete in order
to see the numbers. Let's put it beside
the birth dates. Now we have ages, right? I think
this is the simplest one. If you check this
employee over here, you can see the birth year is 2000
and we have around 24 years. Of course, if you are doing
this project in 2025, you will get the age of 25. As I'm recording this
video, we are in 2024. It's really interesting when
you are doing this project; write it in the comments below. Of course, the task says we need age groups. We don't need ages. In order to create age groups, we have to go and create again a new calculated field
on top of the age. Let's go and create a
new calculated field. Let's give it the
name age groups, and we're going to go and
use the IF ELSE statements in order to group the
employees into specific ranges. Let's start with the first
one, the youngest employees. All the employees whose
age is below 25 are going to be in one range. We're going to say IF the age,
like this, is younger than 25,
THEN they belong to the
group younger than 25. Like this. Now let's go and
define the second group. It's all employees 25-35. So we have ten years in between. All employees whose
age is older than or equal to 25, and whose age as well is
younger than 35, like this,
all belong
to one group, which is 25-34 because here
we are not including the 35. That's it for this group.
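The age logic and the grouping pattern can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the Tableau formula: the age is taken as the plain calendar-year difference (which matches the example of a birth year of 2000 giving 24 in 2024), the group labels are placeholders, and the older groups are filled in by repeating the same ten-year pattern.

```python
from datetime import date


def age_in_years(birth_date: date, today: date) -> int:
    """Calendar-year difference only; month and day are ignored (assumption)."""
    return today.year - birth_date.year


def age_group(age: int) -> str:
    """Map an age to a ten-year bucket; the upper bound of each bucket is excluded."""
    if age < 25:
        return "<25"      # hypothetical label for the youngest group
    elif age < 35:        # 25 <= age < 35
        return "25-34"
    elif age < 45:
        return "35-44"
    elif age < 55:
        return "45-54"
    else:
        return "55+"


age = age_in_years(date(2000, 6, 1), date(2024, 1, 15))
print(age, age_group(age))
```

Because each branch is only reached when the earlier ones failed, the lower bound of every group is implied and only the upper bound needs to be checked in this sketch.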
Let's go to the next group. I'm just going to go and
copy and paste it over here. We will just increase the
number of years, 35-45, and the same thing
over here, 35 and 44. Let's go and add another group; it's going to be between
45 and 55. Let's just increase everything by ten years, and
as well over here. Now let's move to the last group, to the nicest group,
where we have all employees who are older than or equal to the age of 55. ELSEIF age is older than
or equal to 55, then we're going to have
55 plus. That's it. Now we have covered
all the groups that we have inside our data. Let's go and validate, of course, right? Everything is valid. Let's go and hit OK. And with that we have
now a new dimension, which is on the top
over here, age groups. Let's go and put it in the output in order
to check the results. What else am I going to
do in order to test? Let's show it as a filter, and let's start with the
youngest generation, the employees who
are younger than 25. Now as you can see,
all those ages are less than 25,
which is correct. Let's move to the
last one as well, to the oldest employees
over here; as you can see, they are all older
than or equal to 55. So, as you can see, it
is working as well. Let's check another
one over here. So employees 35-44, and
everything looks nice. Let's check this one, 25-34. You can see
everything looks perfect. Now let's go back to our
sheets, age versus educations. Let's get first the age
groups to the columns, and then let's get the
education levels to the rows. Now we have our matrix, but it is not sorted correctly, so let's go and sort
those dimensions. Right click on the age groups,
and let's go to sort. Now, next, in order
to have a heat map, let's go and change it
from pie to circles; nothing changes, just to make sure we are not talking about pie. Now of course, what controls those circles is the
number of employees. Let's go and get the
total hired to the size. Now we have our heat
map, but as you can see, those dimensions are not sorted correctly. Let's go and sort them. Let's go to the age group, right-click on it and go to sort, and then we want to
sort it manually. The first is the youngest group, then 25, 35, so it looks
good, let's close it. The same thing for
the education level, let's go and sort it as well, manually. For education, we're going to start
with high school, then bachelor, master, and PhD. Now it looks better.
Let's go and close it. Now, for the design, we don't
have any axes or anything. I will just go and
change the colors, because I would like to decide
later on the dashboard. I would say let's go with
the gray. Let's go and hit OK. Of course, don't forget
about this legend, let's remove it, so hide it. Check the data. It's
very interesting. You have the most
employees in the category 35-44 as an age group, and most of them have a bachelor's. So with that, we can go and
analyze the correlation and relationship between
the age groups and the education levels
of the employees. Let's move to the
next one and it says, show the total number of
employees within each age group. Again, here we have the
comparison analysis in order to compare the
values within a category; as usual, the bar
chart is the best one. Let's go and build it: as usual, duplicate one of those charts. Let's rename it to age groups. This one is going
to be very simple, so we need the age groups, but we don't need
the education level. Let's go and remove
the sizes as well. We need the total hired on the rows, and instead of
circles, we need bars. That's it. It's very
simple as well. It's already sorted because I've duplicated the previous one. The sorting of the
age group is correct. Let's go and hide
this axis over here, and that's it for
this requirement. Let's jump to the next
one. It's very similar. It says, show the total number of employees within
each education level. So we're going to go
with the same visual, the bar chart, in order to compare the different values
within a category. All right. So we're going
to do the same stuff. Let's go and duplicate
this one over here, and let's call it
education levels, and we have to go and
replace this dimension with the education level,
so instead of age groups, we're going to have it
like this. But of course, we have lost the sorting
of this dimension. Let's go and sort it again. So let's go to sort, and it's going to be manual. And high school is first, then bachelor, master, PhD,
which is correct. So again, bar charts
are really easy. Okay, let's move to
the last requirement in this section, and it says, present the correlation between employees' education levels
and their performance rating. So for this requirement, we're going to go again
with the heat map, since we have to show
the relationship between two dimensions,
two categories. Okay, so let's build
another heat map. So as usual, we're going
to go and duplicate stuff, and we're going to rename it two education
versus performance. So of course, the
first question, do we have all
those informations? Yes, we have the performance
and as well, the education. So we don't have to go and
create any calculated fields. So we need the two dimensions. The education, we have
it already over here. Let's go and get the
performance rating, and let's check the marks from parts to maybe
squares like this. And let's go and get that total hied to the size. All right. So now by checking the data, we have to go and sort, I think the performance.
It's not correct. Let's go and sorted
again as a manual. It starts with excellent
good and then satisfactory. That means we're
going to have it a step above needs improvement. That looks good. Let's
go and close it. Now as you can see, we
have the highest group is between bachelor and good, which is okay because
we have a lot of employees having the Pahlar
compared to the PhD. Instead of having the
absolute numbers, let's go and get instead
of that the percentage, which is going to show
the correlation more accurately. Instead of having the total
hired, I will just remove it. Let's go and get this
percentage total hired to the size. Now the percentage doesn't
really make a lot of sense because here
we have 72%, 65%. I think this is computed across the table, so let's go to the measure
over here, right-click on it, Compute Using, and it is Table (across). So instead of that,
let's go and change the calculation to
performance rating. Because we are focusing
on the performance, let's go and click on that. Now it looks more
accurate. If you go, for example, to the employees
with a PhD, as you can see, 48% of them have an excellent rating, and
then the next one, we have good, satisfactory,
and as well, the last one, needs
improvement, only 5%. As you can see, the highest
group of employees with a PhD has the excellent rating. Let's go now and check
the high school. Here we can see this group is smaller compared to the PhD. We have only 13% of employees
with high school education having an excellent rating, whereas
we see here a big bubble, where we have 34% of employees with high school
that need improvement. We can understand from this data, which is generated by AI, that there is a
correlation between the education level and
the performance rating. A higher education level might enhance and increase
the performance rating. But of course, this
is not a rule, it depends on a
lot of stuff like the field of work, the
skills, and so on. Not only the education level is going to improve
the performance, but in this data, we can
see there is a correlation. Of course, one more
thing before we close, we have to go and hide
the legend, right? With that, we are done
with this requirement. All right, friends,
let's move to the third section and we
have the income analysis. So in this section,
we're going to focus on the salary-
based metrics, and we have here
two requirements. First requirement says, compare the salaries across
different education levels for both genders to identify any discrepancies
or patterns. In this requirement,
we want to see the differences in salary
between the different genders. This is not only correlation; we are talking as
well about something called gap analysis, and the best chart to visualize gap analysis is
the barbell chart. This is exactly why I go
with the barbell chart instead of the heat map,
because with the barbell chart,
I can very clearly and easily show the distance
between values. And as well, we can
show the correlation between two different
dimensions and categories. For this requirement, I will
not go with the heat map,
since I cannot show the
distance between values, I will go with the
barbell chart. Okay, so let's build a
barbell chart in Tableau.
We're going to go and
duplicate stuff as usual, and let's give it a name. It's going to be gender
versus education level. So that's it, and let's go and
clean everything from here. But we're going to still
need the education level on the rows because we have it
already sorted correctly. What is a barbell chart? It contains two points and the distance between
them as a line. So we need two charts,
one for the line, and another one for the points.
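The data behind such a chart can be sketched in a few lines of Python: one value per gender for each education level (the two points) and the difference between them (the line). This is only an illustration; the sample rows, the function names, and the use of the average salary as the summary value are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (education level, gender, salary).
rows = [
    ("High School", "Male", 50000), ("High School", "Female", 50000),
    ("Bachelor", "Male", 70000), ("Bachelor", "Male", 74000),
    ("Bachelor", "Female", 62000), ("Bachelor", "Female", 64000),
]


def barbell_points(rows):
    """Average salary per (education, gender): the two points of each barbell."""
    buckets = defaultdict(list)
    for education, gender, salary in rows:
        buckets[(education, gender)].append(salary)
    return {key: mean(values) for key, values in buckets.items()}


def gaps(points):
    """Distance between the highest and lowest gender average per education level."""
    by_level = defaultdict(dict)
    for (education, gender), average in points.items():
        by_level[education][gender] = average
    return {level: max(v.values()) - min(v.values()) for level, v in by_level.items()}


points = barbell_points(rows)
print(gaps(points))
```

The longer the line for an education level, the larger the gap between the two averages, which is exactly what the chart is meant to make visible at a glance.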
Let's go and create it. We need the salary information. So as you can see, we
have it over here. Let's go and drop
it to the columns, and we don't need
the sum of salaries. We need the average salary, let's go and change
the calculation of the measure from sum to average. Since we need two charts,
we need two measures, and we are using
the same measure, so let's go hold control
and duplicate it. With that we have two charts. As we said before,
one is going to be a line and the other is going to
be data points. Let's start with the first one. Let's go over here and change
it from square to a line. Now since we want to show the distance between
the gender values, we need to go and get
the gender information and put it on the path. With that we got the lines, the distance, the
gap between points. Let's go and make it bigger in order to see that
information, to the max. So now let's move to the next
one where we're going to configure the points
of the genders, right? Let's go to the second
mark over here. Instead of square, let's
go and get the shapes. Now for the shapes,
we're going to have the gender information. Let's go and drag and drop
the gender to the shapes. Now as you can see, we
have our two genders, but I think we have
better shapes for that. Let's go to the shapes. Instead of default, let's
go over here and we have already from
Tableau gender shapes. Let's go over here. That's it. Let's hit OK. As you can see, we have those signs,
but they are really dark. Let's go and get as well
the gender to the colors, so hold control and
put it to the colors. As you can see on
the right side, we have now those symbols,
but they are really small. Let's go and change
the size of that, something like maybe to
the middle. As like this. Now the next s that,
we're going to go and put everything in one chart.
Now they are splitted. Let's go to one of those and use the dual axis and make sure that we synchronize
the axis as well. Now we still have here a huge
space where it's not used. Let's go and configure the axis, dit axis and make sure
to remove include zeros. That's it. Now it
looks really nice. Now, of course,
we can go and add a label for the average salary. Let's go over here,
and let's get the average salary, hold Control,
and put it on the labels. It's not really clear, so let's
go and change the font. Let's go to label
and go inside it. Let's go and use
our second gray. Let's get the light gray. Okay. Now we can see the
numbers are really big. Let's go and change the
format of the salary. So right click on it
and go to format. Let's go to the
numbers over here, and as well to the
custom number. Let's go and remove
the decimals, and now the display
units can be thousands. I'm still not happy about
the symbols and the text. Let's go to the labels
and change the alignment. Currently, it's middle center. Let's go and change it to
automatic. It's way better. With that, we have the symbols and as well the
numbers beside it. Of course, don't forget
about the final touch. Let's go and remove
all those headers from top and bottom. Let's not forget
about the legends. Let's go and remove them. And now we have a very
clean chart. All right. So now let's understand the
result of this insight. As you can see, the
average salaries of males and females with
a high school education are relatively equal, right. But now if you go and
check the bachelor, you can see the
average salary for males is way higher than for females. As you can see, the barbell
chart is really amazing. You can immediately see the gap, the distance between
those two values. The males are getting
way higher salaries than the females with the
education level of bachelor. Let's go and check another
huge distance between the genders if you check
the education level PhD. As you can see, we have a huge
gap between the genders. But this time it is the other way around. On average, the female doctors are earning around 25% more than the male doctors. As you can see, the barbell chart is amazing in order
to understand the distance and the gap between data points, and as well to
do correlation analysis. This is an amazing visual, and that's all for
this requirement. Friends, now we're going to move to the second requirement of the income analysis and the last requirement
in the summary view, and it says: present how the age correlates with the salary for
employees in each department. This time we want
to show the correlation, the relationship
between two measures, not two dimensions like
in the heat map, but two measures. Of course, the best type of chart here is the scatter plot. The scatter plot is amazing in order to show the correlation
between measures. All right, now
let's go and build a scatter plot in tableau. As usual, we're going to go
and duplicate the sheets, and we're going to rename
it to age versus salary. So do we have that
information in our data? Well, yes, we have
the age and the salary. We don't have to create
any calculated fields. Let's go and clean up
that information. Let's remove everything. We don't need all that stuff. So now let's start
from scratch. Since it's a correlation
between two measures, we have to go and add
our two measures. The first one is going
to be the salary. Let's go and drop
it on the rows, and we need the age. So let's go and drop
it on the columns. Of course, we don't
need the sum of salary and age. We need the average.
Let's go and change that. Let's go and change the salary from
sum to average, and the same for the age,
from sum to average. Great. Now we have our two axes, our two measures, and make sure that we are using
the shapes mark type. We got it from the
previous chart. Now what is missing? We
need the data points, and they are going to
be the job titles. Let's go and get the job title
and put it on the details. Now as you can see, we
got our data points, but we have a
huge wasted space here, and that's because we are
including the zero in the axes. Let's go and clean
that up: edit the axis, remove the zero, and do the
same thing for the other axis. Edit the axis and
remove the zero like this. Now let's go
and change the shape. Instead of the circle, let's get a filled diamond like this. Now sometimes we have
overlapping points. It would be nice if we reduce the opacity to
something like 75. Now let's go and add labels
for those data points, and they are going to
be the job titles. Hold Control and drag the job
title to the labels. Now let's go and reduce the font size, maybe from 9 to 8,
something like that. Now, of course, in order to get the full effect of a scatter plot, let's go and add reference
lines for both of the axes. Let's go to the
salary over here, right click on it, and
let's add a reference line. So let's go and check
the settings. For the average line, let's
remove the label, and maybe we can have a custom
tooltip like this: average. And let's go and
insert the value. So now let's go and format it. It's going to be a dashed
one, a thin one, and let's use our
gray color like this. So that's it, let's hit OK. And with that we have a
very thin average line. Let's do the same for the age. So add a reference line. So no label, and let's
add a tooltip like this: average, and the value, and
the same format for the line: it's going to be a dashed one, thin, and as well our gray color. So, that's it. Let's hit OK. With that, we have created a
really nice scatter plot. So now if you check the jobs, most of them are managers, right: we have the IT manager, finance manager, HR, and so on. So most of them are managers, but we have three types of jobs that are
getting a high salary but are not managers,
like software developer, and we have here
system administrator and finance analyst. As you can see below the line, we have different types of jobs, but none of them are managers. It makes sense, of course:
managers are getting a higher salary than
the other jobs, but still there are some jobs that are getting a high salary. So far we are just checking the
salary, only one measure. Now, let's check the
correlation between the age and the salary,
thinking about two things. Now if you take a look
back, we have a group of jobs that are centralized in
the middle, which is okay. But here we have extremes like the HR manager and
the finance manager. HR managers are
getting a high salary, even though they are
young employees. And as well, it is the
only manager group that has a young age. If you compare it to the
other manager jobs, they are around 40. So this is one
extreme in the data. So now let's go and check
the one way on top to the right. We have the finance managers. So they are getting on average the highest salaries
inside our data, and as well, the average
age is relatively old. So this is another extreme. And as you can see, we
have another position, the IT manager, that is as well moving toward
this direction, right. So, my friends,
this is what we can understand from our data
from the scatter plot, and that's all for this
requirement. All right, friends. So with that, we
have covered all the requirements for the
first dashboard, the summary dashboard, and
we built the charts as well. And after that, we have
to go and put everything, all those charts, in one single consolidated
Tableau dashboard.
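As a recap of the scatter plot logic, here is a minimal Python sketch: one data point per job title (average age, average salary), two reference lines, and a label saying on which side of each line a job falls. The sample rows and the helper names `scatter_points`, `reference_lines`, and `quadrant` are hypothetical. The sketch also assumes each reference line is the average of the plotted points; the exact scope of a reference line in Tableau depends on how it is configured.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (job title, age, salary). Not the course data.
employees = [
    ("HR Manager", 30, 95000),
    ("HR Manager", 34, 99000),
    ("Finance Manager", 52, 120000),
    ("Finance Manager", 48, 116000),
    ("Software Developer", 38, 92000),
    ("Sales Representative", 43, 55000),
    ("Sales Representative", 41, 57000),
]

def scatter_points(rows):
    """One data point per job title: (average age, average salary)."""
    ages, salaries = defaultdict(list), defaultdict(list)
    for job, age, salary in rows:
        ages[job].append(age)
        salaries[job].append(salary)
    return {job: (mean(ages[job]), mean(salaries[job])) for job in ages}

def reference_lines(points):
    """Assumed scope: the average of the plotted values on each axis."""
    avg_age = mean(age for age, _ in points.values())
    avg_salary = mean(salary for _, salary in points.values())
    return avg_age, avg_salary

def quadrant(point, lines):
    """Describe where a point sits relative to the two reference lines."""
    age, salary = point
    avg_age, avg_salary = lines
    salary_side = "high salary" if salary > avg_salary else "low salary"
    age_side = "older" if age > avg_age else "younger"
    return f"{salary_side}, {age_side}"

points = scatter_points(employees)
lines = reference_lines(points)
for job, point in points.items():
    print(job, point, quadrant(point, lines))
```

Reading the quadrants this way mirrors what we did by eye: jobs above the salary line and left of the age line are the young, highly paid extremes.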
203. HR Project | Sketch Mockup of Summary Dashboard: All right, so now we're
going to go and build the summary dashboard, and
here is what we're going to do. First, we have to create a plan, where we're going to go and
sketch out the mockups for the dashboard and
the containers in order to have a plan
for the layout. And after that, we're
going to go and create the container
structure of the dashboard in order to put all those charts in
one single view. And after we have all
the charts in one place, we will start with the refining
and fine tuning process. So we're going to
go and tweak and twist a lot of stuff
like the text, colors, icons, legends, filters to get everything
looking just right. So are you ready, let's start with the first
step where we're going to go and plan the dashboard
for the summary view. A. For this project, I have decided to
have around 15 charts in one single dashboard. It is definitely a challenge,
but don't worry about it. We can do it step by step. Now, of course, we'll not
jump immediately by creating the dashboard because we will
struggle without a plan. Any professional in any
project knows that. Before building anything,
we have to have a plan. We have to have a blueprint. And of course, we want to
be professionals right. That's why we have to go
and plan the dashboard by sketching the of the container
end of the dashboards. So of course, the question is,
how are we going to do it? Of course, you can
go old style by just having a pin and paper, and you can go and draw the
sketch of the dashboard. Can go and use digital tools like, for example, PowerPoint, or like I'm doing here,
procreate using my tablets, or you can go and use
tools like Figma or DO. So any tools that helps you to design and to sketch the
mockup of your dashboard, that suits your fancy. So let's go and sketch the
mocap of our dashboard. The background is
going to be dark gray, and that's because we
are making a dark theme. So now we can have the
usual stuff where we have a title for the dashboard,
human resources dashboard. In the summary requirements,
we have three sections, and that's why we're going
to go now and divide our dashboard into
three main sections. We have overview,
demographics, and income. Now let's focus on
the overview and put everything that is required
in this one section. We're going to start with
the big numbers, the BANs. The first one is going to
be the active employees, and here we have a big number, and then we're going to
split it into two sections. The left side is going to
be the hired employees, and to the right side,
we're going to have another big number for
the terminated employees. Now in order to have the effect of a KPI, what
we're going to do, we're going to put
the line charts exactly below those big numbers. Now below it, we're
going to have another section for
the department. We're going to have
our ranking of the departments using
the bar chart. Then below it,
we're going to have the last section
in the overview. We have the location.
Here we have two charts. We have the one with the bar chart where
we show the number of employees in the HQ
and the branches, and for the other chart
here, we have a map. We're going to put
the map and the bar chart side by side
in this subsection. As you can see, it's not really easy to fit everything
in one place. So that's all for the overview. Now, let's go to the
right section to the demographics and here
we have a big challenge. We have to fit in this section
five different charts. The first section is
about the gender, so we have our pie chart. But now for the age
and education, we have two separate bar charts. What we can do here,
we can integrate all those three
charts in one block. In the center, we can
have the heat map, and on the top and
to the right, we can have those bar charts. With that, we have
all those three charts in one subsection. Now to the right side,
to the last section, we're going to have
the performance and education, and here we
have another heat map. Let's move to the last section,
to the income analysis. It's pretty easy. We have
here only two charts. The first one, the
gender and education, we can have it on the left side, and to the right
side, we're going to have here our scatter plot, the age versus salary. With that, as you can
see, in one dashboard, we are showing almost
15 different charts. Of course, in our dashboard, we have to have a section on
the left side for the logos, for the navigation
between the two dashboards, the summary and
the detailed views. Of course, we can go and add multiple
functionalities about exporting the dashboards or icons where we can
put our links. We will not forget
about the filters, so on the top right, we can have like
a switch in order to show the filters
or to hide it. All right, friends,
to the next step. Now we are not done
planning our dashboard. We have to go and
sketch the mockup of the container structure. Building a dashboard in
tableau requires a knowledge about how to control and
manage the containers. If you don't have
a plan, I promise you things can get chaotic. That's why we have to plan
the container structure, and this time I'm
going to sketch the mockup using Draw.io. Draw.io is an amazing tool, and as
well free, for creating professional charts and concepts, which I usually do as
well in my projects. Okay, so now we are inside Draw.io, and I just put our mockup
as a reference for us, and working with Draw.io
is pretty simple. The first step that
I usually do, I go to the style over here
and make it a sketch. Now what this does is that
all the shapes that we have on the left side are going to
look hand drawn. So at the end, your
concept is going to look really cool and not boring. Now, for our
containers, we're going to have three different objects. The first one is going to be
the horizontal container. So here is the horizontal container, and I usually
give it the color blue. Let's first remove the
fill here and go to the colors. Choose blue and maybe
make it thicker. So this is the first type. The other one, we have
vertical containers, right. So vertical container, and we're going to have
the color orange. So maybe something
like this. Okay. And the last box is
going to be our objects. It could be anything.
It could be an icon, a text, an image. So I would like it gray. Let's have
something like this. So we can see that
our whole dashboard is split into two sections, the left section where we
have the logos and the icons, and then the rest
to the right side. So that means we're
going to start with horizontal container for
the whole dashboards. So we're going to
make it like this. And we're going to have
it like this so big. All right, so let me
just remove the text over here and maybe
give it a text name. This is the whole dashboard. This is the first step.
Now let's start with the left one where we have
the icons and the logos. It's vertical: we have all the objects below each other. What we're going to
take, we're going to take a vertical container
for the left side. We're going to call it Nav
for navigation like this, and let's make it a
little bit smaller. Inside, we're going to
have different objects like a logo. Let's
make it smaller. I will go and add
a fill for that, so let's click on fill
and gray, same here. Now we can zoom in and add more icons in order to
navigate between dashboards, to export the dashboard, to put links, and so on. So we're going to
have multiple links and stuff on the navigation. This is everything
about the navigation. Now, on the right
side, what do we have? So we have first like
a title and a filter, and then below it, we have
a whole section of charts. That means we have two
objects below each other, and for that, we're
going to need again a vertical container. For the whole thing over here, we're going to have one big
vertical container like this, and we're going
to call it header and charts. Okay, something like this. Now let's start with the header. It looks like we have a header and beside it, we have filters. That's why we're going to go with a horizontal container, right. We're going to have
it like this, and what do we have inside it? We have the header
and the filter, right. So we have the title. And here on the right side, we're going to have a few icons or maybe one icon, we will see. Now let's have a look at
our charts over here. Here we have three
sections, right, but actually they are
split into two sides: the left side, where we have the overview, and the right side, where we have two sections. That means we have two
objects side by side, and for that, we're
going to take another horizontal container. Let's do it like this.
It's going to be the main splitter between the left side and
the right side. Let's start with the left side. As you can see, the
objects are beneath each other, and that means we're going to go and use a vertical container. For the left side, we're going to have a vertical
container like this. Let me just remove
the name and let's go and call it the overview. Overview, and we have inside
the overview a lot of charts. We can have multiple charts like this, and all of them are
beneath each other. We will not drill
down inside each detail now. We will just have a rough
plan for the containers. Now let's check the right side. Now on the right
side, as you can see, we have two main sections, we have the demographics
and the income. That means we're
going to go and have a vertical container as well. For the right side, we can have a
vertical one like this, and we're going to
remove the name here. Now let's go and
check each side. As you can see, we have first
like a title and below it, we have different objects. Again, here we have a
horizontal container. We're going to have like this. It is very nested because it's
a little bit complicated. We're going to have
as well for the below section for the income. We're going to have a title and then charts. Let's
give it a name. This is the demographics, and below it, we
have the same thing. We have a section
for the income. What do we have
underneath that title? We have here
charts side by side. That means we can go and use a horizontal container
for that, right. We're going to have a
horizontal container below it like this, and inside it, we have our different charts. We have charts like this,
let's have three like this. For the income as well, we're going to have only two charts, so we're going to need as well
a horizontal container since they are
objects side by side, and we can have our two charts. All right, guys. I think
we have a plan, right, so we have a blueprint
for our dashboard, and we have a lot of layers,
around six layers. We will not fine-tune the plan now; it is
just a rough plan. But one thing that I
would like maybe to zoom in a little bit is
about each chart. So as you can see, for
example, this one, we always have a title
and below it a chart. The same thing goes
for the gender, we have a title and a chart. That means we have a vertical
container for each chart. If we go and zoom in
inside those charts, we will not place
the charts immediately. We're going to have it always
as a vertical like this, where the first
object is going to be the title of the chart. So like this, and below it, then we can have
the chart itself. All right, my friends. So
now we have a rough plan. So now let's go and implement those containers in Tableau. All right, friends. So finally, we have now a rough
plan for our dashboard. But of course, it doesn't
contain all the details, so we will be like twisting and tweaking stuff as we are
building the dashboard. So let's go back to Tableau in order to build the dashboard.
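The container plan we just sketched can also be written down as a small tree, which makes it easy to print the outline and count the layers. This is only a Python sketch of the mockup, not anything Tableau reads: the `Node` class, the helper functions, and the chart placeholder names are assumptions for illustration, and the charts are kept as plain objects instead of wrapping each one in its own vertical container.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str  # "horizontal", "vertical", or "object"
    children: list = field(default_factory=list)

def H(name, *children):
    return Node(name, "horizontal", list(children))

def V(name, *children):
    return Node(name, "vertical", list(children))

def O(name):
    return Node(name, "object")

# The rough plan from the sketch (names of leaf objects are placeholders).
plan = H("Whole Dashboard",
    V("Nav", O("Logo"), O("Icons")),
    V("Header and Charts",
        H("Header", O("Title"), O("Filter icon")),
        H("Left and Right Sections",
            V("Overview", O("Title"), O("Charts")),
            V("Demographics and Income",
                V("Demographics", O("Title"),
                    H("Demographics Charts", O("Chart 1"), O("Chart 2"), O("Chart 3"))),
                V("Income", O("Title"),
                    H("Income Charts", O("Chart 1"), O("Chart 2")))))))

def outline(node, level=0):
    """Return the tree as indented lines, marking container orientation."""
    tag = {"horizontal": "[H]", "vertical": "[V]", "object": "[ ]"}[node.kind]
    lines = ["  " * level + f"{tag} {node.name}"]
    for child in node.children:
        lines.extend(outline(child, level + 1))
    return lines

def container_depth(node):
    """Count nested container layers, ignoring plain objects."""
    if node.kind == "object":
        return 0
    return 1 + max((container_depth(c) for c in node.children), default=0)

print("\n".join(outline(plan)))
print("container layers:", container_depth(plan))
```

Having the outline in this form is handy later as a checklist against the item hierarchy in the layout pane while building the dashboard.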
204. HR Project | Build the Summary Dashboard: Okay, friends, let's go and
create a new dashboard and let's call it HR summary. Like this. Now, the
first step of that, we're going to go and define
the size of the dashboard. So let's go over here
on the left side. Instead of range, let's go
and select a fixed size, and this time we'll
go with a width of 1,400 and a height
of 800. All right. So let's start with
the first container. It is the horizontal container
for the whole dashboard. What I usually do, I go over here and switch it to floating, because having everything
in one floating container makes it more dynamic, and we can go and change the
background as we want. Make sure to switch
it to floating, then let's take the
horizontal container and drop it in the middle. As you can see, it's
a little bit small. What we can do, we're
going to go and change the size of it in order
to fit our dashboard. Let's go to layout, and the width is going to be
exactly like the dashboard, 1,400, and 800
for the height. For the position, it's
going to be zero, zero, in order to have it exactly
on top of our dashboard. Now in this phase,
as we are adding the structure of our containers, I usually go and add borders
to each container in order to see whether
we are doing everything correctly. Now
let's go and do that. Let's go to the borders and add a line, a thick one, and blue. With that, we can see a
blue horizontal container. Of course, let's go
and give it a name, so let's rename it
to whole dashboard. Okay. Now in order
to avoid mistakes such as converting the
horizontal container to a vertical container, I go and add blanks inside it in order to fix it as a
horizontal container. Let's go and do
that. Go to dashboard, and now let's switch
it back to tiled. Only the first main container is going to be floating, the
rest is going to be tiled. Drop the first blank in the middle. Now make sure that the second blank is exactly on the right side. Let's go back and check
in the layout. You can see we have blanks
inside our whole dashboard. Now let's go to the
next level and start adding the containers
inside the whole dashboard, and here we have two
vertical containers. One is for the Nav,
let's go and do that. We can have one vertical
container over here. As usual, I go and
add blanks inside it. Let's go and add
the first blank. It's a little bit
small like this. Let's go and expand it. Let's go and add another
blank below it. Make sure it's below
the first blank. Let's go and check the layout. Now as you can see, we
have a vertical container and two blanks inside
it, which is correct. Let's go and name it. Let's
give it the name Nav, and we can go and remove
the first blank over here. We don't need it anymore,
so let's remove it. Of course, we can go and
add a border color for it. This time it's going to be orange. This is the container
for the Nav. Now let's go and add another one for the right side, for the rest. So, let's have a
vertical container and two blanks inside it, one in the middle, and
one exactly below it. Now it's very small.
Let's go and click the vertical container and
make it wider like this. Let's give it a name now. It's going to be header
and charts. So click. Of course, we're going to
go and give it a color like this, and it's going
to be orange as well. Now if you look
at the tree over here, we have the whole
dashboard and inside it, we have the Nav, and
to the right side, we have the header and charts. Let's go and remove this blank. We don't need it
anymore from here. We will not now
focus on the Nav, since we don't have
a lot of containers, we have here only logos
and icons and so on. We will focus now on the
header and charts because here we have the
real content and we have a lot of containers. What do we have inside it?
We have two containers, one for the header, and another
one for all the charts, and both of them are
horizontal containers. Let's start with the header, so we're going to go and
add a horizontal container in the middle. This time instead
of adding blanks, we're going to add one text for the title of the dashboard. It's going to be human
resources dashboard. Let's add the word overview. Let's have it like this, and
let's have the size of 20. Now we're going to go and add
a blank to the right side. Make sure you drop it exactly to the right side inside
this container. Let's go to the layout and
check what we have. As you can see, we
now have a text and a blank underneath the
horizontal container. Let's go and give
it a name now. This is the header,
and of course, we're going to go
and add a color for it, it's going to be blue. Now we can go and remove
this upper blank. Like this. Now let's go and add another
container for the charts. So it can be as well a
horizontal container, so drop it beneath it. As usual, we're going to go
and add our blanks. One here. Let's make it bigger, and
one to the right side. And we go to the layout
and check stuff. We have two blanks inside
the horizontal container. Now let's go and
give it a name. Here we have everything, the
left and right sections. Okay, and we're going to go
and add the borders as usual. So with that, we have
our two containers, and we can go and remove
this placeholder from here. Now, let's keep drilling down and we're going to focus
on this container, the left and right sections, and here we have two
vertical containers. So let's start with the left
section, the left container. We're going to have
it for the overview, so vertical container. And now let's drop
a text instead of a blank and call it overview. And maybe let's make it like 12. Now below it another blank in order to make sure this
is a vertical container. Let's go to the
layout and check. Vertical container,
we have the title and a blank, and let's give it the name overview left section like this. Let's go and remove
this blank from our dashboard, and don't forget about the
color of the border. We can have it orange. That's it, let's make it a
little bit smaller like this. Now let's go to the right side, and we can have as well a
vertical container, like this, the same stuff, a blank and below it as well another blank, and we go to the
layout, the same stuff. We have two blanks, and
let's give it a name, demo and income sections. As usual, the border,
as well orange, and we're going to go now and remove the
placeholder like this. Let's adjust the sizes,
so the left section, the overview, should
be smaller like this, and then we have
the right section. With that, we have
everything on the left side. What is left is designing the containers of
those two sections. Here we have two
vertical containers. Let's go and do
that. The first one, we're going to drop it
here in the middle. Let's go and add a text for it. It's going to be
the demographics, and the size is going to be 12. Okay. Now let's make it bigger like this. Let's drop a blank. Make sure to drop
it exactly here, and let's go to the layout,
and everything is fine, as you can see. I'll just
make it a little bit bigger. We have here the
text and the blank. Let's go now and give it a name. It's going to be
the demo section. Like this, and we're going
to give it a color as well. It's a vertical container as well. Let's go and remove
this placeholder, and we need to do
the exact same thing for the second section. Let's go and add a vertical
container, a text, going to be the income, 12, and we're going to make
it bigger like this. We're going to bring
a blank as well. Make sure to drop it
inside the container. Let's check the layout,
so everything is fine. Now we're going to
go and rename it as usual. Income section. Don't forget the
coloring like this. And with that, we are done. Let's go and remove
the last blank. Here we still have spacing. Let's go and adjust the size, so the demo is going to be in the middle and the income is going to take
the whole space. Okay, guys, I promise
you the last drill down, where we're going
to add a horizontal container for the charts. For the demographics,
we're going to have one horizontal
container here inside. Let's go and add a
few blanks inside it. The first blank, small,
and one to the right side. So let's go and check that. We have a horizontal container, give it a border color. Now we're going to
go and do the exact same thing for the income. We need as well a
horizontal container inside it and two blanks. One here. Let me just
make it bigger, and one exactly to
the right side. And we're going to
check the stuff. We have two blanks inside the horizontal container,
give it a name. Income charts like
this, give it a color. And remove the placeholder. So let's go and remove it. Okay, friends, so we are done. Let's go and have a final
check on the structure. We have the whole
dashboard and inside it, we have the left
section for the Nav, the right section for
everything, header and charts, and inside it, we have two horizontal
containers, one for the header,
and another one for the left and right sections. Let's drill down. We
can see here we have the left section as a
vertical container, and then we have a right section for the demo and
income sections, and then we go and split it into the demo section
and the income section, and each one of them has a title and as well a
horizontal container. The same thing as
well for the income. So if you have it
like this exactly like me, we can proceed. If not, then go back
and do it step by step. Okay. Now the next step
is that we're going to do the first iteration
on the dashboard, where we're going to
put all the charts inside our dashboard. We will not care a lot
about the designs. It's all about placing the
charts inside the containers. So let's start with the first
section in the overview, so make sure to select it. And I'm going to say, let's
make it a little bit bigger. So we're going to start
from top to bottom. We're going to go to dashboard, and let's go and add a title. For the first BAN,
it's going to be the active employees,
so active employees. And let's centralize
it in the middle. Now below this title, we're going to have the
BAN of active employees. Let's drop this chart below it. Of course, we're going to
go and hide the title. We don't need it.
Nice. Now below it, we can have two KPIs, the left and right,
and for that, we need a horizontal container. But before that, we're
going to go and have a small separator between this BAN and the
two BANs below. We're going to have
a blank below it. Let's go and make it
smaller like this, and we're going to go and
design the following stuff. Let's go to the
background colors, pick our gray and make the
opacity something around 60. All right. One more thing, let's go and remove the outer padding. And we're going to go and
give it the name divider. All right. All
right. So below it, we're going to have a
horizontal container for the two KPIs. Drag and drop it below
it like this. As usual, we're going to
go and add our two blanks, one, and the second one, make sure it's going to be
exactly to the right side. So let's go to the
layout and check. So here we have the
horizontal container. Let's go and name it. We're going to call it
KPI section like this. Of course, we're
going to go and add a border for it
just to see it. All right. As you can see
now, things are squashed. Let's go and reorganize it. We're going to make
this new container a little bit bigger like this. Now let's focus on
those two KPIs. Now what do we
need for each KPI? We need a title, a BAN,
and a line chart. So we have to have a
vertical container. So let's go and grab one
and put it inside it. Let's start immediately
adding stuff, so we need a text. It's going to be hired,
and put it in the center. Below it, we need the BAN, so drag and drop the BAN, and of course, make sure to
hide the title. Below that, we need
the line chart. It is hired by year, and drop
it exactly below the BAN. And we hide the title. Now this is the first container. Let's go and check the layout. We have here a
vertical container, we have the title, the BAN, and as well the line chart. Let's go and give it a
name, and it's going to be hired KPI. Like this, let's go and remove the first placeholder,
the blank. So remove it. Now, don't worry about the
size and the coloring. We're going to do a
second iteration on the dashboard in order
to do the fine tuning. Now we can just
adjust the size of the
line chart a little bit like this. Now we need on the
right side, again, the same KPI, the same steps. Let's go and grab a vertical
container to the right side, make sure to drop it
inside the container, and we need a text. It's going to be
terminated, in the center. So what else do we need? We
need a BAN, so make sure it is exactly below the text, and
as well, hide the title. Let's go and move this small zone in this container
to the left side. And as well, the blank
should be smaller. Now what do we need? We
need the line chart. So let's go and drop the
line chart below the BAN. Remove the title and make
it a little bit smaller. Now let's go and
check the layout. So we have a vertical container. We have a title, a BAN, and
as well, another chart. Let's go and rename it. This is the term KPI. Okay. Now one more thing, I would like to go
to this blank and rename it to divider. Like this. Let's give
it the same coloring. It's going to be the dark gray and as well the
opacity to 60 like this. Let's go and remove
the outer padding. Now what do we have
below that? We have the department title with
lines left and right. For that, we need a
horizontal container. What do we need? We need
a text in the middle. It's going to be Departments, and it should be
centered, and left and right we're going to
add blanks. Make sure to drop one
exactly to the left and one exactly to the right. Let's check the layout. We have here our container:
blank, Departments, blank. Let's color those
blanks in order to see them. It's going to be dark gray at 60 without any outer padding, and the same thing for the next one: 60 and no outer padding. We can call
it department title. Now what do we have below it? We have the chart
of the department. Let's go and drop it
beneath it, and of course, go and remove the
title like this. Now below that, we can
have the location title, so it can be exactly
like the departments. What do we need? We need horizontal
container. We need a text. Let's call it location like this and centralize
it in the middle. We need two blanks,
left and right, like this, and we
go to the layout. We have blank, Location, blank, and we can rename the container
to Location Title. And we're going to
style those blanks: make the first one gray, 60, and remove the padding. The same thing for the next one: 60, remove the padding. Now, below that, we
have two charts: one a map and the other
a bar chart. What do we need? We need a
horizontal container below it, and we need the two charts. Let's get the location chart to the right side and
remove the title. Let's get
the map exactly to the left side, and
remove the titles. Now let's go and check
what we have done. We have horizontal container
and the two charts. Let's rename it; it can
be Location Charts. And now we can
remove the last blank. It's just a placeholder,
so remove it. That's it, we
now have everything inside the overview section. As you can see, if
you don't do it slowly and step by step, planning everything,
this can be chaos. But with the planning,
everything is going to be easy. Now let's move to another
section, the demographics. Here we have a lot of charts.
Let's do it step by step. We are in this section over
here. What do we have? We have a title, and then we have multiple charts
side by side. As usual, each chart
is a vertical container: we have a title and
the chart itself. Let's add the first
vertical container over here, and then we need
a text inside it. So make sure to drop it here. This is going to be Gender, centered. And below
it, we need the chart. Let's pick our pie
chart for the gender, and drag and drop it beneath the text. Of course, we're going
to remove the title. Great. Now before we go
to the next chart, we're going to
add a divider, like this. Let's give it the colors: gray, 60, like this, and no outer padding. Now
for the next chart, we need a vertical
container on the right side as well; make sure to drop it
right next to the divider, and here we need three charts. Let's do it step by step.
First, we need the title. It's going to be Education
and Age, centered as well. Below it we have the
first bar chart, which is the age groups. So drag and drop it beneath the title and remove
the title as well. Now beneath it, there is
two charts, the heat map, and as well, the bar
chart of the education. Since they are side by side, we're going to go and get
a horizontal container beneath. So drop the horizontal container
exactly beneath it. Now things are
getting resized, left or right, and so on;
don't worry about it. The main thing is that
we are placing the charts in the
right container. So let's first get Age
versus Education and put it in this new container,
remove the title, and now to the right side, we need the education levels, so make sure to place it on the right side and remove
as well the title. So now let's go and
resize this divider in order to have a little
bit of space. Like this. Now we have to change
a few things on those bar charts, like
hiding the headers. For example, click
on the first one, right-click on the
header and remove it. Now for the second chart, I would like to switch things around. So let's go inside this chart
by clicking on this arrow. Now I'm going to
swap columns and rows, and we're going
to hide the header as well. Let's remove it, and then we have
to go back to our dashboard. We're going to stay
with this for now, but we will configure it later in
the second iteration. Now let's have a look at the
layout in order to make sure that everything is
correct. So let's see. This is the vertical container
for the education and age. Let's go and rename it. Education and age
charts, like this. It should have a title, then the first chart, where
we have the bar chart, and then below it, we have a
horizontal container, where we have two
charts side by side, the heat map and the bar chart. If we get it like this,
then we can proceed. So now we need another
chart to the right side, where we have the last
chart in this section, but we need a divider
between them. So let's get a blank and drag and drop it exactly
to the right side. Make sure you
drop it correctly. So let's
check the layout. We set the color to
gray at 60, and the outer padding to zero. Now as you can see, our blank comes after the education
and age charts. So let's rename it to Divider. And as usual,
we need a container, so it's going to be a vertical container
to the right side, and we need a text. It's going to be education and performance like
this in the middle. And this is going
to be very simple. We're going to go
and get the chart just below it like
this, remove the title. Of course, you can
go and make the divider a little bit
smaller left and right. Okay, let's check again the layout, whether
everything is fine. So we have a vertical
container for the last chart, we have a title and beneath
it, we have the charts. Okay, we are done
with this section. Now, let's move to
the last section to the income. So what
do we have over here? Let me just close this and as well this,
we have the income. So we have a title and beneath
it, we have a container. We need here two
charts as usual. We have the vertical container for the first one,
and we need a title. So let's go and drop
a text inside it. It's going to be
education and gender. Make it in the middle.
Now we need our charts. Let's go and drop it beneath
the title. Remove the title. Now before we go
to the next chart, we need a separator or divider. Let's style it as usual: 60, and the padding to zero. Now we need to build
the last chart. As usual, we get a vertical
container on the right side. We need a title.
It's going to be Age versus Salary, centered. Okay. And of course,
we need our chart. So let's go and
drop it beneath it. Remove the title and make the
divider smaller like this. Okay, so that's it
for this section, and now we have all
our charts inside our containers as we
planned.
205. HR Project | Fine Tuning The Summary Dashboard: All right, friends. So
with that we have all the charts in one
place, in one dashboard. Now we're going to start
with the process of refining and fine-tuning
the dashboard, where we're going to
tweak and adjust many things in order to have
a professional dashboard. Okay, so now, as the
first step, we're going to
add background colors to the dashboard's containers, and we're going to remove all the background colors of the worksheets. Let's
go and do that. We're going to start
first with the whole dashboards over here. So let's go and
add the following. It's going to be
like a dark gray. So I will go with
this one over here. So we have the background, a dark gray, and then the
section is going to be black. So let's go to the next step. We're going to go to
the nav over here. The nav is going to
be its own section. That's why we're going to
make it black, like this. Then on the right side, we will not make
everything black; only the
three sections, overview, demographics, and income, will be black. That's why I will not
change anything over here. Let's go to the sections, and we're going to start
with the overview over here. We're going to make
it black. Then we need those two sections. The demo section is going to be
black as well, and the income section
can be black too. With that, as you can
see, we are now getting the dark theme
of our dashboard. As the next step,
we're going to remove all the background colors
inside our worksheets. We have added it at
the start in order to have a feeling
about the dark theme, but now we will not use the background colors
of the worksheets, we're going to use
only the dashboard's. Now we have a boring task, where we're going to go
through all the sheets and
remove the background. Let's start from the top left. We're going to
start with the BAN: right-click on it
and go to Format, then go to Shading,
and remove the worksheet
color by setting it to None. Now we're going to go through all the worksheets that we have and remove
the background color. We can do it in the
dashboard here, or you can visit each of those
sheets one by one. We have the last one.
Remove it, like this. We are done. Now we have fixed the background colors
of the dashboard and as well the worksheets. All right. Moving on
to the next step, we're going to fix
the font size and color. Let's start with the
title of our dashboard. Let's select the whole thing, and we're going to
use our light gray and make sure it is 20, so we have it as 20, and
let's make the first part, the title itself, bold and leave
the overview as it is. So that's it. Now we're going to edit the
title of each section. Here we have three
sections, overview, demographics, and income, and we're going to
do the following. Let's go to the overview. We make it light gray, like this, and we're going to
make it 14 and bold as well. Let's go to the next one and do the same: bold, change the color to
light gray, and make it 14. And for the last section: 14, bold, and we pick the color. The sections now look
exactly the same. Now we're going to edit
the titles of each chart. We're going to do
the following. Let's start with the gender over here. We're going to make it
light gray as well, and we're going to set the font size to 11. Let's do the same
for each one of them. It's going to be 11, light gray. The same for the next one, and the next. 11 for the age and gender. All right. And don't forget about the departments over here: 11 and light gray. And the location: 11 as well. Now we are done
with the titles and stuff. Now, let's go and check the
font size inside our charts, and I would say we
can make it smaller. We have to go
through them again. Let's start with the department. Go to Format, and instead of nine, let's
have it as eight. Let's do the index as
well and set it to eight. I would say let's make
it bold. All right. Now let's go to the pie
chart and make it eight, and the same thing for the map: click somewhere, go to
Format, and make it eight. Now for the pie chart,
I would go inside it, and we're going to go
to the outer circle. And there we're going
to go and change the font size to eight. But the big number inside, we're going to
leave it as it is. Maybe we're going to make
it little bit even bigger. Let's make it ten. Let's go back to our dashboard, and now we continue
to the next charts. Make everything eight. Same for the age. Now to the next one, same thing. And eight as well for the income, and for the ages and so on. Everything should be eight. I think it looks really nice. We are now done with the font
size and colors. All right. Now, as the next step, we're
going to visit all the charts again in
order to enhance them, refine them, and maybe
add extra things. Now let's have a look at
the departments over here. What we can do,
we can go and add the status of the employee
for each department. We can also show on these
bars the total terminated. In order to do that, let's
go inside the chart again. Now we need something like a
status dimension in order to control the
colors inside those bars. We don't have it yet, so that's why we're going to
create a new one. Let's call it Status. It's going to
be the same logic. Let's write an IF
statement: IF ISNULL of the termination date, term date, then the employee is still employed, so the status is Hired; otherwise, Terminated, like this. Let's end it with END, and now we're going to
take the status and put it on the
color over here. Let's assign
the colors: Hired is going
to be green and Terminated is going to be pink. Now, what
else I'm going to do? I will just go and switch
between those two status. Let's go and do that.
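
To make the Status logic concrete outside of Tableau, here is a minimal Python sketch of the same idea: a missing termination date means Hired, anything else means Terminated, and the bars are then split by that status per department. The function names and the sample rows are made up for illustration and are not part of the course dataset.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def status(term_date: Optional[str]) -> str:
    """Mirror the Status field: no termination date means the employee is still hired."""
    return "Hired" if term_date is None else "Terminated"


def count_by_department_and_status(
    rows: Iterable[Tuple[str, Optional[str]]]
) -> Dict[Tuple[str, str], int]:
    """Count employees per (department, status), like the colored segments of each bar."""
    counts: Counter = Counter()
    for department, term_date in rows:
        counts[(department, status(term_date))] += 1
    return dict(counts)


# Hypothetical sample rows: (department, termination date or None)
sample_rows = [
    ("Sales", None),
    ("Sales", "2021-03-01"),
    ("Sales", None),
    ("IT", "2020-07-15"),
]

result = count_by_department_and_status(sample_rows)
print(result)
```

Each key in the result corresponds to one colored segment of a department bar, which is what putting Status on the color does in the chart.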
And I would also like to show the total
hired inside the label. Let's get it, and
we can maybe change the color of this
label to light gray and make it seven, something like this,
and we can still make the index smaller. Let's go back to our charts. Now we can see in
these bars the number of
terminated employees as well. I would say let's make the
index a little bit smaller. Like this. That's all for this chart. Let's move to the next one. We're going to go
inside this chart. I would say let's
add the percentage information to the columns. Let's get the total hired and put it
near the location, and then let's
switch it to discrete, so that we have the percentages here and the header
information on top. What we're going to do now is
change the format of those percentages. Let's remove the decimals. Let's make those
bars a little bit smaller. I'll go with
something like this. Let's go back and
check the dashboard. They look nice, maybe
we're going to make it smaller size for the font. Instead of nine, we
usually have eight. And we can go and
make it smaller. We have more places for the
map, something like this. Now for the map,
everything looks nice, so we don't have to
change anything. Let's go now to the
gender informations. Now, what we can do, we can make maybe two pie charts
for each gender, and then we can show
the percentage of terminated employees.
Let's go and try that. Maybe it can look nice,
so we can go inside. Now in order to do that,
we need the gender on rows. Of course, now our
pie chart broke, so let's go to the outer
circle and repair it first. We don't need the
gender information there; we have it here as a dimension. What do we need for the colors? We need the status of the
employee, and we also need the total hired as a percentage, which we put on the
pie. Something like this. Inside those
circles, for the big numbers, we can change the value to
the percentage, right? Let's replace
it with a percentage, something like this, and
let's go and format it. So to the percentages and
remove all the decimals. It looks nice right now, we can see the percentage of
terminated for each gender. Let's go and have a
look to our dashboards. Now, it looks that
it needs more space, what we can do, we can go
and rotate the labels first. And with that we
have enough space, maybe you can make it
a little bit bigger. We're going to fix the
spacing between charts later. One more thing that
I just noticed is that the inner circles of the pie
are not really black. Let's go to the chart again, to the inner circle,
to the colors, and change it to black. Let's go back. With that we are done with the gender
chart. As you can see, we are really rethinking
the charts as we see all the information in
one place in the dashboard. Now we're going to come
to the fun one where we have here three charts
on top of each others. First of all, let's
give it more space like this and maybe make
it a little bit bigger. Now what do we have we have
here four values and for the age we have here
like five values. What we're going
to do first, we're going to give it more space, and I'm thinking about
maybe we're going to go and switch those
two informations. Maybe it's going to
look better. Let's go inside the chart again. Let's flip it, like this. Let's go back to our charts. Now it looks nicer. Let me just make this
smaller, something like this. Now we can see that
High School is taking a lot of space
inside our charts, so we can edit
the alias for that: right-click on
it and choose Edit Alias. Let's have it like this,
as an abbreviation. Okay. So now we have more space. We have to fight for
space inside this dashboard. Now, as the next step,
I would like to highlight
the highest value. So as you can see now we
have everything as gray, and if we highlight now the highest value, it's
going to be very clear. So let's go inside this chart. And now in order to
highlight the highest value, we have to go and create
a new calculated field. So let's give it a
name, Highlight Max. We need the
max function, but for the window. What is our measure? It is the total hired. We are searching for
the highest value. And if the current value is equal to the highest value,
we're going to get true. Otherwise, we're
going to get false. Let's hit OK, and let's use this function
on top of the colors. Now let's go and change
the coloring first. If it is false, it
should be a dark gray. If it's true, we
want it as green. Now if you check the view,
we have multiple values as the highest value. We would like to
have only one value. Let's go and change the
aggregate function, right click on it, and let's go and edit the
table calculation. So now let's go to
specific dimensions and we're going to consider
both of the dimensions, and with that, we
have only one value, which is exactly what we want. Let's go and hide the legend. We don't want it in
the dashboard yet. I would say let's show as
well a label for the highest value. Let's take the total
hired as a percentage, put it on the label,
and of course, we're going to go and change
the table calculation. It should consider both
of the dimensions. So let's close it, and we're going to go and
change the format as usual. We don't want all
those decimals. Let's remove it, and let's
change the format. What do we need? Let's
go with seven and a light gray. We don't need all the values. We need only the min and max. Switch it from All to Min/Max and remove the
minimum value, so that we have
this label only for the highest value. I think we are done. Let's go back and
check how it looks like in the dashboards.
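
The Highlight Max idea described above can be sketched in plain Python. This is only an illustration of the logic, not Tableau's implementation: the first function compares every cell with the maximum over the whole table (what computing over both dimensions gives you), and the second restarts the maximum for each education level, which is why several cells lit up before the table calculation was changed. The function names and the sample numbers are hypothetical.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (education level, age group)


def highlight_max(totals: Dict[Cell, int]) -> Dict[Cell, bool]:
    """True only for the cell(s) equal to the maximum over the whole table."""
    window_max = max(totals.values())
    return {cell: value == window_max for cell, value in totals.items()}


def highlight_max_per_education(totals: Dict[Cell, int]) -> Dict[Cell, bool]:
    """True for the maximum inside each education level (one winner per partition)."""
    best: Dict[str, int] = {}
    for (education, _age), value in totals.items():
        best[education] = max(best.get(education, value), value)
    return {cell: value == best[cell[0]] for cell, value in totals.items()}


# Hypothetical totals of hired employees per (education, age group)
sample_totals = {
    ("High School", "<25"): 120,
    ("High School", "25-34"): 300,
    ("Bachelor", "<25"): 90,
    ("Bachelor", "25-34"): 410,
}

whole_table = highlight_max(sample_totals)
per_education = highlight_max_per_education(sample_totals)
print(sum(whole_table.values()), sum(per_education.values()))
```

With the whole table as the window, exactly one cell is flagged; with one window per education level, each level gets its own flagged cell.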
It's fine, right? Now let's fix all those bar charts
on the left and right. We have switched
the dimensions. That's why we have to go
and switch this as well. Make sure to do it correctly, so we're going to bring it down and the other one should go up. What we're going to
do, we're going to go and switch as well the
dimensions like this. This is for the
first chart and as well for the next
charts like this. Now let's go and highlight
as well, the highest value. Let's go back to this charts. We're going to take the
highlighted value as a color. Of course, we're going to
go and hide the legend as well. Let's remove it. I would say let's go
and reduce the size of those bars in order to
fit inside our charts. I will go something around here. We will see. Let's go
back to our charts, and let's do the same
things for the ages. We're going to go and get the highlight value
to the colors, and we have to go and change
the colors over here, so it's going to be Gray and
true, going to be green. Let's as well remove the
legends and as well, we have to go and
reduce the size of those bars, maybe
something like this. All right. Let's
go back and check. Now, as you can see, with
the highlight effect, it looks really nice. But as you can see,
the bars are not sitting exactly on
top of those values. We will fix the spacing and the positions later
as the next step. So we can leave it
as it is for now and let's move to
the next charts. So let's go inside it, and I would say let's go and highlight as well those values. Now, we cannot go and use the same highlighter because
here we have percentage, and our highlight is based
on the absolute numbers. So what we can do is
duplicate it. Let's rename
it to percentage. I'll tidy up the rest of the name
as well. Let's edit it. Now instead of the
total hired, we can have the percentage
of total hired, right? We're going to
take this measure. I remove the
percentage from here. Let's copy it and put
it in the equation as well. Hit OK and let's move
it to the colors. Now, of course, we
have to add the coloring as usual: false is gray and true is going to be green, and we're going to hide
the legend as well. Now let's check
the table calculation, whether it is
configured correctly, so Edit Table Calculation. This one should be based on the performance
rating like this. Now I'd say let's go and add
the label for those charts. We're going to take
the same measure, hold control, and put
it on top of the label, and let's go and
adjust the style, so it's going to be light gray. And we're going to
have it as an eight and we don't need
all those values. Let's have only the min and max. Now we have the min
value and the max value, but I don't want the min value, so we can have only the
max value, like this. That's it. Let's go
back to our charts, and I think everything
looks nice. Now let's go to the
education versus gender. I think here in the charts, I would not add anything.
It looks really nice. But I would go and change
the size of the labels. We forgot about
that. Let's make it eight instead of nine. Done. Now for the last
chart over here, I think we have to
add some coloring to it. So I will just go and
add our green color and maybe reduce the opacity to
something like 50, very nice. And maybe go and reduce again, the size of those labels
to something like seven. Now I would like to go and
add for the axis a line. Let's go to format. So let's go to the lines
over here and on the sheets, we're going to go to the axes. And we can add a line for it, and we make sure that we are selecting our dark
gray for that. Maybe as well reduce the opacity to somewhere like
around maybe 60. Let's go back to our charts and maybe let's go and
rename those axis. Instead of average age, we're going to have only the age and the same thing
for the salary. So we're going to
have only the salary like this. That's
it for this chart. As you can see,
we just revisited all the charts and we
added extra stuff, some refinement and fine
tuning. All right, everyone. Now in the next step, we're
going to start working with the pixels in order to add more spacing between all
those sections and containers using the
inner and outer padding. Now the distance between
all those main sections should always be 20.
Let's start doing that, first on the left side
with the navigation. Make sure to select the
navigation over here. Now, the first thing
we're going to do is get rid of all
those borders. We don't need them. Now
we have to add 20 as a space between this section
and the outer dashboard. We're going to go to the
outer padding over here and just add 20 everywhere: top, left, bottom, right. As the next step,
I'm going to go and do a fixed width for
this container. Let's go to this
small arrow over here and edit the width, and we're going to
have the value of 100. So let's do it like this.
Now, as you can see, we have spacing between the container and the
border of the dashboard. Now let's go to the
right side completely. So let's go and select
headers and charts, remove the border,
we don't need it. So as you can see we have a lot of spaces on the right side, so we're going to go
and edit the width. Instead of this
value, we can have, let's go with 1,300.
Let's go like this. Now if you take the
whole container, we need spacing
from the right side, and it's going to be exactly 20. Let's go to the outer
padding over here. Deselect All Sides
Equal, because we already have space between
those two sections. We need 20 only on
the right side. Now let's go inside
all those containers and start adjusting things. Next is the header. We're going to
remove the border, and I would say let's have a fixed
height for it, so change it to fixed. And let's
set it to 65, something like that. We have a little bit of spacing between the charts
and the title. I'm happy with that. Now let's go to the next section
to the left and right. We can see here, we have enough spacing around the dashboard
for the whole container. Let's go and remove
the border for that. I would say let's
jump to the next one. Let's go to the overview on the left side.
What do we need here? On the left side, we have a 20, so we are safe on top, on bottom, but on
the right side, we don't have enough space
between the sections. That's why we're going
to go and adjust it. But first, let's go
remove the border, and then we're going to
go to the outer padding and we're going to
remove all sides equal, and on the right
side, I need 20. Now we can see we have enough spacing between the
left and right. That looks really good for now. I would also change the container color of
those informations. So we don't have anything. Now let's go to the right sides and select the whole container. We are at the demo and income
section, remove the border. I think we are done with this. Let's go inside those sections. Let's go to the demo
section, remove the border. Now of course, we
need now spacing between the demographics
and the income. On the bottom, we need 20. Let's go to the outer padding, deselect, and only on the
bottom we need 20. Looks really nice so far. Of course, let's
remove all those borders; we don't need them anymore. For this one as well, we don't need borders, and here. I think we have to
go one level up, like this. If I deselect, we
still have one border, which is the whole dashboard. So just remove it. As
you can see, adding spacing is like giving air
to your dashboard, so it can breathe. Now we're going to
add inner padding inside those sections. We will ignore
the navigation for now, because we're going
to have another story about the icons. Now if you check those sections, you can see that the wording is very near to the border
of the section right. We have to give
here some spacing. We will do that only for
the main three sections. We're going to go
first to the overview. Like here, and now this
time we're going to go to the inner padding and we can add a seven,
something like that. You can see as we are
moving the values away from the border,
it's easier to read. We can do the same thing
for the section over here. We are at the demo
section and go and give it seven as well. The same thing for the income. The income section over
here, let's give it seven. Now we
can see those values, male and female, are no longer
right on top of the border, right? Now let's have another look.
I think we can add spacing between those titles and the title of
the section, right? What we're going to
do: let's select the whole container, Demo Charts, and we can
add top padding, only on the top, something
like five, right? We have a nice space here. Now as you can see
in the demo charts, we still have some
spacing below, right? What we can do is
edit the height. Instead of this value, we can increase it to 300, so that we are
using the whole space. Now, let's go to the other
section to the income, and let's go and select the whole container
income charts, and we're going to
do the same thing, so we're going to go
and add on top five. So we have some spacing between the title of the main
section and those charts. Now if we sit back and check the whole sections and
the spaces between, then we can see that
everything is perfect. We have 20 everywhere, but only here we have
a problem, right? As you can see here, Tableau
shows it as hatched lines. It means there is an
issue with the spacing. So we have to fill it. What we can do is just click on one of those charts and
just move it down, like this. We are just pushing until
we reach the limit, right? Now the spacing between those
sections is perfect. That's all about the spacing
between all those sections. Now we have to go
and focus about the spacing inside each of those sections and
between the charts. Of course, we're
going to go and fix all those dividers
between the charts. I would say let's start with this section, the demographics. Now my rule is: inside one section, we want to have ten between the charts. Let's
go and do that. We're going to start from
the left to the right, so we're going to select
the gender over here, and we're going to
set the outer padding on the right side to five. Let's select it like this, and then on to the next one, where we have our divider. Our dividers
always have ten outer padding on the top and
ten on the bottom as well, and now we have to
make it really thin, so we're going to
edit the width and set it to
only one. With that, we get a really fine
line between the charts. Now let's move to the
next chart over here. We're going to have from the left five and from
the right five. With that, we have a total
of ten between the charts. That's it, let's go
to the next one. Here we have a divider. As usual, we're going
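
The spacing rule used here is simple arithmetic, and a small Python sketch may help to check it: the gap between two neighbouring charts is the right outer padding of the left chart plus the left outer padding of the right chart. The `OuterPadding` class and `gap_between` function are hypothetical helpers for illustration only; they are not Tableau objects.

```python
from dataclasses import dataclass


@dataclass
class OuterPadding:
    """Outer padding of a dashboard item in pixels (hypothetical helper)."""
    left: int = 0
    top: int = 0
    right: int = 0
    bottom: int = 0


def gap_between(left_item: OuterPadding, right_item: OuterPadding, divider_width: int = 0) -> int:
    """Horizontal space between two neighbours, optionally with a divider in between."""
    return left_item.right + divider_width + right_item.left


# Values from the walkthrough: 5 on the right of the gender chart,
# 5 on each side of the education and age charts.
gender = OuterPadding(right=5)
education_and_age = OuterPadding(left=5, right=5)

padding_only = gap_between(gender, education_and_age)
with_divider = gap_between(gender, education_and_age, divider_width=1)
print(padding_only, with_divider)
```

The padding alone adds up to the ten mentioned above; the one-pixel divider sits inside that gap and adds its own width on top.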
to have ten on the top. Tin in the bottom, and
we have to make it thin. So we're going to go and
addit the width to one. Now let's go to the
last chart over here. So the whole container. From the left side, we're going to have a
five, and that's it. On the right side, we don't
have to deal with that. As you can see now, we have
really nice separation between all those charts and we have enough
spacing between them. Now finally, we
can go and adjust this middle chart since we
have now the spacing perfect. We're going to do it
like this. We can select the top charts, and we can just reduce the size of it a
little bit like this. Now what we're going
to do is squeeze this chart from left and right until
it matches the values. Let's go to the outer
padding over here, deselect, and let's start
with something like 40, 70. We are almost there.
We have to keep pushing between those values. Maybe like this. Yeah, we are almost there, but we are shifted a little
bit to the right. Let's increase the right and
maybe the left and, come on. So now we have it perfect. Now if I deselect, it looks like we have the bar charts exactly on top
of those values. Now we're going to do the same
thing for the right side. I think we have to push
more from the top. Let's go over here to the outer padding and then deselect. Let's start with 20. I think we are almost there. Let's go with 25,
maybe one more, 26. Perfect.
So now we have it exactly on the rows of the ages. So now the chart
looks really amazing. Okay, so we are done
with the demographics. Let's go to the income. So we're going to
do the same thing. We're going to go and select the whole container of the charts, and to the right side, we're going to have
five like this. Then we're going to go and
edit the separator. From top, we're going to have ten,
from bottom as well, ten, and of course, the width is going to be one. Let's
do it like this. Now let's go to the
right container, and we're going to have
from the left side five. With that we have a total of ten. I would say we can push those spacings to the left
side a little bit. All right, now
with that, I'm happy. Final look to the income. I would say we can
go and increase the whole height
of those charts. Select the whole container and let's push more
on the height. Let's go with the 300 again. We are done with
the income section. Now let's go to the left side. Let's start with the
first BAN over here, and we're going to have a
five between the charts,
but this time we have
it as a vertical. We have it four over here, but we can go and make it five in order to
stick with the rule, and let's go and
make it a little bit bigger to see the BAN. Then we have our divider. This time, we're
going to have from the left and the right.
We're going to have ten. And we're going to have as
a height one like this. Now we're going to go and make everything like in the middle. So make sure to have it
something like this, and we have to go and
change this divider. We have to have on the top
ten below as well ten, and the width is going
to be as usual one. Then we have to make sure again that the containers
have the same size, something like this, and
the middle is perfect. Now let's go to this
title over here. Select the whole container
and add on the top five. I would say since it's a line, we're going to have
ten from left and ten from right as any other divider. We're going to have here
ten and as well ten. Now, since here we cannot
go and edit the height, we can only edit the width,
so what we're going to do is
squeeze it from top and bottom. How
are we going to do that? Let's go and select
those separators and we're going to go
to the outer padding. Let's have on the top 15, and on the bottom
14, and with that, we get the line effect.
the other separator. On the top 15, On the bottom 14. With that, we have a line. Here, there is no other spacing. Let's go to the other title to the locations. We can
do the same thing. On the top, we're
going to get a five, not a ten. From left and right, we're going to have a
ten since it's a separator. Now we're going to do the same
things for the separators. On the top 15, bottom 14, the same
thing over here. So 15 and 14. Nice. Okay, great. So now let's have a look
to the whole dashboard. Let's go to the
presentation mode. And now sit back and
check whether you can find any problem
with the spacing. From my point of view, we
have a perfect dashboard. So we are done with the spacings
between the containers, charts, sections and everything. It looks really
professional, right? Okay, now the next step, we're going to go and add
tooltips to all our charts, and I think you would
agree with me if I say
adding tooltips is a
little bit boring. But it provides really nice
information for the users. Let's go and do it. We're
going to start with our BANs, so we're going to start
with the active employees. Let's go to the
charts. Now let's go over here to the tooltip, and we're going to
do the following. We're going to say
the total number of active employees and then we're going to go and
insert our measure. Now, it's very important
that we always follow the same standards when
we are using the tooltip. I would say that the normal text
should always be not bold. Only the words that you want to highlight should be
bold, for example, here. What is important is
the active employees. Of course, the measure
itself, it's already bold. Now, about the colorings, we're going to use two
different gray colors. If we go to the normal
text over here, let's go to the coloring, we're going to go and
choose this gray over here. Let's go and select it.
Then for the highlights, we're going to go and
use our dark gray. Like this and the
same for the measure. For now we are done.
Let's go and copy it because we're
going to go and use it in the next chart. Click OK and then let's go back to our dashboard and just
mouse hover on it. You can see very nicely the total number of
active employees, and then we have the number. Now let's go to the next
BAN, to the hired employees. Let's go to the tooltip and replace the whole
thing with this one. Instead of active, we're
going to have hired. Let's go and give it
the color that we usually use for
hired, the green one. Of course, we don't
use the total active, we're going to go and
insert the total hired. And of course, remove
the active one. That's all, let's go and
copy it for the next one, and of course, we
have to go and test. So let's go. As you can see, the total number of
hired employees, and we have the number,
let's go to the next one. Here we have the terminated. So we're going to use
terminated and for that, we need to use the pink color. And here, of course, we
don't have the hired, we're going to have the
terminated. Like this, let's hit OK and check the
result in the dashboard. Everything is perfect. Now let's go to the line charts, and we're going to
go to the tooltip, but make sure that you are not selecting the tooltip
of any of those marks. Make sure to select All, so that we have the same
tooltip for both of the charts. Stay at All and go to Tooltip. Now let's go and add
it as a new line. We go and remove this one,
but we need the year. Of course, now we
have a chart, and depending on where our mouse is, we can have the year displayed. Let's go and make it
bigger, like maybe 11, and as well, let's
make it green. Let's go and hit OK.
Let's go and test it. As you can see, we
have here 2017, 2020. You know what? I would
like to go and add the percentage side by
side to the number. Let's go and get the total hired and drop it
on the tooltip, and then let's go to the
tooltip and add a pipe. Then we're going to go and
insert the percentage. Let's go and test it. Now, as you can see,
we are getting both of the percentage and as
well the absolute number. But I would like to go and
get rid of the decimals. Let's do it from
the data source. Right click on the field. Let's go to the
default properties and then to the number format and then remove
from the percentage the two decimals,
and then it's okay. With that as you can
see, we don't have any decimals with
the percentage. Perfect. Now let's go and copy the whole thing
for the next charts. Of course, we're going to go and test it on the dashboard. As you can see, it
looks really nice. Let's go to the next one. And same, make sure to
select the all and then go to the tooltip and
insert the whole thing. Now instead of hire date, we need the year of
termination date. Like this, I remove the old one. Now we're going to
have the terminated. Of course, we go and change the color to the pink like this. Here we have the wrong measure, so let's get the total
terminated like this, but make sure to select
the same color, right, so it is our dark color, and we have to create a new percentage
for the terminated. Click OK for now and we
can go and test it. As you can see, the total hired is not working. Let's
go and fix it. We're going to go over here to the total hired with the
percentage and duplicate it, and we're going to go and
edit it to total terminated. Here, instead of hired, it's going to be total terminated divided by the total of total
terminated. Like this. Let's go and hit OK, and let's go and grab the total terminated
to the tooltip, and let's go and edit it. We have to go and insert
it and remove the hired one. Like this. Now we have a nice percentage as
well in our tooltip. Let's go and test it as
well in the dashboard. It looks nice. Now let's
go to the departments. This is going to be interesting. Let's go to the sheets. Now what we're going
to do, we're going to go to the tooltip and
insert our template. Now what is the main
dimension over here? It is the department. Let's go and insert it and
remove the hire date. Now here it depends
where our mouse is, we're going to get either the hired or the
terminated employees. We cannot have it like
this as a static. We're going to go and insert
the status over here. Now it's going to be dynamic. Let's go and make
it bold and make sure that we have
the right color, so it's going to
be the dark gray, and I think we can leave it
like this. Let's go and test. So let's go to
Operations over here. As you can see,
we have Operations, the total number of
hired employees, but the percentage
is not working. Now let's go to the
terminated employees, and as you can see, it is dynamic and changes to
terminated employees. So far it is working, but we have to go and
fix the percentage. That's because we don't
have it in the charts, so drop it on the tooltip.
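The percentage fields used in these tooltips follow a percent-of-total pattern: one group's count divided by the overall count. Here is a minimal Python sketch of that arithmetic, not the Tableau calculation itself. The function name `percent_of_total` and the department numbers are made up for illustration.

```python
def percent_of_total(counts):
    """Return each group's share of the overall total, as a fraction."""
    overall = sum(counts.values())
    if overall == 0:
        # Avoid dividing by zero when nothing has been counted yet.
        return {group: 0.0 for group in counts}
    return {group: value / overall for group, value in counts.items()}


# Hypothetical hired counts per department (not real course data).
hired = {"Operations": 50, "Sales": 30, "IT": 20}
shares = percent_of_total(hired)

for group, share in shares.items():
    # Absolute number, a pipe, then the percentage without decimals,
    # mirroring the tooltip layout described in this lesson.
    print(f"{group}: {hired[group]} | {share:.0%}")
```

The tooltip shows the absolute number and the share side by side, which is why both are printed on one line here.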
Let's go and check. It's still not working. I think we have to go and
insert it again. Let's go and insert it and
remove the old one. All right. So let's go and hit OK and test. Now it is working. All right. Now here is a best
practice as well. If the dimension in your
chart has a hierarchy, as you can see here, we have
departments and job titles, we can go and add
the dimension that is next in the hierarchy
as a tooltip. We can go and build
a special chart for the job title and include
it in the tooltip. This is a really
amazing technique in order to quickly drill down to the next dimension without
changing the whole dashboard. Let's go and do that. It's very simple, what
we're going to do. We're going to go and duplicate the departments.
Let's go and do that. Now let's go and give it
the name of the job titles. Now what we're going to do,
we're going to go and replace the departments with the job
title. Let's go and do that. Now I would say we're going to go and reduce it a little bit, so we don't need the status at all as a color, so
let's go and remove it. But we still have to
go and sort the data, which is now not correct. Let's go and sort. Then we're going to go with the
field, descending, and, of course, go and select
the correct field, which is the total hired, since we are using
it in the charts. Let's say okay. Now
about the coloring, I would like to go
and highlight only, let's say, the top two jobs. In order to do that, let's go and create a new
calculated field. Let's call it top two, and the function is very simple, so we're going to have
the rank function. Then we are
ranking the total hired. If this is smaller than or equal to two, then
it's going to be true. Otherwise it's
going to be false. Let's go and call
it rank top two. Now with that we have
a new dimension. Let's go and grab
it to the colors. Now as you can see, we are
now highlighting the top two, and of course, we have to go and change the coloring for that. If it is false, it's
going to be the gray, and if it's true, it's
going to be the green. That's it. Let's
hit OK and of course, go and remove the legend. I would like to see the
labels at the end of the bar. Instead of center, let's
have it to the right side, and let's go and change the
color to the gray color. We're going to have our gray
color. Like this. All right. Now the next step of that,
we're going to go and add the whole chart inside the
tooltip of the departments. Let's go back to our departments and tooltip. Now what
are we going to do? Let's have a new line. Let's call it total
by job titles. Now we have to make sure
that the coloring is okay, so we're going to use this
gray, and the job titles, it's going to be our dark gray, and only the job title
is bold like this. Now the next step of
that, we're going to go and add our charts. So let's go and do that. Go to insert, to sheets, and then we're
going to go and add the job titles from here. So let's hit OK and
check the results.
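The "rank top two" flag built earlier in this lesson boils down to: rank every job title by total hired, then mark it true when the rank is two or less. A rough Python sketch of that logic follows. The function name `rank_top_n` and the sample numbers are assumptions, and the ranking style (largest value gets rank 1, ties share a rank) is my assumption about how a descending rank behaves.

```python
def rank_top_n(totals, n=2):
    """Flag each key True when its total ranks within the top n."""
    flags = {}
    for key, value in totals.items():
        # Competition-style rank: 1 + the number of strictly larger totals.
        rank = 1 + sum(1 for other in totals.values() if other > value)
        flags[key] = rank <= n
    return flags


# Hypothetical total hired per job title (not real course data).
total_hired = {"Analyst": 40, "Manager": 25, "Engineer": 31, "Clerk": 10}
top_two = rank_top_n(total_hired)

# True entries would get the green color, False entries the gray.
print(top_two)
```

Putting a true/false field like this on the color card is what lets the chart highlight only the top entries.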
206. HR Project | Build the Table: Now let's check
the second section of the user story
and the requirement. So here we have the
employee records view. It says that we have
to provide a list of all employees with necessary
information such as name, department, position, gender,
age, education, and salary. Another point in
the requirements about the interactivities, that the users should
be able to filter the list based on
the available columns. Here we don't have to
build any visualizations or charts or anything. We have to provide
only a list of all employees with
important information, and on top of it,
we need filters. It sounds very
simple. Let's check how we can build
lists in Tableau. Let's start immediately
building the charts. Here we have two methods. Either we're going to go
and build a simple list, where we have a simple
table in Tableau, where we're going to go
and add, for example, let's say the employee
ID, then add locations. As we see, we are adding just dimensions
side by side. So of course, we can say this is the detailed list
of the employees, and the job is done. But I cannot go and
put in each cell, like, two pieces of information
underneath each other, and I cannot go and
add icons and so on. So it is a nice, quick way, but it is very limited. And now the other
method is that we're going to go and use some tricks in order
to customize the list. It is time consuming, but the end result is
really nice in Tableau. So since it's an advanced project, I'm going to go with
advanced techniques. So now, what are we
going to do? We're going to leave the employee ID as a starter, and
make sure we are selecting Standard
and not Entire View. Otherwise, we're going
to have all the employees in one view. This will not work.
So make it standard. Let's go and remove the header. And of course, I'm
going to go and change the design of our worksheet. So let's go somewhere
here and say format, and we're going to
go to the shading and let's make it black. Of course, we're
going to change that later once we have
everything in the dashboard. So what do we see here first? We have the IDs of the employees. Let's go and hide
the header as well. And we're going to have the
coloring of this dimension be our light gray.
So let's change that. Now, this is the only dimension that we're going
to use as a row, and the rest, everything
is going to be columns, and we're going to do
the following trick. So we're going to go over
here and say average and -1.0 like this. Now as we learned, this
format is going to add a placeholder for a
shape for a visual. Now for the chart
type, we're going to go with the shapes. So now we have here
as the shapes. Now here we have like
circles everywhere. This is our placeholder. I'm going to go and change as well the format of our grid. So what do we need
with the lines? I make sure everything is none, just to make sure that
we don't have anything. Then we're going to go to the
columns, remove the grid, and we're going to go and
add a fine line for the rows, but I'm going to go and make it really dark. Now it looks nice. Let's go and hide as well
the header information. So the first column
is going to hold all the information
about the demographics. What we need, we need the
first name and the last name, since they are the most basic information
about each employee. Now we have the first name
and the last name separated. What I'm going to do,
I'm going to go and create a new calculated field. I'm going to call it full name. Now I'm going to go and merge both of them, like a concat of those two fields. We have the first name, and then we're going to have
the plus and then a space between the first name
and the last name, and we're going to
get the last name inside our calculation. With that we have the full name. We have it as a new field.
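The full name field is plain string concatenation: first name, a space, then last name. A tiny Python sketch of the same idea, where the function name `full_name` and the sample names are only for illustration:

```python
def full_name(first_name, last_name):
    """Join first and last name with a single space in between."""
    return first_name + " " + last_name


# Hypothetical employee, not taken from the course data set.
print(full_name("Ada", "Lovelace"))
```

Using `+` with a quoted space in the middle mirrors the "plus, then space" step described above.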
Let's go and drop it on the labels over here. So as you can see, we have the full names of the employees. Now, for the shape, let's
go and add the gender. So we're going to go and have
the gender shape over here. We cannot see it yet
because of the colors, so let's add it as
well to the coloring. So now we have the
same shapes that we have used in the
income analysis. Now, what else we want
to add is, for example, the age, let's go and drop
the age as well to the label. And the last information
about the demographics, we're going to have
the education level. So let's drop it as
well to the labels. Now as you can see,
we have a lot of information that
is not really nice, and there's a lot
of overlapping. So we have to go and format it. Let's go first to the labels. And we're going to go
inside it in order to customize that information. Everything is going to be to
the left side as alignment, and then we're going to
have the age and education side by side and split them by a pipe. About the style, the first row is going to be bold and
using the light gray, and the second row
will not be bold, but we're going to go
and use our dark gray. This is going to be our
style for all columns. Let's go and hit okay. Now as
you can see it looks nice. We have the full
name and below it, we have a bit more information
about the employee. But still, as you can see,
the alignment between the information and
the ID is not correct. What you're going to
do is go to one of those rows and just slightly increase the
size until it fits the screen. I'm going to go and
make it as well. I'm going to go with
one more increase. With that, as you
can see, one row holds all the information, there's no overlapping,
and you keep doing that until you don't have any overlapping between
the employees. As you can see, it
looks already very nice compared to having a list. Now on the right side,
we have those legends. Let's go ahead and remove
them. We don't need them. Now we're going to go to
the second column. As well, it's going to be a
bunch of information. What we're going to do,
we just have to copy it. Hold Control and just
drop it side by side. Now as you can see, we
have like two columns now. I'm going to go and as
well format the grid, where we're going
to go to the grid over here to the columns. And we're going to remove
the column divider. As well, I'm going
to go and remove the rows. Let's go to the rows. I remove it. It
looks more clean. What we're going to do
with the second column? Let's go and add
the whole dimension of the department
and the job titles. Make sure to select
the correct one. The first one is for
the demographics and the second one is going to be for the departments and jobs. Let's go and remove
everything from it. Now we're going to go and
drop that information. Let's get the job title
first to the label. It's more important
than the department. Then the second one is going
to be the department, and as usual, we're going
to go and design it. Everything to the left, the first row is going to
be bold and light gray. The second row is going to be dark gray and not bold.
That's it. Let's hit OK. As you can see, it
looks really nice. Now the question is, do we have an icon for the
departments and jobs? Well, I don't have one, so that's why I'm going
to go and hide it. If you have one,
you can go and add it. What I'm going to do,
we're going to go to the size and reduce
it completely. But we still have a fine dot. We have to hide it
by the opacity. Now if I remove it like this, you will not find it anymore. This is the trick, and
it looks really nice. Now, let's go and
add another column. It's going to be about this
time, the dimension location. Same things. Let's
go and switch to it. I'm going to go and
add the location as a color this time and then
the city in the label. We're going to get both
of them as a label. Now let's go immediately
and start formatting. Both go to the left side. I wish to have first the
city, then the state. As usual, the first one
is going to be the light one, bold, and the second one
is going to be the dark one. All right. Now
let's have a look. Everything looks nice. I'm going to go and change
the design of the shapes. It's going to be a filled circle, and it's a little bit big, so I'm going to go and
reduce the size of this one. If it is HQ, it's
going to be green, and if it's a branch, it's
going to be gray. You can see it's not that
complicated, right? It's easy. Let's add another piece of information. I think now we can go
and add the salary, but sadly we cannot go and add anything else to the salary. So we have to go
and use it alone. Let's go and add the
salary to the labels. Here we have those numbers. I would like to format them. Let's go and format the numbers. Let's go to numbers, and then we're going to go
to the custom number format, reduce the decimals,
and as a prefix, let's add the dollar sign.
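The number format chosen here (no decimals, dollar sign as a prefix) can be sketched in Python as a format string. This is only an illustration of the resulting text. The function name `format_salary` is made up, and the thousands separator is an assumption, since the lesson only mentions the decimals and the prefix.

```python
def format_salary(amount):
    """Format a number with a $ prefix and no decimals."""
    # The comma (thousands separator) is an assumption for readability.
    return f"${amount:,.0f}"


# Hypothetical salary value, not taken from the course data set.
print(format_salary(72500.4))
```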
The number looks nice. Let's go to the
label and design it. Here we have the
information from the previous one.
We don't need it. We have only the salary, and
since it's in the first row, it's going to be the light
gray, and as well bold. Let's hit OK. For now, I don't have any
shapes for that. That's why we're going
to go and reduce the size and make
the opacity to zero. Now to the next column,
what we're going to have, we're going to have the
status of the employee, the hire date and
the termination date. The status of the employee, we're going to make
it as a color. With that we have the
gray and the green, and we're going to make the
circle as a filled circle and reduce the size.
Something like this. Now I would like to add
it as well to the label. Now what we need, we need the hire date
as well to the label, and as well the termination date. But here we have it as a year, and I would like to have
the exact date. We're going to go
and switch it to exact date and then to discrete, and the same thing for the
termination date, to exact date, and then to discrete. Now we have all the information. Let's go inside and
start configuring it. Now we have here the status,
hire date and term date. Let's move everything
to the left side, and we're going to put
the hire date and then a minus between them, then the term date, and we're going to go and
design it as usual. So the one below is going
to be the dark one. Okay. Let's hit OK and check. Now we can see in the output, we have the hire date, and let's see a
terminated employee. As you can see we have
here a terminated date side by side. All right. Now the last column is
going to be interesting. We're going to have a bar chart indicating the
length of the hire. We're going to go
and calculate in years the duration
of the employment. Let's go and create a
new calculated field. We're going to call it
the length of hire. Here we have two calculations. If the employee is hired
and not terminated, we're going to go and calculate the years between today
and the hire date. Let's go and do that. We're
going to need an IF statement, and then we're going to check whether the employee is still hired or not using the
following logic as usual: ISNULL. So we are checking
the termination date. If it is null, then the
employee is not yet terminated. So what happens then?
We're going to calculate the difference between
today and the hire date. DATEDIFF, and we're
going to have year. I'm going to go and
add it as a new row. We are calculating between the hire date and today. This is the formula for the employees that
are not terminated, and now we're going
to have otherwise, so ELSE. We're going to have
the DATEDIFF, and now not between today
and the hire date, it's going to be
between the hire date and the termination date. It's going to be the same thing: year, hire date, and termination
date. It's very simple. Let's go and end it. Let's hit OK.
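The length of hire logic has two branches: for employees without a termination date, count the years from hire date to today; otherwise count the years from hire date to termination date. A small Python sketch of that branching follows. The names `year_diff` and `length_of_hire` are hypothetical, and the sketch assumes the year difference simply counts calendar-year boundaries (end year minus start year), which is my understanding of how a year-level date difference behaves.

```python
from datetime import date


def year_diff(start, end):
    """Count calendar-year boundaries crossed between two dates (assumption)."""
    return end.year - start.year


def length_of_hire(hire_date, term_date=None, today=None):
    """Years of employment: up to today if still employed, else up to termination."""
    today = today or date.today()
    if term_date is None:
        # Not yet terminated: measure from hire date to today.
        return year_diff(hire_date, today)
    # Terminated: measure from hire date to termination date.
    return year_diff(hire_date, term_date)


# Hypothetical employees; "today" is fixed so the result is reproducible.
print(length_of_hire(date(2018, 3, 1), today=date(2024, 6, 1)))
print(length_of_hire(date(2020, 5, 1), term_date=date(2020, 11, 30)))
```

Because the first branch depends on today's date, the numbers change over time, which matches the remark in the lesson that results depend on the year you run it.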
So now we have a new measure, and I would like to
go and test it first. Remember the first sheet
where we test stuff? Here I'm going to remove a few things. We need the hire date, the termination date, and
our new nice column. I'm going to show
it as discrete. Now, of course, depending on the year that you
are doing the tutorial, you might get different results. Now as you can see here,
we have six years, two years, two years, and so on. Since here we have
a termination date, we have here a zero. Everything is working, let's go back to our detailed list. Now we need a new column, but this time we will not use the placeholder because we
already have a measure. We already have the
length of hire, so let's drag and drop
it side by side. Now we have to go and
configure the chart type. It will not be a shape. Let's go and use the bar. Now we have a bar in our charts. I'm going to go and
reduce the size of it. Maybe more. Now let's go and
add content to those bars. Let's start with the status. I'm going to put
it on the colors, and we need as well the label, so we're going to take
as well the length of hire to the label. Now let's go and edit
it, so let's go inside. We don't need all
that information. We have here the
number of years, so let's go and make
it bold and as well change the color
type to light gray. After that, we're
going to have years like this and maybe not as bold. That's it. Let's go and hit OK. Now we have the years
at the end of the bars. But what we can do, we can
go and change the alignment completely to left and in
the center. All right. Now let's go and
check the results. As you can see in the list, we have the two colors. Here, for example,
we have one year of termination as well here. The legend is working. Now, as you can see, things
might be very tight. What I'm going to do,
I'm going to go and change the size of
all those texts. Let's go to All and
then let's go to label, and then to the font, and let's make it
eight instead of nine. With that we're going to have better spacing between
those columns. Now the next step of that,
I'm going to go and remove all that
information here on the axis. Let's go and remove
Show Header, and we are done. Now we have a really nice
list for the employees. Again, this is the one that is time consuming,
but as you can see, we have nice bars, we
have a lot of icons, and we have multiple
pieces of information in one column. It is a little bit confusing at the start on how to build it. But once you understand it, you can go and make
amazing lists. And of course, having a
simple list as well is fine.
207. HR Project | Sketch Mockup of Detailed Dashboard: So now we can plan the mockup
for the second dashboard, and this one can be really easy. And we have the same title, but at the end, we're going
to swap it with the details. Now in the middle,
we're going to have only one section called
the employee list, and here we have only
one type of charts. We have a list, so we're
going to have multiple rows and multiple columns and
information in each cell. Now, of course, if you
have a detail list, it would be nice if we
can filter the list. That's why we're going
to put on top of each column an option for the users in order to filter the information that we
can see inside the cells. At the end, as you can
see, it's very simple. We have only one list and on
top of it, we have filters. That's it for the dashboard
mockup. As you can see, it's really easy. Let's move to the second mockup,
where we're going to plan the containers,
back to Toyo. Now I have a screenshot
of our new mockup, and I kept a lot of stuff
from the previous design. Now let's dive in and
see how we can do it. We're going to focus on the
black box in the middle. What we have here,
we have a title, then filters and a list. We need a vertical
container for that. Let's go and do it. This is the main vertical
container like this. Now what do we need?
We need a title. First, let's start
with one title. It's going to be as
well to the left side. I'm going to make it like this. Now what do we have
below it? We have now different filters
side by side. We need a horizontal container. Below it, we're going to have a horizontal
container like this, and let's remove
it and inside it, we're going to have
multiple filters. It's going to be filters. Well, they are all going
to be side by side. Of course, there are
way more details than what I'm showing you now, and we can talk
about them later. Here, we are talking about the rough design of the containers. Now what do we have
below the filters? We have our chart, the list. It's going to be only one
object without any container, so below it, we will have
a big list like this. That's it. Now let's go and focus on what we can have
inside the filter. Now, I just took a copy of a filter and let's design
the container for this. As you can see, it's like
things below each other, so we need a vertical container for the whole filter like this. Now inside it, we're
going to have a title side by side with an icon. For that, we're
going to go and get a horizontal container. Inside it, it is going to be a horizontal
container like this. We're going to have a
title for the filter, side by side with a
very small green icon. Now to the next one,
what do we have? We have, like, filters
underneath each other, and that's why we're
going to go with a vertical container
for the filters. It's going to be like
this. And inside it, we're going to have
multiple small filters. Filter one and
another one below it. This is the design of each of those filters that we have on top of the list. All right, guys. With this we have a rough plan for the container structure and as well for the
dashboard itself. Now let's go back to Tableau in order to build our dashboard.
208. HR Project | Build The Detailed Dashboard : Now, we're going to
go and create the dashboard for the detail list. But this time we will not
do it from scratch. We're going to go and duplicate the whole
work that we have done and only do a few adjustments
for the new dashboard. It's going to be time consuming only for the first dashboard, but once you have it,
then you can go and duplicate it for the rest.
Let's go and do that. We're going to go and
duplicate this dashboard, and we're going to go and
rename it to HR Details. So now the first step of that, we're going to go and prepare
the containers as usual. Let's go and make this bigger, and let's go to the layout. Now of course, we are not going to change the nav container. We're going to go and work with
the container in the middle. Let's go to the whole dashboard
over here and drill down, so it's going to be the Nav. And here we have the
header and charts. It's fine. Let's go inside it. Now we have here the header,
it's going to stay as it is, but this container going
to be dropped completely, right click on it and remove. Well, yes. What is left
over here is this legend. I'm just going to take it
and put it here on top. Maybe later we're
going to use it. Now let's focus on creating
the content in the middle. What do we need? We need
first a vertical container. Let's drag and drop it
exactly below the title. Then as usual, we're going
to go and drop some blanks. This is the first blank
and then the second blank. We can go of course and
mark it if we want. The whole thing is going to be with the border, the orange one. Now we can go and as well
rename it, filters and list. Now, for the filter, we need
one horizontal container. Let's go and drop
it here on top. Of course, we're
going to go and add some blanks inside it. This is the first blank.
We have it somewhere here. Then the right blank in
order to have it as fixed. Select the whole
thing, and we're going to mark it with a blue border. Now what is below the filters, it's going to be our list. Let's go to the dashboards, and we're going to go
and grab the details. Let's drop it
beneath the filters. Let's go back to the
layout and check it. As you can see, we have the
filters and the details. Now we can go and
remove the blanks. We don't need them anymore. So
by looking at the charts, we can go and remove the title. These are the main containers
for the dashboards. Now what we're going to
do, we're going to go inside the filters container, and we're going to
build one container for each group of columns in order to have
the filters for it. Now for the first two
groups of the columns, I'm going to do it
step by step slowly, but for the rest, I'm going
to speed up the video. Now let's start with
the first container for the employee ID.
What do we need? We need a container, of course. It's going to be a vertical
container, and then inside it, we have two blanks. And make sure to have
it exactly below. This is our container. Let's make it a
little bit bigger, and we can go, of course, and mark it in order to
see the borders; it's going to be this one, in orange, and we're going to go
and rename it like this: Employee ID Filter. Now, what we need inside this is two
horizontal containers. The first one is going to be
for the title of the filter. We're going to have
a text inside it immediately. Let's call it employee ID. Let's take it to the middle, change the color to light gray, and maybe make it a
ten for now, so it's okay. Next, we need
a second container, but this one is going to be a vertical one, exactly below it. Let's go as well and add
a few blanks inside it just to make sure
that we have it as a vertical container. Let's go and rename stuff. This is going to be the title, and below it we're going
to have the filters. Of course, we can go
and add the borders in order to see everything. Let's go and remove
those placeholders. So remove the blank
and the other blank as well. Now, for the next step,
we're going to go and add a button for the second container, to be added to
the first container. Let me show you that. Make
sure to select the filters, right-click on it, and
add a Show/Hide button. Now we have a
small button over here. We have to go and remove
the floating from it, so it lands somewhere here. Now, drag it and put it side
by side with the title. Let's go and make the whole
thing a little bit smaller. Now in order to understand
what I mean with this button, we're going to go
and add a filter inside the second container. What we're going to do
we're going to go to our list and to the small arrow, and then let's go to filters, and let's grab employee ID. Now, as you can see,
our filter is now inside the filters container. It's very important
to make sure that everything is in
the correct container. Let's go and test it out. Now, why do we have this button? Check this out.
If I click on it, we don't see any filters, so we are hiding the filters, and if we click on it again,
we can see the filters. That's why we have
to have this icon outside of the container in order to control the
visibility of this container. This button controls whether we are showing the
filters or not. Now, let's make the design
a little bit better, so let's go inside it, and this time we're going to go to the button, so
let's go and edit it. So if it is shown, I have an image for that. It's going to be this
arrow, the green arrow, so let's go and select it, and if it is hidden, then we have the
gray one like this. So let's go and hit. Now we have to make sure
that the whole container of the title is fixed. As you can see it's fixed
height, which is correct. Now let's go and test it. As you can see now,
the arrow is active, but once I click on it, it's going to be inactive, and
it has a really nice effect. Now we need to fix something. If you look here, I'm
hiding the filter, but there's a lot
of wasted space. What we're going to
do is make things more dynamic
and flexible. If I'm not showing any filters, this space should be
used for the list. So currently, we are
wasting a lot of space. Let's see how we can fix that. So let's go back
to our dashboards. Now, as the first step,
we have to make sure that our list is flexible. Let's go to this small
arrow over here, and we have to make sure there
is nothing selected here, so fixed height is not
selected, which is correct. Now the next step,
we're going to go to the filters
container over here, select the whole thing, and make sure this one is also
without a fixed height. Go over here. You can
see it has a fixed height, so let's go and remove it. Now, as you can see, Tableau
did use the whole space, so now it's more
variable and dynamic. Now one more thing that I
would like to do is to go to the filters and remove
all those blanks: remove this one and
this one as well. Let's go and test again. Now we are using the whole space because we are not
showing any filters, but once I click on the
button, what happens? We're going to use the space
in order to show the filter. This is very dynamic
and looks really nice. That's all for the first filter. Let's go and make
everything smaller. And I'm going to go
and do the same stuff for the second filter. So here we have a
bunch of information; we have around
four pieces of information, so we need four
filters for that. Now we're going to go
and do the same stuff. So we need a vertical
container side by side. Let's go and add a
few blanks inside it. It is this very small one. I'm going to go and select
it and maybe change the color of that as well. So like this. It's still
small, so make it bigger. All right. So the
first container inside is going to be the
horizontal container. I'm going to go and add
the text for that. This one is going to
be the demographics; it's going to be in the middle
and light gray. As well, let's make
it ten for now. Then, for the next step, we're
going to go and add another container
and this time it's going to be the vertical
container below it, and here we're going to
have a lot of filters. Let's go again to our list. The first thing we
need is the full name. It's dropped over here, so let's go and drop it where we want, and we're going to change
it to a drop-down list. Now, for the next step, we need to
go and get the gender filter. Let's go and get it. Now we have it over
here, so drag and drop it exactly
below the full name. I'm going to go and
remove this blank. Otherwise, it's going
to go and confuse us, so remove it from the dashboard, and as well the second
one. Now it's fine. Let's go and edit the gender. It's going to be
a drop-down list. Now, for the next one,
we need the age. I'm going to say,
let's go and get the age group. Let's
go to filters. We don't have it yet because we don't
have it in the list. We have to go inside
the worksheet. Let's go to all and drop the age group somewhere
in the details here. Then we should be
able to find it. Let's check the filters again. Now we have the age group. Of course, we can have
it on the first filter. Let's go and drop it
exactly below the others. Always make sure that
you are dropping everything inside this
vertical container. Let's go and rename
them as well. This is going to be the filters, the one above is the title, and the main one is the demographic filters. Let's go back to our filter, make it a drop-down list, and we need the last one. It's going to be the
education level. We're going to have
it here as well: drop it exactly below the others and make it a drop-down
list. Great. Now, for the next step,
we're going to go to the filters and add
a button for that. Let's go and do
it, add a button. We have it over here, change
it from floating to tiled. We have it over here. Let's drop it side by side with the title. It's not working, so we'll
drop it somewhere here first, maybe, and then
take it near the title. Great. Now, let's select
the whole container, make it smaller, and we're going to go and
work with the icon. Let's use the green as shown. And the hidden
should be the gray. And we can go of
course and test it. So now close it, and show it. We have to go and
fix the height in order to not have
this strange effect. So fix the height, and now we will not have it. Hide it and show it. All right. Now what we're going to do,
we're going to go and fix the design of those two filters, and we're going to
follow the same design for all other filters. Let's see how we can
do that. First of all, I'm going to go and
give a background color for the whole section. Let's go and check
the whole section, it is filter and list. So let's go to the
background over here and pick the black one. Now, the next step,
I'm going to go and remove the background
color of the worksheet. Let's go to the
format and then to the shading and remove
the worksheet color. Now let's go step by step
for those two filters. First, I'm going to go and
switch the title and the icon. I would like to have
the icon to the left, and the same thing over here. Now, the next step: those
icons are really big. Let's go and give
them a fixed width, and then let's have
a value like 25, and the same thing over here, so fixed and 25. For the next step, I'm going to go
and work with those titles. Let's move it to the left and make it smaller, to nine. The same thing here: instead of employee ID, let's have only ID. We don't have a lot of space. Make it nine and
to the left side. Now, for the next step,
we're going to go and work with the coloring. Let's pick one of those filters, then go to format filter
and set control. Now for the title, we're going to make
it smaller to eight, and with the color, it's
going to be the dark color. Now for the body, it's
going to be eight as well. And this time, the color is
going to be the light gray. It seems the title changed
again; that's strange. Let's go and change it back
to the dark gray and test. So the color of the
values is okay and the titles are
darker. Nice, great. Now, for the next step, we're
going to go and place the filter exactly on
top of the column itself. Let's go and do that:
select the whole container, and let's press it to be
exactly on top of the IDs, something like this, and
the same thing here. Let's move it and
maybe around here. But we still have a
divider between them. Let's go and check the layout. So we're going to have
it always like this: a filter and then a
divider between them. Let's call it divider. How are we going to
style the divider? It's going to be, as
usual, a dark gray. Now let's go to
the outer padding and make everything zero. Change the width to one, so we have it very thin,
and then we're going to go and add an outer padding
to the left and right. Let's have something
around 36 to the left and
six to the right. We have a small
separation between them. Of course, the last step,
we're going to go and remove all those borders. We are done with that. We have here as well a border and the same thing for
the next filter. We have here a border.
Now we can see we have still space between the
filters and the list, so we can go and select
the whole thing. Just to make sure that
we are selecting it. Let's just shift it to the
education level. All right. Now, checking that, the
divider doesn't look good. So let's go back to the divider
and set ten on the top and ten on
the bottom as well. So let's check the design again. All right, so we are done
with the first two filters. We have to go and
repeat the same stuff for all the other columns. So what's going to happen: I'm
going to go and speed up the video as I'm creating
all those filters. That was a lot of filters
inside our dashboard. Now let's go and test it, so we have all those filters. We can go and hide all
those filters as well, but we still have an issue. It is not flexible anymore. I think we still have
a fixed height. Let's go and fix that. Let's go and select
the whole container. It was the filters container, and it should not be fixed. Yeah, here is the issue;
let's go and remove it, and let's go and test again. We open the first filter,
the second, the third. And we are almost there. We still have a
lot of wasted space here, so let's go and check
the containers. They should not be fixed, and we have this one as fixed,
so let's remove it. The first one is not
fixed, so it's fine. The second one: remove the fixed height. And here as well,
it's not fixed, fine. And the last one. Great. Let's go and
do the final tests. If we close everything, the list should be bigger. Now let's go and add spacing
inside our dashboard. Let's go and do that,
and we're going to go and remove all those borders. Let's go and select the whole
container filters and list. And we're going to go
and remove the border. Now as you can see
at the bottom, we don't have any spacing, so we have to go and add an outer padding.
Let's remove the two. We need only 20 at the bottom. Great, now we have space. On the right side it looks good, and on the top as well;
now it looks good. Now let's go and add an inner
spacing, and it's going to be seven
for all sides. Great. Let's go and remove the
blue container here. We don't need the border. Let's go and expand
everything again to see whether we
have any borders. We don't have any
border colors, great. Let's go and close it.
Now we'd like to go and add a title for this list. Let's go and grab a text and carefully put it on top
of the current container. We're going to say employee
list and then a pipe, and then we're going
to tell the users to click on the arrows, so: click arrows for
filter options. Now we have to go
and change the coloring. This is going to
be light gray and bold, and it should
be 14 for the size. For the rest, it's going
to be a dark gray. Let's go with an eight. All right. Looks fine. Now, let's go and add a spacing between those three sections. We have a title, we have
the filters and the list. Let's start with the employee. I'm going to go and add a
padding at the bottom of around maybe ten. Looks nice. Now let's go for the
group of filters, select the whole container, and let's go with the padding
to the bottom around ten. With that, we have
like spacing between all those objects and
it looks way better. Now, for the next step, we're going
to talk about the legends. I'm not going to use any
legends in these charts, so let's go and remove it. As well, we don't need any filters here
since we have enough filters, so let's remove it as well. And this icon as well. With that, we're done with the main part of our dashboard. Now we're going to go and check our navigation and the title. Of course, we
forgot about the title. Instead of overview,
it is details. Let's go and change
the size of this word to 16 and maybe
something darker. I'm going to go and change
it to something like this. Yeah, it looks way
nicer than before. I'm going to go and take
the number of the color, and we have, of course, to change that for the
first dashboard. Let's go over here, make it 16, and change the
color to the same color as well. It's a little bit darker
and it looks way nicer. Now on the left side,
we have an easy job. What we're going to do,
we're going to go to the first icon and
make it deactivated. Let's go and edit the button, and now instead of active, we have to set it to
inactive. Now, as you can see,
it is inactive, and for the first button, we're going to go
and make it active. This is going to be
the green table. Of course, now we
can go and map it. We have this dashboard. Let's go and map it to the details. All right. It looks really nice. Let's go back to the
first dashboard, and of course, we have
to do the same mapping. Let's go and edit the button, and we're going to map it to
our new dashboard, details. Now I would like to go and
add one more nice thing in order to indicate that
this icon is active. I'm going to go to the
dashboard, to floating, and let's grab a blank. Click on the blank,
and let's go and pick green
as the background color. Now we're going to go
and decrease the size of this to be a small
indicator like this, maybe. And we're going to
move it over here. I'm going to say, let's
make the height 40 and place it
exactly near the icon. Maybe something like this. Now let's go and
check the dashboard. I'm going to go and
reduce the width of that, so let's make it thinner,
maybe like this. With that, we have
a small indicator that this icon is active. Let's go and do the same thing
for the second dashboard. Again, we're going to grab a blank, and we're going to make the
color of that green. The width is going to be six and the height is going to be 40, and now we're going
to go and place it exactly near the active icon. Something like this. All right. Let's go and check the design. It looks really nice. Let's have a final
look at our dashboard. Here we have a nice filter
and the main dashboard. Here we have this
nice information. We can go and download stuff, we can go and follow, and the whole dashboard
is interactive. Now if the users want to go to the
second dashboard, all they have to do is
to go and click on this icon. And we are now on the detail
list of the employees, and everything here
is very interactive. Let's go and hide all
that information, and it looks wonderful.
209. HR Project | Bonus - Build Background Layers using FIGMA: All right, friends, now we have a bonus section, where we're going
to go and customize a background image for the
layout of our new dashboard, and that's going to make
the overall design of our dashboard look really
cool and professional. This time, we're going to use another tool in order
to create the layouts. We're going to go and use
Figma. What is Figma? Figma is a design tool
that is used by many UI and UX designers in order
to create concepts and mockups for user interfaces. And it is an amazing tool for
sharing your work with
others and collaborating as a team. You can find the
link to my work with the other links in the
project materials. Of course, don't
worry about the cost. There is a free plan for starters. Now, we will not do a deep
dive into how to use Figma. I will just show
you how I usually use it for Tableau. Let's go. Now we're going to
start with an empty file, and we're going to put in a
screenshot of our dashboard. Now, for the next step,
we need a frame. So let's go and get a frame exactly on top of our dashboard. Now we can go and
hide the image. Now we need a color
for our dashboard, so it's going to be
something maybe like this. Or let's increase
it a little bit. Now what we're going to
do, we're going to go and add lighting from the corners. In order to do that,
we're going to take the shape of a circle or ellipse, and we're going to make it like this and maybe a little bit
bigger and send it to the back. Let's go and change the color of this to something here,
like in the middle. Then we're going to
go and add an effect in order to have a glow. We're going to use a blur, and we're going to go
and change the value to something around 1,500. So now, if you check,
we have a glow, or a light, that
comes from this corner. Now let's go and add the same in the other corner;
we can do it like here. Now let's go and
increase the size of this one. Something like this. We need more lighting
coming from the right side, and we still have
to make it bigger and a bit darker. All right. With that,
we have a background. Next, we're going to go and add the background colors
of each section. We need again our image, and now we have to
go and zoom in. Now, what we need,
we need a rectangle, and we have to be very
careful that we meet the exact edges of
our dashboards. So let's get it like this. I'm going to go and reduce
the opacity to something around 50 just to
see the borders. Yeah, nice. Now we're going to
go and increase it to 100, and now we need the
color to be complete black. Now what we're going to do,
we're going to go and use a gradient instead of the solid. So let's
go and do this. Now we're going to go and
work with the lower value. We have to decrease
it like this, maybe a little bit
more, like this. Now the next step, we're
going to go and add a corner radius for our container,
maybe 20, great. Now let's go and
repeat the same things for the other containers. We're going to have
it for the overview. Maybe reduce again the
opacity to see the borders. So like this and here as well. It's going to meet
the same borders. So now let's go and copy
this to the second section. So increase it like this, and we have to meet
the edges perfectly. Let's go and do the same for the last section.
Something like this. Now we are done. We
have to go and increase the opacity to 100 everywhere. Of course, we're going to go
and remove the background. We are almost there. What we're going to do,
we're going to go and change the coloring of
each of those containers. Let's go to the linear and maybe we're going to go and take the lower level like
outside and this here. It's going to go a
little bit darker, to the next one as
well to the linear. We're going to have
it somewhere here, and the low value
is going to be outside. Now what I'm going
to do, I'm going to take those ellipses and put them somewhere like here, and let's keep working
on the coloring. Let's move to the next
one, to the linear. Let's move this somewhere
here and check the colors. We can put it like this,
and on to the last one. Like this here. I'm going to have it rotated here. Great. Now let's have a look. It looks very nice. Now I'm going to go and add
our second dashboard over here and make sure to place it exactly on top
of our dashboard. Let's move it here
and let's close some of that information.
I'm going to have only the. Now we need one
more for the list. Let's go and do this. A little bit. Decrease the opacity
to see through, to
around 40. Let's go and meet the borders. Yes. Okay. That's it. We're going to go
and increase the opacity again, to 100. Now for the fill, we're going to do
something like this. And the low value is going to
be a little bit outside. That's it. Now we
have to go and export those background images. We're
going to do it like this. For the first dashboard,
what do we need? We need the nav, and
we need those two, and we have to go and hide
all the images. That's it. Click on the container, and we have here the
option of exporting. Let's go and export it.
Now we have to go and export again for the
second dashboard. So we're going to go and
hide that information. We need this one, and that's it; let's go and export again. All right, back to Tableau. First, we're going to remove all the background colors of each container before
adding the background image. Let's go and do that. Let's start
with the whole dashboard. We're going to remove it,
and then we're going to go and select the nav and
remove it as well. None. And on to the overview: none. On to the next one,
and to the last one: it's none. With
that, we don't have any background color
for the containers, but you still see here
gray and that comes from the default color
of the dashboard. If you go to the
format dashboard, you can see, we have
it as a default. This is nice: if you go to
the presentation mode, you're going to have
everything as gray. We're going to
leave it as it is, and now we're going to go and
add the background image. We're going to have it as a
floating image to the middle, make sure it is fit and
center and then choose. We're going to go with
the background summary. Now next, we're going
to go and change the size to our dashboard size. And then the
position to be zero. Of course, now we are not seeing anything of the content, and that's because of the order
of the floating objects. Now as you can see it is on top, so let's go and move it to
the background and with that, we see the background
image of our dashboard. I think it's really nice. Now let's go and
do the same things for the next dashboard. We're going to do
the same things. For the whole dashboard, it's
going to be removed; for the nav, removed; and for the list, removed. With that, we don't have
any background colors. Let's go and add our floating
image for the background. Center fit, and we're
going to have our image. Same things, the size, the height, and the
position to be zero. Now, of course, we are
not seeing anything. We have to go and sort
the floating objects. It's going to be
as a background. All right, so that's it. I'm really happy
about the results. Let's go and switch to the
presentation mode. So, guys, what do you think?
We have an amazing dashboard, and this is the power of using a background image
for your dashboards. So we have way more
options to add shadows, rounded edges like here,
and some lighting. So let's go and switch it. As you can see,
it looks amazing. All right, my friends. If
you're still here, congrats! You have just completed
the Tableau project from scratch, from
the requirements until having this
amazing dashboard. And with that, you have
experienced all the phases of the Tableau projects that I usually do in my
real-world projects. So, friends, I
cannot really stress enough how important
it is to take time planning the
projects before rushing into building the
charts and the dashboards. Without having a clear
plan for the projects, things can lead to chaos. So take your time
planning it step by step. Of course, feel free to share your project on any
platform that you prefer. Use it as a portfolio on your Tableau Public profile
or on LinkedIn as well. And it would be nice
of you if you share and mention my channel
to spread the knowledge. So if you like this project and you want me to make
more content like this, please support the channel by subscribing, liking
and commenting. This really helps with
the YouTube algorithm, and as well, it helps
me to reach others. And of course,
don't be a stranger. You can connect with and
follow me on LinkedIn. So, my friends, nothing
left to say besides: thank you so much for
watching the tutorial, and I will see you in
the next video. Bye.
210. Congratulations & THANK YOU Video: Hi, I'm very proud of you that you made it until the end. I hope you enjoyed the journey. And I know it wasn't easy going through all those
complex tutorials, but you made it until the end. And now I can say that you have learned everything that you need to start doing amazing
projects in Tableau. And as well, you have
learned everything that I know about Tableau and how I usually implement real life projects in Tableau. So now I'm going to ask
you for one more thing. If you found this video helpful and it helped you to start
working with Tableau, I really appreciate
it if you like it and share the content
with the others. And of course, if you have any questions or suggestions for the next topic that you
want me to cover in the future, or you want
to give me feedback, make sure to use
the comments below. Well, nothing left to say. Thank you so much for watching this course and I will see
you in the next course, bye.
211. Advanced SQL | Download SQL Server & SSMS: Hey, friends, so
we're now going to go and prepare your PC with
everything that you need in order for you to
start practicing SQL with me using SQL Server. And of course,
everything is free. So now, in the first step, we're going to go and
download and install Microsoft SQL Server
locally on your PC. Then, in the next
step, we're going to go and download and install
another piece of software called SSMS. It is like a client used to interact
with SQL Server. And of course, after
that, what do we need? We need data. That's why we're going to go and
download and create three different
databases for you to practice advanced topics in SQL. And in the last
step, I'm going to take you on a tour of the new interface of SSMS for you to get familiar with the interface
of the client. So, guys, let's start
with the first step. We're going to go and
download and install Microsoft SQL Server
locally on our PC. So let's go. So what
is SQL Server? SQL Server is a database
management system: it runs a database, and it stores data as well. So it is basically where
the database lives. Companies usually
install SQL Server on one of their own on-prem
servers, or they use a cloud service that
runs SQL Server. And, of course, don't worry, we will not buy any
cloud services, and we will not use any
powerful servers. What we're going to
do, for free, is download and install SQL Server locally on our PC in order to practice SQL. Let's
go and download it. Either go to Google
and search for SQL Server downloads or go to the link in the
description where I've collected all the
links that we need. The first one, we're going to
go to download SQL Server. Let's go and open that. Now we're going to land on
the Microsoft page where we can see the different offerings
for Microsoft SQL Server. Either we have it on Azure, or we can download
it for on-premises use. But we don't want
that stuff; just scroll down to see
these two options. As the first option,
on the left side, we have the Developer edition. You will get all the
features and services that Microsoft offers
with SQL Server. It is free as well, but the installation here is
a little bit complicated. But as the second option,
on the right side, we have the Express edition. The installation
here is going to be really fast and very easy. You will get all
the stuff that you need for practicing and learning SQL as well. Both of the options are free. It's just a matter
of the installation. We will go now for
the express edition. Go and click download now.
It's a very small file. So let's go and start it, and now the
installation starts. So we have Basic, Custom,
and Download Media. Download Media means download now and do the installation
later. Custom means we have more control over how to download
and install the stuff. Basic is the easiest
one and the quickest one. Let's go with Basic
and click on that. Let's go and accept
all that stuff. Now, let's click on Install. Now it's going to
install the applications, drivers, and so on. It may take a little bit of time. All right, so with that, we
are done with the first step: we have downloaded and installed
SQL Server locally on our PC. So now everything
is up and running. Let's move to the next step where we're going
to go and download SQL Server Management
Studio, SSMS. It is a graphical interface
where you can go and start interacting with the database
where you can see the data, write queries, solve
tasks, and so on. So in order to do
that, let's go and click on Install SSMS. Let's click on that.
You can find, of course, this link as well with the other links that I have collected. So now we are again
on a Microsoft page. Let's go and scroll down. And now we will see
the following link: Free Download for SQL Server
Management Studio (SSMS). Let's go and click on that. And then it's going to
go and download it. Let's go and start
it. The first thing is that we have to
define the location. I will go with the
default stuff. Let's click on Install. Okay, setup completed. We just installed SSMS. Let's go and close it. Now let's go and start it. If you go to your
menu over here and search for SQL Server,
you will find it here: SQL Server Management Studio. Let's go and start it. Okay, now we're going to get this window in order to connect
to our server. Again, what is our server? It is the one we
have installed in the first step, SQL
Server Express. That's why you're
going to see your PC name in the server name; of course, it's not
going to be my PC name. But here we have something
called SQLEXPRESS. This is the server
we just installed. In the first option, we
have Database Engine, we have Reporting Services. Those are different
products from Microsoft. We're going to leave it
as Database Engine, and the server name should be like
this, with SQLEXPRESS. Now, how do we access
this database? We have the following options: we can do that using Windows authentication or SQL Server authentication. I'm going to say,
let's stick with Windows authentication. The user name is going
to be the PC name and the Windows user. If you don't have
that information for some reason, you can go to your search and search for CMD. Then here you can type whoami. With that, you will get
the PC name and the user that you are
currently logged in as. And this is exactly what
I'm seeing over here. So we will not change anything. Let's go and hit Connect. Perfect. Very nice. I didn't get an error. If
you have the same, that means we are now connected
to our SQL Server.
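If you want to double-check the connection, you can open a new query window in SSMS and run a small check like the one below. This is only a minimal sketch and not something shown in the video; it assumes a default SQL Server Express installation and Windows authentication, and it uses only built-in T-SQL functions and the sys.databases catalog view.

```sql
-- Which server instance are we connected to?
-- For a default Express install this usually looks like PCNAME\SQLEXPRESS.
SELECT @@SERVERNAME AS server_name;

-- Which login is this session using?
-- With Windows authentication it should correspond to the whoami output in CMD
-- (letter case may differ).
SELECT SUSER_SNAME() AS login_name;

-- Which databases exist on this instance?
-- Right after installation, only the system databases
-- (master, model, msdb, tempdb) are expected.
SELECT name
FROM sys.databases
ORDER BY name;
```

If the server name and the login look like what you saw in the connection window, everything is set up and you can move on to creating the databases.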
212. Advanced SQL | Create Databases: Okay. So with that we are
done with the second step where we have downloaded
and installed SSMS. So we have all the software
now running on our PC. In the next step, we're
going to go and get data. So we're going to
go and download and restore three
different databases. We have different
sources for the databases: one that I have prepared, and another one from Microsoft. So, the one that
I've prepared is a very simple database with
a few sales records, and I made it in order
to practice SQL. So let's go and download it. Let's just click on the
download course data. And below that, we have the
data model of the course. So let's go and
click on this link. And what we can see over here is the data model
of the database. As you can see it
is very simple. Those are the tables and the
relationships between them. So it's very classic:
we have in the middle the central table, a very
important one, the orders. To the left and right, we have a
few tables like the products, customers,
and employees, and all of them have a
relationship to the orders table. So as you can see, it's a
very simple database. Let's go to the next link, where we're going to download the databases
from Microsoft. Let's download project data. Here we have again a
Microsoft page where it says AdventureWorks
sample databases. Let me just scroll down. As you can see here, we have
three types of databases. We have OLTP, Data Warehouse, and Lightweight,
and you can see the latest version of each type. Now, let me just quickly explain what OLTP
and Data Warehouse are. What is OLTP? OLTP stands for
online transaction processing.
It is classic: if you
go to any company, you're going to find there a
few operational databases where they deal with day-to-day business
and transactions. It is a traditional
operational database that you can find
everywhere, in each company, and it is optimized for read
and write requests. But on the other hand,
we have another type of database called data
warehouses, or OLAP. What is OLAP? OLAP stands for
online analytical processing. These types of databases are optimized
to handle large amounts of data in
order to do data analytics and business intelligence, maybe to build reports and dashboards, and usually they
contain a data model that contains
dimensions and facts. They form something
like this, a cube. This cube can help
you to do analytics, to slice the data, to
filter the data and so on. Now let's go and download them. Let's click on the
OLTP AdventureWorks, and as well on the
data warehouse. I would say let's
download both of them. Now we have several databases
in our download folder. Let's go over there,
and we can see we have the two AdventureWorks files from Microsoft and the one zip file that we have just downloaded. This is the simple database
that I've created. Let me just extract it first
in order to get the file. Let's just get the file over here. Now we have the three databases and they all end with the
same format, .bak. This format, .bak,
stands for backup. That means we have a backup of each database and we have to go and restore them on our server, or let's say install them. In order to do that, we have
to go to a specific folder. We need the path for
that. I've prepared that as well in the link. Just go and copy this path, and let's go back
to our Explorer. Let's just go over there. You can see we
don't have any backups here yet. What we're going to do now is
copy those files
into this path. If I just go back, let's copy them, go to the path,
and just paste them. Great. Now we have the
files in the correct place. If the path didn't work for you, maybe you have a
different version of SQL Server Express than I have. Make sure to go to Program Files, then Microsoft SQL Server, then look for the SQLEXPRESS folder, then MSSQL, and
then Backup. You should have something
very similar. Now let's go back to SSMS
and restore the databases. Let's open again
our application. As you can see, we have
the server and inside it, we have the databases. Let's go to the
databases. Inside it, we don't find anything yet. What are we going to
do? We're going to go to Databases, right-click on it, and restore the
three databases, but we have to do it one by one. Let me show you the steps. Click on Restore Database. Here we have the source. We're going to go
to Device, select it, and then we're going
to go to this button, the three dots. Click on that. After that, we're going
to click on Add. As you can see now, we can
see the three databases. Let's go with the first one. Click OK, then OK again, and now we
have the database over here. Let's go and hit OK. So restoring, or installing, the database
was successful. If you click over here, you can see we have
now a new database called AdventureWorks2022. This is the OLTP one. Okay, so now we have to
get the other databases. Let's keep doing the same thing: Databases,
Restore Database. I'm just going to do it quickly. Device, three dots, Add, and then the DW, or
the data warehouse. Okay. So it was successful and we now have our second database on the left side. You
can see it over here. Let's go and import or
restore the last one, the one that I've
prepared, the simple one. So Add, the sales database. And one more OK. So now we have on the left
side three databases.
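To double-check the restore without the Object Explorer, the databases can also be listed with a query. This is a minimal sketch that assumes nothing beyond the built-in sys.databases catalog view. The exact names you see depend on the backup files you restored.

```sql
-- List the user databases on the connected server.
-- database_id 1 to 4 are the system databases (master, tempdb, model, msdb).
SELECT name, create_date
FROM sys.databases
WHERE database_id > 4
ORDER BY name;
```

If all three restores worked, the query should return three rows, one per restored database.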
213. Advanced SQL | Tour in the Interface of SSMS: All right, friends. With that, we are done with a third
step. We have now data. We have databases
in order to start now selecting and
querying the data. We have the application,
we have the data. Now what are we going to do?
I'm going to take you on a very quick tour of the
interface of the client, SSMS. Let's go. Now in order to see and check the data, it's like a hierarchy. If you go to SalesDB, let's go inside
it, and now we can find a lot of stuff like
tables, views and so on. The main one is going
to be the tables. Let's go inside the tables. And here, you can
find our tables: customers, employees,
orders, and so on. Now in order to see the data, go for example to the
orders table and right-click on it. Here we have different options. What we're going to do is
go and say Select Top 1000 Rows. Let's click on that. Great,
finally we can see some data. As you can see, we have
over here the query editor. You're going to write
your query over here, your SELECT statements, and then we have here
the result grid. What we're going to do is
write the query over here. For example, let me just remove a few things. And then once we are done with the query, we have
to execute it. In order to do that,
we can go over here and click
Execute, very simple. As you can see, SQL is going to
execute the query, and we're going to get the new result here
in the result grid. Let's say that you need to write another query
in a new tab. What you're going to do is
go over here to New Query. And with that, we're
going to get a clean new window in order
to write our query. One more thing that is very
important to understand, especially if you have
multiple databases in SQL Server, is that you select the correct
database for your query. For example, over
here, you can see that
we are currently selecting the SalesDB database. Now, anything that
I'm querying should be a table
inside this database. So customers, let's execute. Now we are selecting a table
that is inside SalesDB. Now if you want to select a table which comes
from another database, make sure to switch
the database. Let's go over here
and switch it to, for example, AdventureWorks. Now, if I go and execute this, it will say: in this database, I can't find the table. So if you are confused and say, I can see the Sales
Customers table over here, and I'm still getting the error from SQL that
it's not finding it, it's because you are
selecting the wrong database. Now, what happens if
you want to work with multiple databases
in the same query? What you can do is
define it at the start. So you can say SalesDB.Sales.Customers. That means we have a hierarchy. Here we have the database, then the schema,
then the table name. Now if I go and execute this, even though we are in a
different database, it's going to understand that
this table comes from another database, and we
will get the results. That means in one query, you can query multiple tables
from multiple databases. Either you can go
and switch it from here or you can use
this statement: I can say USE SalesDB. And
with that, I'm telling SQL, now use this database instead of the other.
As you can see, SQL is going to go and switch it. Now since I am
inside the database, it makes no sense to tell SQL
again about the database. I just go and remove
it, and it's going to work. All right, so with that we have
prepared your environment. You have everything to start
doing amazing work in SQL. Now I would say, just go and
explore the other databases. Just do random
selects in order to understand what content we have inside those databases. And if you would like to see the data model of
AdventureWorks, I have it as well in the link. Over here, if you go to the
Data Warehouse data model, you can see all the tables that are available. And you can see, we
have a lot of tables. Since it's a data warehouse, you have dimensions and facts. And as well for the
OLTP, I have it for you. If you click over
here, you will find a huge operational database
with a lot of stuff. Here they work
with sales, persons, products, purchases and so
on. All right, friends. So with that, we have
prepared your PC with everything that
you need in order to start practicing SQL. We have the SQL Server, the client SSMS, and the data,
the three databases. And now you are ready
to practice advanced topics in SQL with me. In the next chapter, we're going to deep
dive into the world of window functions in SQL. They are the most important
group of functions that you need for data analysis. So I really want you
to focus on this. You will end up using
those functions in real projects, I promise you.
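The two ways of pointing a query at the right database from this tour can be sketched as follows. The sketch assumes the course database was restored under the name SalesDB with a Sales.Customers table, and that the OLTP sample is named AdventureWorks2022, as seen in the restore step. Adjust the names if yours differ.

```sql
-- Option 1: stay in another database and use the full
-- three-part name: database.schema.table
USE AdventureWorks2022;
SELECT * FROM SalesDB.Sales.Customers;

-- Option 2: switch the database context first;
-- the database prefix is then no longer needed
USE SalesDB;
SELECT * FROM Sales.Customers;
```

The USE statement does the same thing as picking the database from the dropdown in the SSMS toolbar.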
214. Advanced SQL | What are Window Functions: Window functions or sometimes we call them
analytical functions. They are very important
functions in SQL. Everyone must know
them, especially if you are doing data analyses. Each time I write SQL script in order to do data analytics, I end up using them. As usual, we're
going to go and now understand the
concept behind them, and then we're going
to start practicing. Let's go. Okay, guys. Now let's start with
the first question. What are SQL window functions? They are functions
that allow you to do calculations
like aggregations, but on top of a subset of the data, without losing the level
of detail of the rows. It is something very
similar to GROUP BY. But here we have a special case: you don't lose the
level of detail. Now in order to understand
the definition, let's have a very
simple example. Okay. So now let's
understand how SQL works with the
GROUP BY clause. Let's say that we have
a very simple example. We have four orders, two orders for the caps and
two orders for the gloves. Let's say that, I
would like to see the total sales
for each product. Now if we decide to use GROUP BY, what
is SQL going to do? It's going to take the
first two orders for the caps and
put them in one row. In the output,
we're going to have only one row for the caps, with the total sales of 40. And the same thing
happens for the gloves. It's going to take the two rows of the gloves from the input, and in the output,
we're going to have only one row
for the gloves. That means the number
of rows is going to depend on the number of
products we have in our data. We have two products,
we get two rows. That means SQL is
really smashing or squeezing the
results in the output. And this is exactly what
GROUP BY does to our data. It aggregates the rows, aggregates the data into a
different level of detail. Now on the left side, we see four rows, on the
right side we have two rows, and with that, we are losing
some details in the results, but still we have
solved the tasks. So now let's see
what happens if you use a window
function in SQL. Okay, so now we
have the same data and the same task: we have to find the total
sales for each product. Now, if you use a window function, SQL is going to do the following: it's going to process each row individually,
one by one. So what happens? It
starts with the first row, the order ID one. In the output, we're going to get
the same row, the order ID one, but we will also get the total sales for the caps. Here the total sales
is going to be 10 plus 30, so we will get 40. Then it's going to
jump to the second row and
process it as well. In the output, we will
get the order ID two, the product caps, and we have the same aggregation
since we are talking about the same product.
We will get 40. Then it's going to
go to the third order, and here we
have the gloves. In the output, again,
we have the order ID three, the product gloves, and the total sales
this time is going to be 5 plus 20,
so we'll get 25. Then it goes to the last row, the order ID number
four. In the output, we're going to get four,
gloves, and as well 25. Now we can notice that if you use the window function, you will not lose the level
of detail of your data. So we are doing something
called row-level calculations. If in the input data we have four orders,
in the output we're going to get four
orders, and as well we will get our
aggregations correctly. Now if you compare both of
the methods, side by side, we can see that we are
solving the same task. So we are finding the total
sales for each product, but with GROUP BY,
we are smashing, squeezing the results from
four orders into two rows, one row for each product. That means with GROUP BY,
the granularity is changing. In the input, the order ID is controlling the
level of detail, but in the output of GROUP BY, the product is controlling
the level of detail. So we have a different
granularity. But on the other hand, with
the window functions, we are still able
to do aggregations, but we are not losing
the level of detail. The granularity of the input stays the same in the
output. This is exactly the
main difference between GROUP BY and
the window function. If you just want to do
simple aggregations, then go with GROUP BY. But if you care about the
level of detail and you need to add more details
to your results, then you can go with the window
function, where you can do aggregations plus
having more details. Now, if you go and
compare the functions between the window
and GROUP BY, we can find that
both of them have exactly the same functions
for the aggregations. We have COUNT, SUM,
AVG, MIN, and MAX. Here comes another difference between the window
and GROUP BY. GROUP BY has only the
aggregate functions. That's it. But in the window functions, we have way more functions
to use for analytics. For example, we have
the ranking functions, and we have here another
group of functions, the value functions, or we call
them analytical functions. That means with SQL window functions, we have a lot of functions. We can cover a lot of analytical use cases and
advanced, complex stuff. But with GROUP BY, we have only the aggregate functions,
only for simple use cases. This is another
difference between GROUP BY and the window. Use GROUP BY if you have
simple aggregations. Window functions
we can use for more advanced data analysis, where we can cover
a lot of use cases. All right, now
we're going to have a few tasks in order to
understand one thing: why do we need SQL
window functions? Why, in some scenarios, is GROUP BY not enough
and we have to use SQL window
functions? Let's go. All right, so let's start
with a very simple task. It says, find the total sales
across all orders. So we need one value
with the total sales. Let's see how we can do that. First, make sure that you
are using the right database. So use the sales database, in case you have
closed the client, so that we don't
get any errors. Now we're going to start
with the first thing. We're going to go and
select the sales. You're going to find it in
the table Sales.Orders. So now let's just
query the data. And as you can see, we have
ten orders with ten sales. We didn't aggregate
anything yet. So we have the raw data now. Now in order to
solve the task, we're going to use the SUM function. So SUM of sales, and we're going to give it
a new name, total sales. We don't have to use any GROUP BY because we don't have
to group by anything. So that's it. Let's
go and execute that. And as you can see, SQL is going
to return one value, 380. This is the total sales that
we have inside our data, and this is the highest
level of aggregation. So with that, we have
solved the task: we have the total sales across all orders, and we don't
have to group by anything. Let's move to the next example. In the next task, this time we want to
find the total sales, but for each product, not for all orders. For each product, we want
to find the total sales. This time we don't
need only one value. We need one value
for each product. In order to do that,
now we're going to go and use GROUP BY, and we're going to group
by the product ID. We need the
dimension in the SELECT as well. We can do it like this. That's it. Let's go
and execute the query. Now, as you can see
in the results, we don't have one
value, we don't have the highest aggregation. This time we are drilling down to the next
level of detail. The level of detail
here is the product ID. We have one row
for each product. For the first
product, we have 140, the next one, 105, and so on. As you can see, we
are now splitting the data at the
level of the product ID. We went from ten orders to four rows in the results, and
that's because we have four products. So the number
of rows in the output is going to be defined by the
dimension, the product ID. And with that, we
have solved the task: we have the total sales for each product.
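The two tasks above can be sketched on a tiny data set. To keep the snippet runnable without the course database, it uses a temp table, #DemoOrders, filled with the four caps and gloves orders from the earlier explanation. The table and column names are made up for illustration and are not the course's Sales.Orders table.

```sql
-- Hypothetical demo table with the four orders from the caps and gloves example
DROP TABLE IF EXISTS #DemoOrders;
CREATE TABLE #DemoOrders (OrderID INT, Product VARCHAR(20), Sales INT);
INSERT INTO #DemoOrders (OrderID, Product, Sales) VALUES
    (1, 'Caps', 10), (2, 'Caps', 30), (3, 'Gloves', 5), (4, 'Gloves', 20);

-- Task 1: total sales across all orders
-- (one row, the highest level of aggregation)
SELECT SUM(Sales) AS TotalSales
FROM #DemoOrders;
-- returns 65

-- Task 2: total sales for each product
-- (one row per product, the dimension also appears in the SELECT)
SELECT Product, SUM(Sales) AS TotalSales
FROM #DemoOrders
GROUP BY Product;
-- returns Caps 40 and Gloves 25
```

Against the course data, the same two queries would use the sales table and the product ID column instead.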
All right, guys. So let's keep progressing
our examples. Now the next one is going
to be a little bit more advanced, where we have
the same aggregation. Find the total sales
for each product. Additionally, provide details such as the order ID
and the order date. As you can see, we have
already solved the first part. We are finding the total
sales for each product. Now we just have to add some additional information like the order ID and the order date. Let's go over here and
just add them to our SELECT: order ID, and let's have
the order date. Let's go and execute that. I'm just going to make it a
little bit bigger. Let's go. But now, as you can
see, SQL is not happy and throws an
error that says the columns you are
adding to your SELECT are not included in the GROUP BY. As you can see, in the GROUP BY we have only one dimension, or one field, called the product ID. But in our SELECT, we have three dimensions: the order ID, the order
date, and the product ID. So there is no match
between the SELECT and the GROUP BY, and SQL
will not allow it. Now you might say,
you know what? Let's add everything
to the GROUP BY. With that, we're going to get our aggregation, and as well we're going to get our
details. Let's try that. I'm just going to zoom
out a little bit. Instead of having only the product
ID, let's add everything: the order ID, order date, and the product ID. Now we have a match and SQL should not throw any error. Let's go and execute it. Now let's check whether
we have solved the task. The task has two parts, right? We have to do the aggregation
and provide details. You can see we have
solved the second part. We have the details,
order ID and order date. But now the first part,
finding the total sales for each product, is destroyed, because if you
check the results, we have the product ID 101 with total sales of 10, but in the third order we have it as 20
for the same product. So actually, the data
is not aggregated. And that's because we are aggregating at a different level, and we have included
way more columns than we need
for the aggregation. We are aggregating at
the order ID level. So as you can see now, we are hitting the limits of GROUP BY. We cannot provide
aggregations and as well provide additional
information from our data. You have to pick one.
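The limit of GROUP BY described here can be reproduced on a small scale. The sketch below uses a made-up temp table, #DemoOrders, with the four caps and gloves orders from the earlier explanation rather than the course data. The failing attempt is left commented out so that the script runs as a whole.

```sql
-- Hypothetical demo table (not the course database)
DROP TABLE IF EXISTS #DemoOrders;
CREATE TABLE #DemoOrders (OrderID INT, Product VARCHAR(20), Sales INT);
INSERT INTO #DemoOrders (OrderID, Product, Sales) VALUES
    (1, 'Caps', 10), (2, 'Caps', 30), (3, 'Gloves', 5), (4, 'Gloves', 20);

-- Attempt 1 (commented out because it fails): OrderID is in the SELECT
-- but not in the GROUP BY, so SQL Server rejects the query.
-- SELECT OrderID, Product, SUM(Sales) AS TotalSales
-- FROM #DemoOrders
-- GROUP BY Product;

-- Attempt 2: runs, but every order is now its own group,
-- so SUM(Sales) is just the sales of that single order.
SELECT OrderID, Product, SUM(Sales) AS TotalSales
FROM #DemoOrders
GROUP BY OrderID, Product;
-- returns four rows with 10, 30, 5 and 20 instead of 40 and 25
```

Either the details are missing or the totals per product are lost, which is exactly the trade-off the window function removes.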
That's why we have to go to the second option, where we can use the window functions. To do that, I'm just
going to get rid of the GROUP BY part and as well all the fields.
Let's go back to the start. Now we have the SUM of sales,
and if you execute this, we get one value, so we are at the highest
level of aggregation. Now we need to use
the window function. I'm just going to
remove the name, and now we're going to tell SQL that this is a window function. Using OVER after
the aggregation, or the function, tells SQL that we are talking about
window functions. Let's just execute it like
this, and with that we get ten rows, and that's
because we have ten orders, and for each row we have
exactly the same value. We have the total sales of
all orders in each row. As you can see,
SQL understands that this is a window function, and that it should not
group all the data into one row. It should keep
exactly the same rows, or the same number of
rows, as the input. With that, we have
the window function, but we have to split the
data by product. Now we're going to use
the keyword PARTITION BY. It's like GROUP BY,
with a different wording. Product ID, the same dimension. With that, we have total
sales by products as a name. Let's go and execute this. Now as you can see
in the output, we still have the
same number of rows. We have ten orders,
we have ten rows. But the result did change,
because now we are aggregating the data at
the level of the product ID. In order to understand
the results, we have to add more
information to our SELECT. Now let's add the
same dimension, the product ID. I'm just going to add it
at the front over here. Let's execute, and as you can see, now it makes more sense. The rows of the same product always have the exact same total sales, and the same goes for the
next product and so on. Now here comes the magic
of the window function. We can add more information to our SELECT statement
without getting any errors. Now we need additional
information like the order ID. We can go over here and say order ID, order date; any column can be added to your SELECT. Let's go and execute. You can see now we get the result, even though those
three dimensions in the SELECT are not part of
the window aggregation. With that, we have
solved the task. We have the additional information, the order
ID and the order date, and as well the first part of the task, to find the total
sales for each product. Each of those values is the total sales
for that product. And with that, we have
solved the task. And this is exactly why
we need window functions. In real projects, things
get really complicated. You are doing different
tasks in one query. You are doing aggregations, you are doing some other stuff. So just focusing on the aggregations is not
going to be enough. You always have
to add additional information to your query. As you can see, we use GROUP
BY to do simple analyses, but as things get complicated
in the analytics, we use the window
functions in order to show the aggregations and as well
add additional information.
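The window function version of the task can be sketched on the four caps and gloves orders from the earlier explanation. The temp table #DemoOrders and its column names are made up for illustration; against the course data the same pattern would use the sales table, the product ID and the order date.

```sql
-- Hypothetical demo table (not the course database)
DROP TABLE IF EXISTS #DemoOrders;
CREATE TABLE #DemoOrders (OrderID INT, Product VARCHAR(20), Sales INT);
INSERT INTO #DemoOrders (OrderID, Product, Sales) VALUES
    (1, 'Caps', 10), (2, 'Caps', 30), (3, 'Gloves', 5), (4, 'Gloves', 20);

SELECT
    OrderID,   -- detail columns can be added freely
    Product,
    Sales,
    -- empty OVER: the whole table is one window, 65 on every row
    SUM(Sales) OVER ()                     AS TotalSales,
    -- one window per product: 40 on the Caps rows, 25 on the Gloves rows
    SUM(Sales) OVER (PARTITION BY Product) AS TotalSalesByProduct
FROM #DemoOrders
ORDER BY OrderID;
```

The query returns all four rows, so the details and both aggregations sit side by side in one result.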
215. Advanced SQL | Syntax of Window Functions: All right, so we're going
to go and deep dive into the syntax of the SQL
window functions. We're going to cover
everything, each part of the syntax, for you to
understand how to use them. Let's go. All right. Let's
start first by understanding the basic components, or the basic parts, of
the window syntax. Mainly, we have two parts. The first part is going to
be the window function. We have average and so on. The second main part is
going to be the OVER clause. Inside the OVER clause, we have three different parts. The first one is going to be the partition clause, the second the order clause, and as the last one
we have the frame clause. Those are all the
components that you can use inside the window function. Two main parts, the window
function and the OVER clause, and inside the OVER we have
partition, order and frame. Let's go into more detail. For example, we have the
following window function. You can see we have a lot
of stuff going on here. We're going to
understand them step by step, component by component. Let's start from the
left, from the first one. What do we have over
here? We have a function, a window function. What
is a window function? Like here, we have the average. It's like any other function
in SQL. You can use it in order to do calculations
on top of the window. The first thing to
define in a window is the
function of the window. As we learned before,
we have a long list of window functions
available in SQL, and we group them
into three groups. In the first one, we have
the aggregate functions. We have COUNT, AVG,
MAX, MIN. Those functions we have as well
for GROUP BY. Those are used for
the aggregations. In the second group of functions, we have the ranking functions. We have ROW_NUMBER,
RANK, NTILE and so on. We can use this group in order to give a rank to our data. The last group we call value, or sometimes
analytic, functions. Here we have very
important functions like LEAD, LAG, FIRST_VALUE, and LAST_VALUE in order
to access a specific value. Of course, we're
going to go and learn all of them one by one, understanding the
concepts, with some examples, and as well for
you to understand when to use them
for data analysis. Now let's keep
moving, understanding the other parts of
the window syntax. Now, inside the
function average, we have here a field name or
column name called sales. This is called a
function expression. It's like a value, a parameter, or an
argument that we can pass to the function. Here we can use
multiple different things, depending on
the function, of course. Here, it could be empty,
like here in the ranking. RANK doesn't allow you to
use an expression, so it should always be empty. Or we can use a column. In the
example, we use the sales. We use the column name as an argument or an
expression for the average, so we are finding the
average of sales. Or we could use a number. Here in NTILE, we are
allowed only to use numbers. Or we could have multiple things. For example, in LEAD, we can have sales, then
numbers, and so on. Things get complicated.
Don't worry about it. I'm going to explain that.
Here we have multiple things. Or we can have a whole
conditional logic. For example, here
we have CASE WHEN and so on inside the SUM. The whole thing over here is called an expression for the SUM. As you can see, we can build
here a complex logic, and the output of this logic can be passed to the function SUM. That means as an expression
for the function, we can use different things. Of course, it depends on whether the
function allows it or not. Now, let's have a quick overview in order to understand which data types are allowed in the expressions for
those functions. Let's see the
aggregate functions. As you can see, the COUNT
function accepts any data type. But the others, like
SUM, AVG, MIN, and MAX, allow only
numerical data types. Now let's move to
the rank functions. The expressions are pretty easy. It should be empty.
They don't allow any argument or anything
inside those functions. As you can see, all
of them are empty, and only one accepts a
numerical value, which is NTILE. You have to define
a numeric value. Now moving on to the last type, we have the value functions. They accept any data type
inside the expressions. As you can see,
each function has its own specifications,
and you have to be careful which data type you
are using in the expressions. So now let's keep
moving to the next one. We have a very important
part in the window syntax. So far, what do we have? We have a function, we have
an expression. It's the usual stuff. We have done that before
using GROUP BY. Now we have to tell SQL that we are dealing with
a window function. It's not a normal one.
In order to do that, we have to specify the keyword OVER. The second main
part in the syntax is the OVER clause, and we use it in order to define a window. Inside it, we can
define multiple things like the PARTITION BY, the ORDER BY, and the frame. But all those
things are optional; we can skip them and
leave it empty. The main task of OVER is, first, to tell SQL that we are dealing with a window
function here, and as well you can use it in order to
define a window of your data. Now we're going to go
and cover everything inside the OVER clause, and we're going to
start with the first one, the PARTITION BY.
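Putting the parts of the syntax together, window function calls could look like the sketch below. The temp table #DemoOrders, its columns, and its rows are made up for illustration and are not the course data. The specific offsets, bucket counts and the CASE condition are arbitrary choices to show each kind of expression.

```sql
-- Hypothetical demo data (not the course database)
DROP TABLE IF EXISTS #DemoOrders;
CREATE TABLE #DemoOrders (OrderID INT, Category VARCHAR(20), OrderDate DATE, Sales INT);
INSERT INTO #DemoOrders (OrderID, Category, OrderDate, Sales) VALUES
    (1, 'Caps',   '2025-01-01', 10),
    (2, 'Caps',   '2025-01-02', 30),
    (3, 'Gloves', '2025-01-03',  5),
    (4, 'Gloves', '2025-01-04', 20);

SELECT
    OrderID, Category, OrderDate, Sales,

    -- Aggregate function with a column as the expression,
    -- and all three parts of the OVER clause filled in
    AVG(Sales) OVER (
        PARTITION BY Category                             -- partition clause
        ORDER BY OrderDate                                -- order clause
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  -- frame clause
    ) AS RunningAvgSales,

    -- Ranking function: the expression stays empty
    RANK() OVER (ORDER BY Sales DESC) AS SalesRank,

    -- Ranking function that takes a number as its expression
    NTILE(2) OVER (ORDER BY Sales DESC) AS SalesBucket,

    -- Value function with several arguments: column, offset, default
    LEAD(Sales, 1, 0) OVER (ORDER BY OrderDate) AS NextSales,

    -- Conditional logic as the expression, with an empty OVER clause
    SUM(CASE WHEN Sales > 10 THEN 1 ELSE 0 END) OVER () AS OrdersAbove10
FROM #DemoOrders
ORDER BY OrderID;
```

Each column pairs one function group with the kind of expression it accepts, which is the overview the lesson walks through.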
216. Advanced SQL | Window Functions: PARTITION BY: All right. Now we're going
to learn how to define a window inside the OVER clause. The first part that we can
define is the PARTITION BY. For example, here we have
PARTITION BY category; we have to define
the dimension. It's very similar to
GROUP BY in its wording. The first part is going to
be the partition clause. What is it going to do?
It's going to divide the entire data set into groups, or you can call
them windows or partitions. Here we tell SQL how to
divide our data. Here we have two options.
Let me just show you. If we don't use anything, so we leave it empty, you see OVER, and
PARTITION BY is not used, what happens is that SQL uses the entire data in order
to do the calculations. The whole data, the entire data, is counted as one window. We are telling SQL, don't divide anything,
leave it as it is. The second option
that we have is to divide the data with PARTITION BY. If we define the window like
this, PARTITION BY product, for example, SQL is going to go and divide the entire data
into different windows, for example here, two windows. This time,
the calculation, the sum of sales, will not
apply to the entire data set. This time, it's
going to be applied to the different
windows individually. We're going to find
the sum of sales for window one separately from the
total sales of window two. All right. So now we have
this very simple example. We have here three fields: the month, product, and sales. It's really
simple information. And now we have the following
SQL window function. We have SUM of sales, and inside the OVER clause, we are not using anything. So we are not using
PARTITION BY. So how is SQL going to
define the window now? SQL is going to say, I don't
have to divide anything. The entire data set
is one window. So SQL is going to go
over here and say, the whole thing is one window.
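The empty OVER clause can be sketched on a table with the three fields from this example. The temp table #MonthlySales and its four rows are invented for illustration, since the lesson does not spell out the values on the slide.

```sql
-- Hypothetical demo data with month, product and sales
DROP TABLE IF EXISTS #MonthlySales;
CREATE TABLE #MonthlySales (SalesMonth VARCHAR(10), Product VARCHAR(20), Sales INT);
INSERT INTO #MonthlySales (SalesMonth, Product, Sales) VALUES
    ('Jan', 'Caps', 20), ('Jan', 'Gloves', 10),
    ('Feb', 'Caps', 40), ('Feb', 'Gloves', 30);

-- Empty OVER clause: no PARTITION BY, so all four rows form one window
SELECT
    SalesMonth,
    Product,
    Sales,
    SUM(Sales) OVER () AS TotalSales  -- 100 on every row
FROM #MonthlySales;
```

Every row keeps its own month, product and sales, and carries the same total of 20 + 10 + 40 + 30 = 100.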
There are no partitions. There is nothing. We
have only one window. The entire data is going
to be aggregated. This is what happens if
you don't use PARTITION BY and you leave the
OVER clause empty. The entire data is one window. All right. Now let's move
to the next example. We don't want to have
only one window. We would like to have
multiple windows, so we have to divide
the data by something. In the OVER clause,
we're going to define the window like the following:
PARTITION BY month. It's not empty. We are
now dividing the data by the field month. The values inside this column are going to divide the data set. Here we have two months,
January and February. So what
SQL is going to do is divide the data into two sets. The first window is going to
be this one, January. We have the first window, I'm going to make it smaller, and the second window is going to be February. There are going to be two windows inside our
data, and the calculation is going to happen on
each window separately. So here, as you can see, we
are using the month in order to divide our data
set into two windows, one window for January and another window
for February. So now let's have a
quick overview of the options that we have
with PARTITION BY. The first option, as we learned: we can just skip it. Without PARTITION BY,
for example here, we get the total sales across all rows, and here we don't find
anything inside the OVER clause. The second option: we can
use one field, one column. For example, PARTITION
BY product. We are using one dimension, but we can go and mix things. We can use multiple columns or multiple dimensions
in the PARTITION BY. For example, here, PARTITION
BY product and order status. Here with the PARTITION BY,
we can define a list of dimensions that will be used in order to
divide our data. In this example, we are saying, find the total sales for each combination of product
and order status. Those are the different options on how to work with
PARTITION BY. Now let's have this
overview again. For all those
functions, the PARTITION BY is optional. If you don't use PARTITION
BY in any of those functions, you will not get any errors. Now let's go back to
SQL in order to start practicing
with this clause. Now we have the following task. Find the total sales
across all orders, and we have to provide
additional information like the order ID
and the order date. Let's go and solve
it step by step. First, I would like to
provide the details. I'm going to select the
order ID and the order date from the
table sales orders. Next, we're going to work
with the aggregations. We need to find the total
sales across all orders. Again, since we have here
details and aggregations, we cannot use group by, we have to use the
window function. So we're going to go and use
the function sum for sales, and now we have to tell SQL we are working with
window functions. That's why we're going
to use the over clause. Now the next thing,
we have to think about defining the window. Let's check the task. It says, total sales across all orders. So that means we don't
have to partition or divide the data set into
chunks or partitions. We have to leave it as it is, like the whole data
is going to be one window. That's why we don't
use partition by inside that definition. We're
going to leave it empty. Let's go now and give it a name. It's going to be
the total sales. Let's go and execute this. Now at the results,
as you can see, we have all the orders, all the details, and as well, we have the total sales
across all orders. With that, we have
solved the tasks, we have the total sales and as well some details
about the order. Now let's move to the next task. It's going to be very similar. It says, find the total
sales for each product. We have to provide
additional information like the order ID
and the order date. It's going to be
very similar task. But this time, we have to divide the entire
data into windows, and that's going to
be by the product. Since we are saying total
sales for each product. This time we have to go
and divide the data. We're going to define the
window like this partition by and we can use the
dimension product ID. Let's go and execute this. Now you can see in
the total sales, we don't have anymore
the total sales of the whole data,
but they are divided. But in order to
understand the results, let's go and include the
product ID in the results. Product ID and execute. Now by looking to the results, you can see that the data is
divided into four windows. Let's see them. It's going
to be by the product ID. So this dimension going to be
controlling the partition. So the first window going
to be the product ID 101, we have the total sales
for this product 140, and the next window
going to be 102, the third one, 104, and the last window, it's
going to be only one row, the 105, and the
total sales of 60. With that, we have
solved the task, we have the total sales
for each product, and as well we
have some details. Now I would like to show you the dynamic of the
window function. We can add multiple aggregations
on multiple levels. Let me show you what
I mean. Let's say we stay with the same example. But we're going to
find the total sales across all orders and as well, the total sales
for each product. What we can do: we can use window functions on
different levels by, for example, here removing
the whole definition. Here we have the total sales for the entire data for
the first task, and the next one going to be the total sales but
divided by the product ID. Let's rename it by products. Let's go and execute this. Now you know what, I'm going to go and add the sales as well just to explain the flexibility
of the window function. Let's go add the sales
and execute it again. Now by looking at the results, you can see we have the sales
information three times, but with different
granularities. The first one is
the sales itself without any aggregations. It is the highest level
of details of the sales, and we're going to have
the sales for each order. The next one, the total sales
with the window function. Here we have the
highest level of aggregation we have the
total sales of all orders. The last one the total
sales by product, it's something like
in the middle. We are aggregating on a window. The window going to
be the product ID. As you can see, we have different granularities
of the aggregations, and this is exactly
the flexibility that we have with
the window function. We can do all those
stuff in one query. Now let's keep moving and
adding stuff to our task. It's going to say, find
the total sales for each combination of
product and order status. This time, we have to
divide the data not only by the product but as well by another dimension,
the order status. Now let's see how
we can do that. I'm going to just
show the dimension order status in the results. And we're going to add
the following thing: sum of sales, then over, since
it's a window function, and let's go now and define
the window. Partition by. So we have, again,
the product ID, but not only this dimension,
as well the order status. And let's go and call it
sales by product and status. Let me just rename that stuff. Okay. Let's go and
execute. All right. So now let's check the results. It is the last
aggregation over here. As you can see here
the aggregation has different granularities
as the previous one, and we have more details. This time we are splitting
the data by two dimensions. The first window is going to be the product ID with
the order status; it's going to be
only those two rows. We have the product ID 101 and
the order status delivered. The total sales of this
is going to be ten plus 20, and we're going to have 30. The next window is going
to be the same product, but with a different status. It's going to be
the 101, shipped, and we're going to
go and sum up those two values and
we're going to have 110. The next product and
order status is going to be the 102, and
we have only one row: 102 delivered, it's only one. It's going to be the same value. The next partition or window, it's going to be two rows, 102 with shipped; it's going to be
those two rows, 60 plus 15, so we're
going to have 75. As you can see, here the product ID
and the order status are controlling how many
windows we're going to get. We get here
six windows. With the product ID, we
got only four windows, and without using anything
inside the over clause, we will get only one window. This is how the
partition by works.
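The three levels of partitioning can be sketched in one query. This is a sketch using Python's built-in sqlite3 module. The orders table and its ten rows are assumptions, chosen so that the figures quoted in this section come out the same (140 for product 101, 60 for product 105, four product windows, six product and status windows). They are not taken from the course database.

```python
import sqlite3

# Hypothetical orders table; the rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, product_id INTEGER, "
    "order_status TEXT, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 101, "Delivered", 10), (2, 102, "Shipped", 15),
        (3, 101, "Delivered", 20), (4, 105, "Shipped", 60),
        (5, 104, "Delivered", 25), (6, 104, "Delivered", 50),
        (7, 102, "Delivered", 30), (8, 101, "Shipped", 90),
        (9, 101, "Shipped", 20), (10, 102, "Shipped", 60),
    ],
)

# Same measure at three granularities: one window, one per product,
# and one per (product, status) combination.
rows = conn.execute(
    """
    SELECT order_id, product_id, order_status, sales,
           SUM(sales) OVER ()                                      AS total_sales,
           SUM(sales) OVER (PARTITION BY product_id)               AS sales_by_product,
           SUM(sales) OVER (PARTITION BY product_id, order_status) AS sales_by_product_status
    FROM orders
    ORDER BY order_id
    """
).fetchall()

for row in rows:
    print(row)
```

The number of distinct values in the partition columns decides how many windows there are, while the row count of the result stays the same.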
217. Advanced SQL | Window Functions: Order BY: All right. That was
the first part of the window definition
within the over clause. Let's move to the next part. We have the order
by. For example, we can use order by order date. It's just a field. The
order by clause is very important in order to sort
your data within a window. The order by is very important as well for many functions. By just checking the
overview over here, for the aggregate functions, it's optional, so you could
just leave it or add it. But for the rank functions and as well for the
value functions, it is a must. If you want
to use those functions, you must use the order by clause
because it makes no sense, for example, if you are ranking the data without sorting
your data first. Okay, guys, now back to
our very simple example, and we have the following query. The function this time
is going to be rank, so we have to rank the data and the definition of
the window going to be partition by month. That means we divide
the data by the months, so we have it over here, and then the second
part going to be, order by sales descending. We have to sort each window
by descending order. That means we start
with the highest value and we end up by
the lowest value. Let's see how SQL is going to
go and execute this. First, partition by month. It's going to divide
the data into two partitions because we
have two values by the month. Let's see how this is
going to look like. One window for January and
another window for February. All right. SQL is going to
go to the second part and execute order by
sales descending. So what's going to happen: SQL is going to go through each window
separately and start sorting the data
from the highest to the lowest without checking
the other window. So in those three values, the highest one is this one. So it's going to be on
top. Let me just sort it. This is going to be the lowest. You're going to
be in the middle. So SQL is going to sort this window separately
from the next one. And then once it's done, it's going to go to the second one. So the highest value
is going to be this one. You are the lowest. Let me just do it like this. So SQL is going to sort it like this. The highest one is 70. The next one is 40, and
the last one is five. With that, SQL is done with the
definition of the window. So it's split by the month and each window
is sorted by the sales. The next step: SQL is going to
go and rank those values. So it's really simple
in the outputs. It's going to rank
the data like this. So the first one going
to be this value. The next one going to be two and the third one
going to be three. So as you can see, SQL is
sorting only this window, and it's going to go and repeat the same stuff for
the second window. So each rank is separate
from the others. You can see it's very simple.
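The walkthrough above can be sketched as follows, using Python's built-in sqlite3 module. The table monthly_sales and the January values are assumptions for illustration. Only the February values 70, 40, and 5 are quoted in the walkthrough.

```python
import sqlite3

# Hypothetical sample data; the January numbers are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [("Jan", 20), ("Jan", 10), ("Jan", 30), ("Feb", 40), ("Feb", 70), ("Feb", 5)],
)

# PARTITION BY splits the rows per month; ORDER BY sorts inside each window;
# RANK() then numbers the rows of each window independently.
rows = conn.execute(
    """
    SELECT month, sales,
           RANK() OVER (PARTITION BY month ORDER BY sales DESC) AS sales_rank
    FROM monthly_sales
    ORDER BY month, sales_rank
    """
).fetchall()

for row in rows:
    print(row)
```

The rank restarts at 1 in every month because each window is sorted and ranked on its own.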
This is how SQL executes partition by together with the order by for
the rank function. Now, let's have a quick
task for the order by. It says, rank each order based on their sales from the
highest to the lowest. We have to provide
additional information like order ID and order date. Let's see how we can
write the query. We have the basic stuff: order ID, order date, and the sales, and now we can go and rank the data using a window function. We can use the function rank. And then we're
going to tell SQL, this is a window function, and inside it, we have now to provide the definition
of the window. So now by checking the task, you can see that we don't
have to divide the data, so we don't have to
use partition by. We have just to use
rank, and with rank, we have to use the
order by. It is a must. So we're going to use
order by the field going to be the sales and from
the highest to the lowest. So just call it rank sales, and let's go and execute this. And as you can see,
our result is going to be sorted from the
highest to the lowest, so you can see the sales 90 at the top and the lowest
going to be the ten. And as well, we have a rank. So for the top rank, it's going to be one, and the lowest rank
going to be ten. As you can see, we just
quickly create a rank in SQL. It's very simple.
The whole thing is one window since we are
not using partition by. Of course, if you want to have ascending, from the
lowest to the highest, you can just remove the descending, because by default it's going
to be ascending. Let's go and execute the query. So now we can see the orders
are sorted the other way around, so we start with the lowest
and end up with the highest. Of course, we're going to get
the same results if you go over here and add ascending. If you execute, you see we've got
exactly the same results. This is how you use the order by inside the window definition.
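A small sketch of this task, using Python's built-in sqlite3 module. The orders table and its sales values are assumptions for illustration (ten orders, highest sale 90, lowest 10). The helper name ranked is hypothetical.

```python
import sqlite3

# Hypothetical orders table with only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (2, 15), (3, 20), (4, 60), (5, 25),
     (6, 50), (7, 30), (8, 90), (9, 20), (10, 60)],
)

def ranked(order_clause):
    """Rank all orders as one window, sorted by the given ORDER BY clause."""
    return conn.execute(
        f"SELECT order_id, sales, RANK() OVER (ORDER BY {order_clause}) AS rank_sales "
        "FROM orders ORDER BY rank_sales, order_id"
    ).fetchall()

desc_rows = ranked("sales DESC")    # highest to lowest
asc_default = ranked("sales")       # no keyword: ascending is the default
asc_explicit = ranked("sales ASC")  # same result as the default

print(desc_rows[0], desc_rows[-1])
print(asc_default == asc_explicit)
```

In this sample data two pairs of orders have equal sales. RANK gives tied rows the same rank and then skips the next number, which is why the lowest sale still ends up at rank ten.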
218. 3 5 window frame: Okay, guys. With that,
you have covered the second part of the
window definition. Now we're going to
go to the last part to the most advanced
part of the window, and we have the following stuff. We have rows
unbounded preceding. We call this the frame
clause or window frame. What we are doing
over here is that we are defining a subset of rows within each window that is relevant for
the calculation. I totally understand if this is confusing at the
start or complex, it was for me as well. What we're going to do, we're
going to deep dive into the concept in order to
understand how this works, and we're going to do it step by step, so don't worry about it. All right. So now
let's understand what is going on with
the frame clause. From the basics. Now if you do aggregations and you don't
use window function, you're going to consider
the entire data, all rows inside the table. But what we can do,
we can go and divide the data using partition
by into windows. For example, here, we have
window one and window two. Now, if you go and
do aggregations, all the rows in the window one
are going to be aggregated, and then SQL is going to go to window two and
aggregate all the rows. What we can do in SQL is that
we can say, you know what? I don't want all rows
inside the window, I want a subset of rows
inside the window. What we are doing
over here is that we have those two windows, but we specify a scope, or we
specify a subset of data, from each window to be
involved in the aggregations. Of course, not
only aggregations, we can do ranking and other stuff. So I mean, calculations. So here it's like we have a
window inside a window. So we are defining
scope of rows. Not all rows should be
involved in the calculation, but only specific
subset of data. And we can do that
using the frame clause. So again, the partition by, you can use it in
order to divide the entire data sets
into multiple windows. And now for the frame clause, if you don't want to consider all the rows within each
window in the calculation, you want to focus and specify only a subset of data
within each window, then you can go and
use the frame clause. All right. So now let's go and understand the syntax
of the frame clause. Let's have the
following example. We are saying the
window function is the average of sales, and then we define the window. So we have the first part,
partition by categories, order by order dates, and then we have
the frame clause. It's going to be the
following: rows between current row and
unbounded following. This is the frame type, and we have two types: we
have rows and range. Then we have between
and the boundaries. So the first one
is going to be the frame boundary with the lower value, and here it accepts three
types of keywords: the current row, N preceding, or
unbounded preceding. Then we have another
frame boundary. It's going to be
the higher value, and it accepts the
following stuff. We can use the current row, N following, or
unbounded following. As you can see, we are
defining like boundary or a range from low value
to higher value. Now we have some
rules. We cannot use the frame clause
without order by, order by must exist in
the definition in order to use Frame clause and
the second rule says, lower boundary must be
before the higher boundary. So always we start with the lower boundary and we end up having the
higher boundary. You cannot switch that. Okay, so now we have a very
simple example. We have the month
and the sales and the following query,
sum of sales. This is the window function, and the definition of the window going to be order by month. We are not using partition by just in order to make
our life easier. And the frame clause is going
to be defined like this, rows between current row
and two following. Now let's see how SQL
is going to execute this. The first definition,
order by month: as you can see, the months
are sorted already. Now SQL is going to work
with the frame definition, current row and
two following. SQL is going to process
this row by row. So it's going to start
with the first row, and it's going to be our current
row, as here in the SQL. So this is our current row, and we say the range goes until
two rows, two following rows. So it's going to be
February and March. That means the pointer is going to be over here
for the two following. With that, we have
the frame boundaries, and SQL has the following
scope for the first row. We have three rows and
the summarization of those three rows
can be around 70. We'll get for the first row 70 because the scope
is not all rows, but only the subset of data. With that scale is done
with the first row, it's going to jump
to the second row. The pointer going to be the
current row at February, and the two following
is going to be at April. So with that, as you
can see, we are sliding down in the subset of
data or in the window. And with that, we have a
new scope, a new subset, and the summarization of all
those values is going to be 45. So that's it. I think
you get it already. It's going to go
to the next one, the pointer going
to be on March, and the two following
going to be on June, and it's going to
slide like this. We have those three
rows in the scope, and the summarization of
that is going to be 105. Now, things get interesting
for the next row. So the pointer for the current
row is going to be April, but the two following
going to be like after the end of the
table or something like that. So as we slide down,
as you can see, the scope now or the subset
of the frame going to be only two rows and the
output going to be 75. And finally, if you
go to the last row, it's going to be the current
row and we're going to have only one row for
the subsets because the two following is just
outside of the table, and we're going to
get the same value as the summarization. As you can see,
it's very simple, right? We use the frame in order to scope which rows are involved
in the calculations. What you have
to do is to define the boundaries of the frame, the lower and the
upper boundary. Let's see what other options
do we have with the frames. Here we have the same example, but we redefine the boundaries
of the frame like this. Rows between current row. This is the first boundary
and unbounded following. This means that we
are targeting always the last record in the
window or in the table. Unbounded following is going
to be always static and it's going to be in this
example, pointing to June. SQL is going to go row by
row, and the current row is going to be at the start,
January, and then February. I'm just going to
take this example. The pointer is on February, and the subset or the frame
is going to be those four rows. So it's going to be February,
March, April, June. So it's going to be four rows, and the total aggregation
of that is going to be 115. You can do it like this; previously it was
more flexible. It was two following. But this time we have
unbounded following. That means the boundary
is always going to be the last one. As we are moving with
the records over here, the frame is
going to get smaller and smaller like this, and at the last one, both boundaries are going to be on the same record. The current record
is going to be as well, the unbounded following. Let's see the next
one. The definition of the window is going
to be the following: rows between one preceding
and the current row. Here it's the other way around. One preceding is lower
than the current row. Let's see how SQL
is going to execute this. Let's say that we are
currently at March. This is the current row, and we are saying
between one preceding. That means one row
before the current row. So the frame going
to be like this, and we have only two rows. So the value going to
be the summarization of those two rows and
it's going to be 40. That means we are always targeting the rows
before the current row. Okay, now let's keep going with the other options in order to understand everything
about the frame. So we redefine like this rows between unbounded preceding
and the current row. So unbounded
preceding going to be the first row in the
table or in the window. So it's going to be
static like this. It's going to be the
first one January. Let's say that we are at
this current row in March. The window or the subset
going to look like this: those three rows, and the total of that
is going to be 60. Now as SQL is
proceeding to the next one, it's going to fix
the first boundary. It's going to be
always pointing to January and the
subset going to be a little bit bigger until we reach the last one
and with that, we're going to have the
subsets, the whole rows. With that, we get really great flexibility on how to define the subset and how the subset is shifting through the window. Okay, now we are
just having fun, so we're just playing
around with the boundaries. We don't have always to
use the current row. So we can use, for example,
here in this definition, rows between one preceding
and one following. So we don't use the current row at all
as a boundary. So let's say again, our current row is going
to be in March. So one preceding is going to be February and one following
is going to be April. So with that, our frame
is going to be those three rows, let me get it like this, and the aggregation of this
is going to be around 45. So with that, as you can
see, the boundaries are going to be one preceding
and one following, so it does not have to be
always the current row. Alright, so now I think
you already get it, what's going to be
the last option. We're gonna have everything. So the definition of the frame
going to be rows between unbounded preceding
and unbounded following. What are we
going to have over here? The unbounded preceding
is going to be January, and the unbounded following
is going to be June. And now the frame is going to
be everything, all the rows. And it doesn't matter where
we are with the current row. It's always going to be
a fixed subset. So it's going to be
always everything. So if we are over here
or February or March, we're going to be
considering all rows. And the total sales of
that is going to be 135. So we will get the
exact same results for everything for all rows. So with that, I think it's
not that complicated right. We just have to provide
the boundaries, and then the
calculation going to be depending on the frame
on the subset of data. Okay, guys, now let's
go back to SQL and start practicing in order to understand how
the frame works. So let's go and define
a window like this. Sum of sales and the window
definition like this: we're going to divide the
data by order status, and let's say we're going
to sort it by order date. Let's define a frame like this: rows between current
row and two following. Let's give it a
name, total sales. Let's go and execute it. Now let's look at the data. You see that SQL divides our results into two sections, two windows, delivered
and shipped. You can see that the data is
sorted by the order date. As you can see over here,
for example, on this, status delivered,
we can see that first of January 10 and so on. Then the third part, we have defined a frame in each window. So for example, let's
take the first one. This is the current row. We say the frame is between the current row and
the two following orders. That means the scope
going to be like this. Ten plus 20, 25, it can be 55. Now what is interesting
as well to check here is the last
record of each window. Now let's take this
window over here and the last record going to be
number seven, this order. And let's say this is
the current record. We say the frame between current record and
the two following. But since it is the last
record of this window, it will not go and consider
the next two orders because those two orders
are outside of the window, and that's why we
have here 30, and SQL didn't go and summarize
all those values. So we have 30, and there
is nothing after that. That's why we will get 30. As you can see, the frame is calculated within one window, so it will not consider anything
outside of that window. This is how the frame
works within partitions. Now, I would like to
show you as well a few things about the frames. We can use shortcuts, but we can use them only
with preceding. For example, let's say I'm going to change the
definition like this: two preceding and current row. Let's go and execute it and
we'll get those results. Now if you want to
check the results quickly, let's
take, for example, this order over here, and we are always summarizing the values of the current order and the
two previous orders. So that means those three
orders are going to be involved in the frame and the
output is going to be 55. Now there is a shortcut in SQL, but only for preceding, where we can remove the range, so we can go and remove everything and we can
leave it like this: rows two preceding. If you go and execute it, we will get the exact same results. This is a quick
way or a shortcut on how to define a window, but it only works
with preceding. For example, if I go
over here and say, for example, unbounded,
it's going to work, so we will get the results between the unbounded
preceding and the current row. But if you go over here and
you say, you know what?
Let's have the
unbounded following, SQL is going to say there's an error, and the same thing if you
remove the unbounded, let's say for example,
one following, SQL will not like it. You can use the shortcut
only with preceding. And one last thing about
the frames: there is a default frame. If you don't use any frame
and you use order by, what's going to happen: SQL is going
to use a default frame. If you check the
result, you will notice that for this
window over here, those values are not like the
total of all the sales. There is a frame, there
is a hidden frame. The default frame in SQL
is going to be like this: rows between unbounded
preceding and current row. This is the default frame
if you use order by. Now if you go and
just execute it, you will see that we will
get the exact same results. Be careful: once you use order by with the aggregate functions, there will be a hidden frame or a default frame like this, between the unbounded
preceding and the current row. That means there are
three ways to get this frame between unbounded preceding
and current row: either write it like
this, or you can go and have a
shortcut like this. Let me just execute
it. So we'll get the same result. Or just
remove it completely. We will get as well
the same results. Now again, the hidden frame or the default frame is only
working with the order by. So if you go, for example, here, and remove the order by, let's see the results: the whole window
will be aggregated. So again, let me just select it, so you can see that SQL is going
to consider all the rows in the aggregations and we will get the total sales for
the whole window, so there will be
no frame defined. It is only present once you use order by.
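The three equivalent spellings and the effect of dropping order by can be sketched like this, using Python's built-in sqlite3 module. The orders table, its dates, and its values are assumptions for illustration. One caveat beyond what is said above: engines such as SQLite and PostgreSQL document the default frame as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which gives the same result as the ROWS form as long as the order by values inside a window are unique, as the assumed dates here are.

```python
import sqlite3

# Hypothetical orders table; every order has a distinct date.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, order_date TEXT, "
    "order_status TEXT, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "2025-01-01", "Delivered", 10), (2, "2025-01-05", "Shipped", 15),
        (3, "2025-01-10", "Delivered", 20), (4, "2025-01-20", "Shipped", 60),
        (5, "2025-02-01", "Delivered", 25), (6, "2025-02-05", "Delivered", 50),
        (7, "2025-02-15", "Delivered", 30), (8, "2025-02-18", "Shipped", 90),
        (9, "2025-03-10", "Shipped", 20), (10, "2025-03-15", "Shipped", 60),
    ],
)

rows = conn.execute(
    """
    SELECT order_id, order_status, sales,
           -- 1) frame written out in full
           SUM(sales) OVER (PARTITION BY order_status ORDER BY order_date
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS explicit_frame,
           -- 2) shortcut form, only available for PRECEDING
           SUM(sales) OVER (PARTITION BY order_status ORDER BY order_date
                            ROWS UNBOUNDED PRECEDING) AS shortcut_frame,
           -- 3) no frame clause: ORDER BY brings a default (hidden) frame
           SUM(sales) OVER (PARTITION BY order_status ORDER BY order_date) AS default_frame,
           -- no ORDER BY: the whole window is aggregated
           SUM(sales) OVER (PARTITION BY order_status) AS no_order_by
    FROM orders
    ORDER BY order_status, order_date
    """
).fetchall()

for row in rows:
    print(row)
```

With order by, the sum becomes a running total inside each status. Without it, every row of the window shows the window total.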
All right, friends. So with the frame clause,
we have now covered all the components on how to define a window inside
an over clause, and with that we have covered everything about the syntax
of the window functions.
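The frame boundaries covered in this section can be compared side by side in one query. This is a sketch using Python's built-in sqlite3 module. The table monthly_sales, the helper column month_no, and the sales figures are assumptions for illustration, not the values from the course slides.

```python
import sqlite3

# Hypothetical monthly data; month_no exists only to give a sortable order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month_no INTEGER, month TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?, ?)",
    [(1, "Jan", 20), (2, "Feb", 10), (3, "Mar", 30), (4, "Apr", 5), (6, "Jun", 70)],
)

# Each column uses the same window (all rows, sorted by month) but a
# different frame, so the subset of rows that is summed changes per row.
rows = conn.execute(
    """
    SELECT month,
           SUM(sales) OVER (ORDER BY month_no
               ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)                 AS cur_to_2_following,
           SUM(sales) OVER (ORDER BY month_no
               ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)         AS cur_to_end,
           SUM(sales) OVER (ORDER BY month_no
               ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)                 AS prev_and_cur,
           SUM(sales) OVER (ORDER BY month_no
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)         AS start_to_cur,
           SUM(sales) OVER (ORDER BY month_no
               ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)                 AS prev_to_next,
           SUM(sales) OVER (ORDER BY month_no
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS whole_window
    FROM monthly_sales
    ORDER BY month_no
    """
).fetchall()

for row in rows:
    print(row)
```

Near the edges of the window the frame simply contains fewer rows, since a boundary that points past the first or last row has nothing more to include.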
219. 3 6 window Rules: Okay, guys, now we're
going to go and understand the rules or let's say the limitations of window functions. So let's learn what you are not allowed to do while
using window functions. Okay, the first rule of that, you are allowed to use
the window function only in the select clause and as
well in the order by clause. So here we have, again, the
same example where we're finding the total sales
by the order status. So as you can see, we used the window function
in the select clause, and we didn't get
an error, right? So now we can go and use it
as well in the order by. So let's say order by, let's go and copy everything,
but not the name. Order by. If I go and execute this, there will be no errors and
SQL is going to allow it. As you can see the
result didn't change. Let's go and sort it,
for example, descending. I'm going to write here
descending, and let's execute. Now we have the total sales
with the highest values, then the lowest values. Having this rule that we can use it only in select and order by, that means we cannot use window functions in
order to filter data. Let me show you, for example, instead of order by, let's have a where clause:
where total sales, let's say, bigger than 100.
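This rule can be tried out with a short sketch using Python's built-in sqlite3 module. The orders table and its rows are assumptions for illustration, and the exact error message differs between database engines, so the sketch only records that an error was raised.

```python
import sqlite3

# Hypothetical orders table; the rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_status TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "Delivered", 10), (2, "Shipped", 15), (3, "Delivered", 20),
        (4, "Shipped", 60), (5, "Delivered", 25), (6, "Delivered", 50),
        (7, "Delivered", 30), (8, "Shipped", 90), (9, "Shipped", 20),
        (10, "Shipped", 60),
    ],
)

WINDOW = "SUM(sales) OVER (PARTITION BY order_status)"

# Allowed: the window function in SELECT and again in ORDER BY.
ordered = conn.execute(
    f"SELECT order_id, order_status, {WINDOW} AS total_sales "
    f"FROM orders ORDER BY {WINDOW} DESC, order_id"
).fetchall()

# Not allowed: filtering on a window function in WHERE.
try:
    conn.execute(f"SELECT order_id FROM orders WHERE {WINDOW} > 100").fetchall()
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

print(ordered[0])
print("WHERE with a window function failed:", where_error)
```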
Let's go and execute this. As you can see,
SQL is going to say, no, you are not
allowed to do that. You can do that only
for select and order by. So we are not allowed
to use it for filtering data using
the where clause. And as well, you are not
allowed to use it in the group by. So if I go and do a group by, and as well remove the
condition over here. So if you execute it, you're
going to get the same error: you are not allowed to use the window function
in the group by. So only with the order by or as
well in the select clause. Okay, now to the second rule: you cannot use window functions inside another window function. So that means you cannot go and nest window functions together. Let me show you what
I mean with that. So let's remove the group by. Now, everything
should be working. Let's take and copy the whole
window function over here, and let's just nest it. Instead of sales, we're
going to have now a window function inside
another window function. As you can see, this is
the inner window function, and the rest, the outside, is
the outer window function. If I go and execute this, you'll see that
SQL is going to tell us, you cannot use the
window function in the context of another
window function. So we cannot do nesting
using window functions. As you can see, this
is another limitation for those functions. All right. Moving to the third rule
or let's say a piece of info: the window function
is executed after filtering the data
with the where clause. Let's have an example.
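In code, the order of execution can be sketched like this, using Python's built-in sqlite3 module. The orders table and its rows are assumptions for illustration, not the course database.

```python
import sqlite3

# Hypothetical orders table; the rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, product_id INTEGER, "
    "order_status TEXT, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 101, "Delivered", 10), (2, 102, "Shipped", 15),
        (3, 101, "Delivered", 20), (4, 105, "Shipped", 60),
        (5, 104, "Delivered", 25), (6, 104, "Delivered", 50),
        (7, 102, "Delivered", 30), (8, 101, "Shipped", 90),
        (9, 101, "Shipped", 20), (10, 102, "Shipped", 60),
    ],
)

# WHERE removes rows first; the window totals are computed on what is left.
rows = conn.execute(
    """
    SELECT order_id, product_id, order_status, sales,
           SUM(sales) OVER (PARTITION BY order_status) AS total_sales
    FROM orders
    WHERE product_id IN (101, 102)
    ORDER BY order_status, order_id
    """
).fetchall()

totals = {status: total for _, _, status, _, total in rows}
print(totals)
```

In this sample data the unfiltered totals would be 135 for delivered and 245 for shipped, so the smaller numbers show that the filter ran before the window function.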
Now, let's say that, I would like to have
the same information. The total sales for each status, but only for two
products, 101 and 102. Let's go and do that. We're going to use
the where clause, and then we're going
to say product ID in. We're going to
specify 101 and 102. Let's go and execute this. Now, we can see we still
have two partitions, one for delivered
and one for shipped, but the total sales is
reduced because we are only focusing on two products and we filtered the whole data set. So how SQL works: first, the where clause is
going to be executed, and then the window function
is going to be calculated. That means first filtering
and then aggregations. Okay, guys, now we're
going to move to the last rule to the
most interesting one, and it says the following. You are allowed to use the
window function together with the group by clause only if
you use the same columns. So let me explain
what do I mean, but first, some coffee. Let's have the following
task, and it says, rank the customers based
on their total sales. Now, it sounds really easy, but if you check it, you
have here two calculations. The first one, you have
to rank the customers, and the second calculation
is an aggregation. You have to find the total
sales for each customer. So I'm going to show
you step by step how I usually solve those tasks. Now, let's check
the total sales. It is an aggregation, right, so we can use the sum function, and this function
is available in both group by and as well
in the window function. So for now, I'm going
to go with the group by, and that's because the
task is very simple. We don't have to show any
other details, right? So it's all about aggregations. So why not use the group by? Now to the first part, where we have to rank the customers: we cannot use the rank function
with the group by, right? Group by uses only aggregations. So here we are forced to
use the window function. So that means for the rank, I'm going to use window
function for the total sales. I'm going to use a group i. So now let's do it step by step. So first, we have to find the total sales for each
customer using group? It's very simple. So
I'm just going to remove all those stuff in
our select statements. We need the customer ID, and then we don't need a
window function over here. And then after that f, we're going to have
a group customer ID. So now I'm just
grouping the customers and finding the
sum of all sales. Let's go and execute this. So now we're going to
see in the results, we have four customers, and that's why we
have four rows, and as well we have
the total sales. So let's say the half of the task is already
solved right. Now, what is missing
that We need to rank. So let's go and build that. The second step, we're going
to use the rank function, and we can define a
window for that, over. And inside it will not
partition the data at all because it's
already grouped up. So what we're going
to do? Over order by. The rank function always needs an order by,
don't talk about it. We can talk about it
later. So now we are ranking the data based
on the total sales. That means the sum of sales. So what we're going to do,
let's just go and copy this and put it
after the order by. And now we have to decide whether ascending or descending. It's going to be descending
so the highest sales first and then the lowest sales. So now, as you can see,
we have now a rank. Customers, and we have a window function now
together with the group Pi. Now let's go and excuse this and see whether Q
going to allow it. Let's run it and as you can see, qu runs it, and we will get
the rank for each customers. The customer three has
the highest total sale, then the customer number one
and the last one going to be customer number two with
the lowest total sales. All right, we solve
the tasks we have now ranked the customers based
on their total sales. So as you can see,
SQL allows you to use a window function
together with GROUP BY, but only with one rule: anything that you
are using inside the window function should
be part of the GROUP BY. For example, we fulfill the rule because we are
using the sum of sales, and the sum of sales
is part of the GROUP BY. If I go and
break the rule by not using the sum,
just using the sales, so if I just remove the sum
and use only the sales, SQL will not allow it because the sales is not part
of the GROUP BY. As you can see, SQL is
very strict with this. If you want to use everything in one query without using
subqueries and so on, you have to use
the exact same columns. For example, if I go over
here and, instead of sales, use the customer ID, then since the customer
ID is part of the GROUP BY, SQL allows it. So be careful using a window function together
with GROUP BY. As long as you are
using the same columns, nothing is going to go wrong,
and SQL allows it. Okay, now, I'm just
going to go and fix this. Let's run it. Now as you can see, it's really easy if you follow those steps. First, build the
query using GROUP BY. Don't think about
the window function yet.
Just build the GROUP BY,
and then in the next step, the last one, you go and define and build the
window function. With that, you can
solve really nice analytical use cases with one simple query,
without having to build
subqueries and so on. You can go and use GROUP BY together with the
window functions. All right, guys. So those are the four rules for the
SQL window functions.
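
The two steps above (first GROUP BY, then the window function on top) can be sketched as follows. This is only a minimal sketch: it runs the query through Python's built-in sqlite3 module as a stand-in for the course database, the table layout and sales values are made up for illustration, and other engines may differ in how strictly they enforce the same-columns rule.

```python
import sqlite3

# Hypothetical stand-in for the course's orders table (values are made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 40), (3, 2, 15), (4, 3, 90), (5, 3, 20), (6, 4, 30)],
)

# Step 1: GROUP BY collapses the rows to one row per customer.
# Step 2: RANK() runs over the grouped rows. Its ORDER BY uses SUM(sales),
# which is part of the grouped result, so the rule is fulfilled.
query = """
SELECT
    customer_id,
    SUM(sales) AS total_sales,
    RANK() OVER (ORDER BY SUM(sales) DESC) AS rank_customers
FROM orders
GROUP BY customer_id
ORDER BY rank_customers
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)  # (customer_id, total_sales, rank_customers)
```

Building the GROUP BY part first and only then adding the RANK() column mirrors the order in which the query is developed in the walkthrough.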
220. 3 7 window summary: All right, friends.
So now let's have a quick recap of the
SQL window functions. Let's start with the definition. We perform calculations like aggregations on top of a subset of data without losing the
level of detail. So that means we can do aggregations and,
at the same time, we are not losing the details. Now, of course,
there is a lot of similarity between the window
functions and GROUP BY. But the main difference is that window functions are very
powerful and dynamic compared to GROUP BY. We have way more
functions than with GROUP BY. So if you are
doing data analysis and you have an
advanced use case, then you have to go and
use window functions. They are more suitable for complex and advanced
data analysis. But on the other hand, if
you have a simple question, a simple data analysis,
then you can go and use the aggregate
functions with GROUP BY. Of course, you can
go and use them in the same query, in
the same SELECT. You can go and mix
GROUP BY together with the window function
with only one rule: you have to use
the same columns. Of course, the first step
is to do the GROUP BY, and then later you do the window
function in the same query. Now to the next point about
the window components. We have two main components. The first one is the
window function, and the second part is the window definition
using the OVER clause. Inside the OVER clause, we
can define three things. If you want to divide the
data to create windows, you can use PARTITION BY. In the second section,
we have the ORDER BY in order to sort your data, and in the last part, you
can go and specify a subset of data, a
frame, within each window. Now let's move to the last part. We have rules for the
SQL window functions. The first one is
that if you have two window functions or
multiple window functions, you cannot go and
nest them together. You have to go and use
multiple subqueries. The next point is
that you can use the window function only in the SELECT and the
ORDER BY clause. For example, you cannot use
the window function in the WHERE clause in order
to filter the data. Talking about filtering data, when is SQL going to
execute the window function? It's always after
SQL filters the data. All right. Those
are the basics of the SQL window functions. With that,
you have covered the basics of the
SQL window functions: what window functions are, why we need them, the syntax,
and the main components. Now moving on to the next one, we're going to learn
how to aggregate your data using the window
aggregate functions. Here we have five functions, and we will look at
the syntax, how it works, the use cases, and everything.
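
The point from the recap that the window function is always calculated after the data has been filtered can be checked with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in engine, and the table and its values are hypothetical.

```python
import sqlite3

# Hypothetical orders table; the values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, product_id INTEGER, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 101, 10), (2, 102, 15), (3, 101, 20), (4, 105, 90), (5, 104, 25)],
)

# WHERE runs first, so SUM(sales) OVER () only sees the rows that survive
# the filter: 10 + 15 + 20 = 45, not the 160 of the whole table.
query = """
SELECT
    order_id,
    product_id,
    sales,
    SUM(sales) OVER () AS total_sales
FROM orders
WHERE product_id IN (101, 102)
ORDER BY order_id
"""
filtered = conn.execute(query).fetchall()
for row in filtered:
    print(row)
```

If the total of the unfiltered table were needed next to the filtered rows, the window function would have to be calculated in a subquery before the filter is applied.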
221. 4 1 win aggr what is: Hey, friends, so we're going to learn now
how to aggregate your data using five different window
aggregate functions. We have COUNT, SUM,
AVG, MIN, and MAX. And as usual, first, we have to understand
the concept behind them. After that, we're going
to talk about the syntax, and we're going to cover the
most important use cases that I collected from
my real-life projects. So now, first, let's understand why they are called
aggregate functions. So let's go. Okay, guys. Let's say that in our data, we have the following
information: we have the months
and the sales. Now, if you apply any
aggregate function in SQL, what happens is that SQL goes through
all rows of the window or the entire data and
starts aggregating the data. That means in the
output, SQL is going to give you one
single aggregated value. SQL is going to summarize
all those values, and in the output, you're going to find, for
example, that the total sales are
175, or you can use the average
or count the data and so on. So the aggregate functions
deliver at the end one aggregated value for a
window or for the entire data. Now, let's have a
quick overview of the syntax of all
aggregate functions. Most of them follow
the same rule. First, as usual, we have to
define the function name, and in this example,
we have the average. Then, in the next
part, we have to define the expression
inside it. We cannot leave it empty. Here we are using the sales. And the second rule, for all
functions besides COUNT, is that the data type of this
field should be a number. And this, of course,
makes sense, right? We cannot find the average of the first name of customers
or something like that. So we have to provide a number. Next, we have to
define the window. So we have the PARTITION
BY, and it is optional. So you can use it
or leave it out, it depends. And then the next one,
the ORDER BY, is optional as well. It is not required, so you can use it or leave it out. That means the
whole definition of the window can be empty
for the aggregate functions. Let's have a look
at all the functions: we have COUNT, SUM,
AVG, MIN, and MAX. And as you can see,
only COUNT accepts all data types as an
expression or argument. All others require you to
have a number as the data type. For all functions, the
PARTITION BY is optional, and the same goes for ORDER BY and frame, so everything is
optional over here. Now, what we're going
to do with that is
go and
deep dive into each of those functions in order
to understand how they work and what the use cases are, and of course, we're going
to practice in SQL. So we're going to
start with the first one, with the function COUNT.
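
As a rough illustration of the syntax overview, the sketch below compares a plain aggregate with the same function used as a window function with an empty window definition. It assumes Python's built-in sqlite3 module as a stand-in engine; the table name and the monthly values are hypothetical, chosen only so that they add up to the 175 mentioned above.

```python
import sqlite3

# Hypothetical monthly sales; the values are chosen to add up to 175.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month_name TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [("Jan", 20), ("Feb", 10), ("Mar", 30), ("Apr", 5), ("Jun", 70), ("Jul", 40)],
)

# Plain aggregate: all rows are summarized into one single value.
total = conn.execute("SELECT SUM(sales) FROM monthly_sales").fetchone()[0]
print(total)

# Window aggregate with an empty OVER (): PARTITION BY, ORDER BY and frame
# are all left out, so the window is the entire data and every row keeps
# its details next to the aggregated value.
windowed = conn.execute(
    """
    SELECT month_name, sales, SUM(sales) OVER () AS total_sales
    FROM monthly_sales
    """
).fetchall()
for row in windowed:
    print(row)
```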
222. 4 2 win aggr count: Okay, so what is the COUNT
function? It's really simple. It's going to return the number of rows within each window. It's going to help you
understand how many rows you have within
each subset of data. Now let's go and understand how SQL works with this function. All right, now we have again
this very simple
example for the orders, and we have the
following information: we have the products and sales. And now we want to
solve a very simple task: how many orders do we have
for each product? In order to solve it, we can use the function
COUNT like the following. So we can say COUNT, and then we pass it an argument or
expression, the star. With that we
are telling SQL, go and count how many rows
we have in our table, but we have a window
definition like this: OVER PARTITION BY product. So now what SQL is going
to do is go and divide the data set
into two partitions. We're going to have
one partition for the caps and another
one for the gloves. So with that SQL has prepared
our data into windows, and we are ready to
do aggregations. So how many rows do we
have within each window? It's going to be three.
So for this window, it's going to be three rows, and for the next
window, we have, as well, three rows, so we get
three, three, and three. It's very simple, right, guys? We are just finding the number of rows within each window. But now with the
aggregate functions, we have to be very careful
with the null values. For COUNT(*), as
you can see over here, we are not specifying
anything about the sales. We are just saying,
find the number of rows. So that means SQL will just
count the null as one row. So if we are using the star as an argument
for the function COUNT, the null will not
affect anything. So whether we have
nulls or not, we are just counting
how many rows we have inside our data. But in some scenarios, we should be ignoring
the nulls in our count. For example, let's say that I would like to
count how many sales we have for each product. That means if we have nulls, they should not be counted. So now, in order to solve this task, what are
we going to do? Instead
of the star over here,
we're going to have
the field sales. So now with this,
we are telling SQL:
don't just count blindly how many rows we have
within each window. You should be very
careful with the values. Find how many sales we
have within each window. So now let's see
what happens.
what can happen. For the first window,
we have three sales, so we have three values. So the number of
rows is correct. But for the next
one, how many sales do we have? We have two. So we have this sale
and then the 70, but the last one is null, so it will not be counted. It would be ignored. That's why we're going to get
in the output, the value two. We
have two sales. You can see the result
did change and we are now more sensitive
to the null values. Be careful what you are
specifying for the count? If you are using a
column name like this, it will ignore the nulls. But if you have a star,
it's just going to go and find how many rows do we
have within each partition. Now if you go and compare
the result side by side, you can see that. If you specify a column
within the count function, it's going to be
sensitive with the nulls. It's going to ignore it and will not use it within
the aggregations. That's why we have
here only two rows. But if you go and
use the star within the count function,
what's going to happen? Scale just going to
go and count it. We're going to
find the number of rows that we have
inside our table. And there is one
more way in order to do the same thing here
on the left side. You can use instead of
star, you can use one. So you might find it somewhere that people are using count one, and then the same
window function, and we will get exactly
the same results. So the nulls would be counted
and would not be ignored. So now you might ask
me, which one should I use the one or the star?
Well, I would say, It doesn't matter. We are
getting the same results. And if you are thinking
about the performance, I hardly find any
differences between them. You can go and try both
of them and stick with the one that is giving you
more better performance. Now, we have special case for the count function compared to all other
aggregate functions. It allows any data type. So that means we
can use numbers, we can use characters,
dates, and so on. That means we can go
and specify something like the products for the
count instead of sales. So we can go over here
and say products. And it's going to go and count how many rows do we
have for the products. So it's going to be
three over here. And since here, we
don't have any nulls, it's going to go and
count it like this. So we have three rows.
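
The four variants discussed so far (star, one, the sales column, and the product column) can be compared side by side in one query. This is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in engine; the caps and gloves values follow the example in this lecture, while the table layout and the order_id column are assumptions.

```python
import sqlite3

# Caps and gloves example; the last gloves sale is NULL (None in Python).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, product TEXT, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "Caps", 20), (2, "Caps", 10), (3, "Caps", 5),
        (4, "Gloves", 30), (5, "Gloves", 70), (6, "Gloves", None),
    ],
)

# COUNT(*) and COUNT(1) count rows, so the NULL row is included.
# COUNT(sales) counts only the non-NULL sales values.
# COUNT(product) works on text and counts rows, not unique products.
query = """
SELECT
    product,
    sales,
    COUNT(*)       OVER (PARTITION BY product) AS count_star,
    COUNT(1)       OVER (PARTITION BY product) AS count_one,
    COUNT(sales)   OVER (PARTITION BY product) AS count_sales,
    COUNT(product) OVER (PARTITION BY product) AS count_product
FROM orders
ORDER BY order_id
"""
counts = conn.execute(query).fetchall()
for row in counts:
    print(row)
```

A difference between count_star and count_sales within a window is exactly the number of nulls in the sales column for that window.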
And be careful here: we are not counting
the unique rows. We are just counting the rows that we have inside our data. So this will not
be counted as one, and this as well
will not be one. We have the caps three times.
That's why we have three here.
Okay. So now we have this very
simple example: find the total number of orders. This is a very simple task
in order to find how many
rows, how many records,
we have inside
the table orders. So let's go and solve it. Let's start by
selecting just star from the table orders
without anything else, like this. So as you can see, we have
ten orders. It's very simple. It's very easy as well. But now, let's say that you have
thousands or millions of rows. You cannot do it like this,
by just checking the rows. What you're going to do is
go and use the function COUNT. So we can go over
here and say COUNT(*),
and then let's give
it a name, total orders. So let's go and execute it. As you can see, we
got only one record, one value; we don't
see any other details. We got the ten orders, so this is the total
number of orders. This is very helpful in order to understand the
content of your data. We call this
overall analysis, or let's say having the big
numbers about your business. For example, how many
orders do we have, how many customers, products, employees, and so on. Having those big numbers can help us to track our business, to understand how well we are doing with the orders and
with the customers and so on. This is the basics of reporting. Now, let's go and extend
our task by saying: provide details such as the
order ID and the order date. So let's go and do that. So select order ID, order date. And now, of course, we
cannot do it like this. Let me just execute it. We will get an error
because here we have different levels of
detail in our SELECT. So in order to solve this,
what we're going to do is
use
the OVER clause, and with that we are
telling SQL that this is a window function. So now let's go and execute it.
You
can see that with that, we have solved the task. We have details: we have
the order ID and the order date. So this is the highest level of detail, since we
have the order ID. And as well, we have the
highest level of aggregation: we have the total
number of orders in the entire table orders. So now let's keep going and
add more stuff to our task. Let's say that we want to find the total number of orders, but for each customer. So that means this time, we have to go and
divide our data by the customers. So
let's go and do that. We can use, as well, a window
function, so COUNT(*). In the OVER, we have to divide the
data using PARTITION BY, and we're going to use
the field customer ID. Let's call it
orders by customers. And I would like to see as well the customer information
in the query. That's why I'm going
to go and add it. All right. So that's all.
Let's go and execute it. Now, as we learned before,
SQL is first going to go
and divide the data. We
have four customers, so
we're going to get four windows. The first window is going to be for the customer ID number one. And as you can see,
we have three rows. That's why we have
three orders here. And the same thing
for customer two, we have three orders;
customer three, three orders; but only
for the last customer, the customer ID number four, we have only one row, so the count is one. So now, if you go and look at the total orders and the
orders by customers, you can see now we are not
doing the overall analysis. We are doing a comparison between different categories. And, of course, in this example, the category is the customers. And with that, we can
understand as well the behavior of our customers. So you can see that we
have three customers that have exactly the same
number of orders. So they are very similar, but we have one extreme, which is the customer
ID number four. This customer has
only one order, so this is the only
customer that has a different behavior than
all other customers. So you see, with a
very simple query, we are now able to analyze our business and understand the behavior of our customers. So if you divide the data by PARTITION BY and use COUNT, you can go and
compare stuff together. All right. So now
let's keep moving. Next, we can look at
the special cases that we have with the function COUNT. So now we have this very
simple task. It says: find the total
number of customers, and additionally, provide all customer details. I think it's very easy to solve. What
are we going to do? We're going to go
and select star, since we need all details, from Sales.Customers. So let's just have a look. We have five customers, and the function is
COUNT(*) OVER. We don't have to divide
the data since we have to find the total number of
customers for the entire table, and it's going to
be total customers. So nothing new. That's it. We have five customers. Now, as we learned before, if you are passing the star
to the COUNT function, what you're telling
SQL is: go and count how many rows we have inside the table customers. SQL is just going to go
and start counting. It's going to say, we have five
customers, five rows. It doesn't matter
whether we have nulls inside our data, like in the
last name or the score. It's just going to count
the number of rows. Now, let's say that we
have the following task. It says: find the
total number of scores for the customers. So what we need
with this task is to find out how many
scores are inside our data. As you can see, we
have four scores,
but the last customer doesn't have any score, so
we have it as a null. So the result should be four. We cannot go now
and use the star for it because we're
going to get five. We have to go and
count the scores. Let's see how we can do that. We can use COUNT as well, but this time with the score, and the definition of the window is going to be empty. So total scores, and let's
go and execute this. So now we can see in the
results, we got four scores, which is correct
because SQL did ignore the null, and SQL was focusing
only on one column. So focusing on those values, the nulls will not be counted. This is really great in order to check the quality of your data. So let's say that you are not expecting any nulls
inside your data. Instead of going manually
through all the records, what you can do is go and find the total number
of customers like this. And then you can go and count
the total number of scores, and you can see there
is a difference. So by just checking the data, I can say: you know what? We have one null, without checking every
record in our data. With that, we can
check the quality of our data and understand
very quickly how many nulls we
have in the field score, and you can do the same stuff, for example, for the first name. Let me show it to you.
I'm just going to go and copy this,
let's say first name. Let's say country, actually. So I will go with the country. So let's go with the
country, total countries. So let's go and execute this. Now if you check the result, you can see we have five
rows with the country. SQL is going to go and focus on the countries, and it
will not find any nulls. So we have complete data here. We don't have any nulls
because the total number of customers is equal to the total number of values
within the country. And I can immediately see that the data quality of the country
is very good. All right. Now one more thing
about the COUNT function that we
have learned before: we can use either star or one in order to count
how many rows we have. Let's just try it. I'm just
going to go and duplicate it. And instead of having a
star, let's have one. I'm just going to give
it a name. Here it's going to be one, and here star. So let's go
and execute it. If you check the output, we got exactly
identical results. So there is no difference
between those two queries. It's up to you; you can try
it and check the performance. I usually go with the
star instead of one. Okay, now we're
going to talk about a very important use case for the SQL window function COUNT that I frequently use
in my real projects. The data that we use
for data analysis usually has bad data quality. And if we don't find those
data quality issues and we don't clean the data before
doing the analysis, what's going to happen
is that we are going to deliver bad results,
bad analyses, which can lead to bad decisions. One very common data quality issue that you
might encounter in your project or in your data
is having duplicates. Duplicates are really bad
for doing data analysis. So now, in order to
discover or, let's say, identify the duplicates
in our data, we can go and use the SQL
window function COUNT. So now let's go and
have some examples. The task says: check whether the
table orders contains any duplicate rows. So how
are we going to do that? By checking out the
table orders over here, we can see that there
are many orders, but how do we find out
the duplicates? Well, the first step is to understand what the primary
key of the table orders is. So what we usually do is go and check the data model
if there is one. So, for example,
for this course, we have the following
data model, and we can see that
it is defined that the order ID is the primary
key for the orders. The product ID is the primary
key for the products. So that means for our
table, the orders, we have the order ID
as the primary key, and it should be unique. It should not contain
any duplicates. Now let's go to our data and check the order ID. By just looking at the data, you can see that we don't have any duplicates; all
of them are unique. So we have one, two,
three, four, and so on. But of course, in real projects, you cannot do it like this; you have to go and
build a query in order to find out whether the
primary key is unique. But now you might say
the primary keys are usually unique because
we can define that in the DDL, in the rules for building the table.
Well, that's true. If you have it like
this, then you don't have to look for any duplicates. But usually in data analysis, we export a lot of
files and a lot of data into an extra database, and we don't build such rules. Now in order to
check the quality of the primary keys that
you get from the source, we can use the COUNT function. So let's go and build it. I'm just going to select the
order ID first as a detail, and now we're going
to do the following: COUNT and then star, and let's go and
define the window. It's going to
be PARTITION BY, and here the field is going
to be the primary key, so the order ID. I'm checking now the
quality of this field. This should not
contain any duplicates. And now we're going
to go and give it a name, check primary key. So now my expectation is that the result of this should
be at maximum one. That means we have one
row for each primary key, and that means as
well that it is unique. If you get anything
more than one, then it means we
have duplicates. Let's go and run the query. As you can see in the results, we get one for each primary
key. That's great. That means we don't have
any duplicates inside our data and the
primary key is unique. So that means the
table orders is clean and we don't have
any duplicates inside it. Now, let's check our database. We have here another table
called orders archive. Let's go and check the table. First, I'm just going to
go and select the data, select star from the orders archive: Sales.OrdersArchive. Let's check the results. And here we can see that we have exactly the same structure
as the table orders. Now let's go and check whether
the data quality is good. So now what are we going
to do? We're going to use exactly the same
query as before. But instead of using
the table orders, we're going to take the
orders archive. That's it. Let's go and execute it. Now by checking the data, you can see that we don't
have one everywhere. Sometimes we have two rows for the same primary key,
which is really bad. So we have here for
the order ID four
two orders with
the same order ID. As well, for this order ID six, we have three orders. That means those rows are duplicates and they are
against our data model. Now what else can we do with
that? We can generate a list specifically for the
data quality issue where we have duplicates. Anything that has one, we are not interested in. In order to do that, we're
going to use a subquery. Let's say, select star from, and then we can use the
first query as a subquery. And we're going to say
in our filter, where the check primary key
is higher than one. That means I need
only the order IDs where we have duplicates. Let's go and execute this. Now, we have a list with the primary keys where
we have duplicates. We have the order ID four, and as well the order ID six. Guys, as you can see, the
window function COUNT is wonderful in order to find data quality issues like duplicates.
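
The duplicate check described above can be sketched as follows. The sketch assumes Python's built-in sqlite3 module as a stand-in engine; the table name orders_archive and its rows are hypothetical, arranged so that order ID four appears twice and order ID six three times, as in the walkthrough.

```python
import sqlite3

# Hypothetical archive table without a primary key constraint.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_archive (order_id INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO orders_archive VALUES (?, ?)",
    [(1, 10), (2, 15), (3, 20), (4, 60), (4, 60),
     (5, 25), (6, 50), (6, 50), (6, 50), (7, 30)],
)

# Inner query: count the rows per primary key value.
# Outer query: a window function cannot be used in WHERE directly, so the
# filter on check_primary_key is applied one level up, on the subquery.
query = """
SELECT *
FROM (
    SELECT
        order_id,
        COUNT(*) OVER (PARTITION BY order_id) AS check_primary_key
    FROM orders_archive
) AS t
WHERE check_primary_key > 1
ORDER BY order_id
"""
duplicates = conn.execute(query).fetchall()
for row in duplicates:
    print(row)
```

An empty result from this query would mean that every key value appears exactly once.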
All right, guys. Those are the four most
important use cases of the SQL window
function COUNT. First, we
can use it to do overall analysis, or we can use it to do category analysis, like the analysis we have done on
the customer behavior. As another use case,
we can use it to
check the nulls
inside our data. And as the last use case, we can use it to identify or discover the data quality
issue of duplicates in our data. Now let's go and check
the next function. We have the sum.
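
The first two use cases from this recap, overall analysis and category analysis, can be put into one query as a rough sketch. It assumes Python's built-in sqlite3 module as a stand-in engine, and the ten hypothetical orders are distributed so that three customers have three orders each and the fourth customer has one.

```python
import sqlite3

# Ten hypothetical orders: customers 1 to 3 have three each, customer 4 one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 1), (4, 1), (5, 2),
     (6, 3), (7, 1), (8, 4), (9, 2), (10, 3)],
)

# Empty OVER (): one window for the whole table (overall analysis).
# PARTITION BY customer_id: one window per customer (category analysis).
query = """
SELECT
    order_id,
    customer_id,
    COUNT(*) OVER ()                         AS total_orders,
    COUNT(*) OVER (PARTITION BY customer_id) AS orders_by_customers
FROM orders
ORDER BY order_id
"""
order_counts = conn.execute(query).fetchall()
for row in order_counts:
    print(row)
```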
223. 4 3 win aggr sum: All right. So now
let's understand what the SUM function is.
It's very simple. It's going to return the sum of all values
within each window. So now let's go and understand how SQL works with
this function. All right, so this is very easy, and we are using the
same simple example. Now we would like to find the total sales
for each product. So we can define it like
this: SUM of sales,
since we are finding
the total sales, and then we define
the window like this: OVER PARTITION BY product. So as we learned, SQL
is going to go first and divide our data
into two windows: one window for the caps, another window for
the gloves, right? So now after SQL has
defined the windows, it's going to go and start
aggregating the data, so the sum of sales. That means, for
the first window, we have the three sales, and it's going to
go and simply add up all those values. So we are adding 20
plus 10 plus 5, and we will get the result 35. In the output, we
will get 35 everywhere. So that's it for
the first window. And as you can see, SQL
is going to aggregate the data within each
window separately. So that means as we are aggregating the
data for the caps, SQL will not check
anything with the gloves, so they are
completely separated. So now it's going to go
to the next window, and here we have two
values and a null. So again, here, the null
will just be ignored. So what we're going to
have is 30 plus 70, and the total sales for
that is going to be 100. So as you can see, it
is very simple, right? 100, 100, and guys, that's it. It's really simple. We
don't have a lot of special cases here like
with the COUNT function. It's only that it ignores the null in the
calculation, and as well, the requirement here: it allows only integers or,
let's say, numbers. So we cannot go and say SUM of the products, since the products are not numbers;
they are characters. So you can only use numbers
for the SUM function. Let's go now and
have some tasks and some use cases in order
to practice in SQL. Find the total sales
across all orders. As well, find the total
sales for each product. Additionally, we have to provide some details like the order
ID and the order date. Let's go and do that: select
order ID, order date. Let's get the sales as well, and now we have
to find the total sales across all orders. That means we can use the
window function SUM of sales, and the definition of the
window is going to be empty since we don't
have to divide the data. That's it: total sales. And we have to select from
the table Sales.Orders. So that's it, let's
go and execute it. With that, as you
can see, we got all the details that
we need, and as well
the total sales,
the sum of all those sales in one field. With that we have
our overall analysis, one big number for
our reporting. We know how much sales we made in the entire business. Now let's go to the next task. It says: total sales
for each product. I think you already know
what we're going to do. SUM of sales, OVER, and we're going to do it
like this: PARTITION BY
product ID. So that's it; we're going to call
it sales by products. With that, we are dividing
the data by the product. So let's go and execute it. As you can see, we don't have
the product information, so let's go and add the product ID to the query just in order
to analyze the results. We can see from the data that the winner is the
product ID 101. As you can see, we have
the highest sales here
if you compare it with
the other products, and the lowest one is going
to be the product ID 105. So as you can see, we can use the window function
SUM together with the PARTITION BY in order to
do a comparison between the products, in order to
understand the performance, for example, of the products. So it's a really great analysis
for the performance. All right, now we're going to move to a very interesting use case for the aggregate functions, not only for the sum, but
as well for the others: it is the comparison analysis. Okay, so let's
understand quickly what the
comparison use case is. It's going to go and
compare the current value. For example, let's say we are currently at the month of March, and the sales is 30. We're going to
compare this value, the current sales, with
an aggregated value, for example, let's say, the total sales using
the SUM function. What happens if you compare the current value with
the total sales? You are
doing an analysis called part-to-whole analysis, which can help us to
understand how important the sales in this month were
compared to the total sales. Or we can go and compare it to the best month, to
the highest value. For example, the
highest value is in June, and we can go and
compare this month with the best month of the year or with the lowest
month of the year. Or we can go and
compare the sales of the current month with
the average in order to understand whether we are above the typical sales or
below the average. And this is a very important
analysis in order to study and understand the
performance of the current data. Let's have an example in order to understand the use case: find the percentage contribution of each product's sales
to the total sales. Let's go and solve
it step by step. What we're going to do is
go and
select the order ID; as well, let's take the product ID and the sales, just like
this, from Sales.Orders. Let's go and execute it. Now as you can see
in the results, we got the first part
of the equation. We have the sales, so nothing
crazy over here. Now, we need the total sales of all data. What
are we going to do? We're going to have
the SUM of sales, and the window definition
is going to be empty. This is the total sales. Let's go and execute it. Now we have everything
for the equation. We have the sales and, as
well, the total sales, and that is enough in order to find the percentage
of the contribution. The calculation for that is
going to be very simple. We're going to divide the
sales by the total sales. It's really simple.
Let's go and do that. It's going to be the sales
divided by the total sales. So we're going to go and copy the whole window
function over here, and then we're going to
multiply it by 100. That's it. Let's
go and execute it. Now you notice that in
the output, we got zero. This is because
of the data type. So now, if we go to our table
over here on the left side, you can see that the sales
column has the data type integer. If you divide integers, you will not get a float
or decimal number; you have to go and
change the data type. So now what we're going to
do is go and change the data type
for one of them;
it's enough for
the sales over here. So we're going to use the
following statement:
CAST sales AS
FLOAT. So that's it.
the integer to floats. So that's it, let me
just give it a name, so it's going to be
percentage of total. So that sets. Let's go and execute. Now in the output, you
can see, we got now the percentage of the total or let's say percentage
of contribution. Now what we're going to do
with that, we're going to go around those numbers because
we have a lot of decimals. In order to do that,
we're going to use the round function like this. Then we're going to
have two decimals, and let's go and execute it. So as you can see, it is
really easier to read. Because we have
only two decimals and we can find immediately that the order eight is the highest
contributor to the total. This is what we call part to whole analysis where we find
the percentage of total. It is very common analysis
in order to understand the performance of each
order compared to the total. This is an example of how the window function
is helping us here to compare
the current value with an aggregated value. All right. So that's all for
the window function sum. Next, we're going to talk
about the average function.
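The percentage-of-total steps above can be sketched as one query. This is only a minimal sketch run from Python against an in-memory SQLite database: the table name orders, the column names, and the sample rows are assumptions for illustration, not the course database, and SQLite's CAST(... AS REAL) stands in for the CAST(... AS FLOAT) used in the walkthrough.

```python
import sqlite3

# Hypothetical sample rows (order_id, product_id, sales); not the course database.
ROWS = [
    (1, 101, 10), (2, 102, 15), (3, 101, 20), (4, 105, 60), (5, 104, 25),
    (6, 104, 50), (7, 102, 30), (8, 101, 90), (9, 101, 20), (10, 102, 60),
]

QUERY = """
    SELECT
        order_id,
        sales,
        SUM(sales) OVER () AS total_sales,
        -- integer divided by integer truncates, so this is 0 on every row
        sales / SUM(sales) OVER () * 100 AS pct_truncated,
        -- casting one operand first keeps the fraction
        ROUND(CAST(sales AS REAL) / SUM(sales) OVER () * 100, 2) AS pct_of_total
    FROM orders
    ORDER BY order_id
"""


def percentage_of_total(rows):
    """Load the rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, product_id INTEGER, sales INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in percentage_of_total(ROWS):
    print(row)
```

The pct_truncated column is kept only to show the integer-division pitfall next to the corrected column.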
224. 4 4 win aggr avg: All right. So now let's
understand what the average function is.
As the name says, it finds the average of the values within each window. So now let's go
and understand how SQL works with the
average. All right. So now back to our
very simple example, and the task says: find the average sales for each
product. It's really easy. We use the AVG function,
pass it the column sales, and define the window like this:
PARTITION BY product. The first thing SQL does is
define the window, so it divides our data into two partitions, one for the caps and one for the gloves. Now I hope that everyone knows how to calculate the average. As you know, it
sums all the values and divides
by the number of rows. So it
sums 20 plus 10 plus 5 and divides
by three rows, and the output is going to be 11 (35 divided by 3, shown as a whole number). We
get it for each row. As you can see, SQL just ignored everything
in the next window. We are focusing
only on the caps. Now SQL goes to the second window and starts
doing the same aggregation. But here we have the
special case of NULL. The NULL is going to be
ignored in the calculation,
so it looks like this:
30 plus 70, and we are
including just two rows,
so it's
divided by two, and the average is 50. We get the
result 50 for each row, and we are completely
ignoring the NULL. But now we might be
in a scenario where your users understand
the business like this:
if we find a NULL in the
sales, it means zero. There were no sales, so
it is actually a zero, but it is stored in the
database as a NULL. That means the
average you have provided is not really correct. We have to divide by three. So first we have
to handle the NULLs before doing the aggregation,
before finding the average. Now, we're going to have
a whole chapter on how to handle NULLs in SQL and what
the different functions are. But for now, we're going to
go with the function COALESCE. Okay. Now
we will not use the sales as it is. First, we're going
to handle the NULLs. That means we use COALESCE on the sales and
replace the NULLs with zeros. So as you can see, we are not using
the sales directly; we handle it first, and then we
find the average. SQL goes over here, and if it finds any NULL, it
replaces it with zero, and that then
has an effect on our average over here. It's going to be 30
plus 70, but now plus zero. Now we have three rows, so
instead of dividing by two, it
divides by three, and the result is going
to be 33. That means we get 33
in the output for each row. And with that, we are now fulfilling the expectation
from the business: if you have a NULL, it
is handled as zero, and the result is more
accurate. You see, right? It is very tricky.
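The caps and gloves example can be reproduced with a small sketch. It runs the two averages side by side from Python on an in-memory SQLite database; the table name sales_demo and the column names are assumptions for illustration. Note that SQLite's AVG always returns a floating-point value, so the averages print as roughly 11.67 and 33.33 where the walkthrough shows 11 and 33.

```python
import sqlite3

# Rows from the walkthrough: (id, product, sales); None is stored as SQL NULL.
ROWS = [
    (1, "Caps", 20), (2, "Caps", 10), (3, "Caps", 5),
    (4, "Gloves", 30), (5, "Gloves", 70), (6, "Gloves", None),
]

QUERY = """
    SELECT
        id,
        product,
        sales,
        -- AVG skips NULL rows: gloves is (30 + 70) / 2
        AVG(sales) OVER (PARTITION BY product) AS avg_ignoring_nulls,
        -- NULL handled as zero first: gloves is (30 + 70 + 0) / 3
        AVG(COALESCE(sales, 0)) OVER (PARTITION BY product) AS avg_nulls_as_zero
    FROM sales_demo
    ORDER BY id
"""


def average_per_product(rows):
    """Load the rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_demo (id INTEGER, product TEXT, sales INTEGER)")
    conn.executemany("INSERT INTO sales_demo VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in average_per_product(ROWS):
    print(row)
```

Which of the two columns is correct depends on what a NULL means for the business, which is the point made above.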
If you are doing data analysis and aggregations, be very careful with the NULLs. Understand them, understand what they mean for the business, and handle them correctly
in order to get correct results
in your analysis. Now, let's go back
to practice SQL with some tasks and use cases. Okay, so let's start
with the basics. We have the following task: find the average sales
across all orders, and also find the
average sales for each product, and don't
forget the details. Now let's go and solve
it step by step. Select the order ID and the order date. Let's get the sales as well. Then let's find
the average sales. It's going to be
a window function with the sales inside it, the usual stuff, and the
window is going to be empty. We're
going to call it average sales. The table is going to
be Sales.Orders. That's it, let's
go and execute it. Oh, we have to select
everything, of course. What SQL did in the output is sum all those values and
then divide by ten. With that, we have the
average sales of 38. Very easy. This is, again, what we call an
overall analysis. Let's move to the next one: find the average sales
for each product. Again, we
build the window function like
this, the average of sales, and we divide
the data by product ID, and we call it
average sales by product. And we
add the product ID to the query. That's it, let's go and execute. And we missed
something here: the PARTITION BY.
Let's execute again. With that, we have
the following data. SQL divides
the data. For example, for this product we have four orders, so it
sums the four values and
then divides by four. That's why we have 35 here. The same thing for
the next product, where it divides by three. For the last one it
divides by just one. That's why we have 60. As you can see, the
aggregation is done separately for each window, and this is a very
nice way to compare the averages between
the different products. Now let's have an example in order to learn how to
deal with the NULLs. Let's say that we have
the following task: find the average score
of the customers and also show
additional information like
the customer ID
and the last name. Let's go and solve this. We are now targeting
the table customers. Let's just select it first, like this. And now
let's include the customer ID
and the last name. Let's have the score as well. But this time, we're
going to find the average score, so it's going to be
the average of the score.
And since we don't
partition the data, we leave
the window definition empty, and we call it
the average score. That's it, let's
go and execute it. Now as you can see, we have
the average score of 625. SQL sums the four values and
divides by four. But here we have a NULL. Now we have to understand the
business, or ask about it: what does the NULL mean in the
scores of the customers? Is it zero, or is it
something empty? If it's zero, then the
average that we have is wrong, because it should be divided
by five and not four. Let's say it's zero. That means we have to
handle the NULLs. What we're going to do
now is
use the function COALESCE on the score and
replace the NULL with zero. We call it customer score. Let's go and execute this. So you can see, if
there is a value, it stays
exactly the same value,
and only if you have a NULL is it
replaced with zero. Now let's go and
correct the average. I'm just going to do
it like this. Let's copy the whole thing, but now instead of
using the score, we use the score
with the NULLs handled. I'm just going to
replace it like this, and call it average score without NULLs. Let's go and execute it. As you can see, we are getting a more valid result in the output compared to the previous one, and this is only for the
case where the NULL means zero. Guys, as you see, be very
careful with the NULLs, especially if you are doing
aggregations, and handle them correctly before doing any aggregations
like the average. Moving on to the use case, we have the comparison
analysis, and the task says: find all orders
where the sales are higher than the average
sales across all orders. That means we have
to compare the current sales with the aggregated value, this
time the average of sales. Now let's go and do it step by step. What are we going to do? We're going to
select, of course, the order ID. What else do we need? Let's take the product ID, and we need the current sales. It's going to be the sales
as it is. That's it for now. Then from Sales.Orders.
That's it. Let's go and execute it. By checking the result, you can see that we
got the first part of the equation, right? We have the sales
for each order. Now we need the second part, the average sales
across all orders. To do that,
we're going to
use the window function,
the average of sales, and we're going to use OVER. Since it's across all orders, the window is
going to be empty. Let's give it the
name average sales. So let's go ahead
and execute it. Now in the output, we
got the average sales, and it is 38. Now we need all the orders that are higher
than the average. As you can see, for example, order one is not higher, but order four is
higher than the average. In order to filter the data, we cannot use the window
function in the WHERE clause. So what we're going to do, sadly, is
use a subquery. It's going to be like
this: SELECT * FROM, and then we define the condition outside
the subquery. It's going to be WHERE the sales is higher than the
average sales. That's it. Let's go and execute it. Now as you can see,
it's very simple. We got all the orders that
are higher than the average. You can see all those sales
are higher than the average. It would be nice if we could do all of that in
the first query, but since we cannot do that, we need to use a subquery in order to filter the data afterward. With that
we can understand the importance of the
comparison analysis. For example, here we are
evaluating whether the data points are above the average or
below the average, and this is very important
in business analysis. All right, everyone. That's all for the window function average. Next, we're going to talk about two very
interesting functions, the min and max.
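The above-average filter can be sketched as follows. This is only a sketch, run from Python on an in-memory SQLite database: the table name orders, the column names, and the sample rows are assumptions, chosen so that the overall average comes out at 38 as in the walkthrough. The window function is computed in the inner query, and the comparison happens in the outer WHERE.

```python
import sqlite3

# Hypothetical sample rows (order_id, product_id, sales); not the course database.
ROWS = [
    (1, 101, 10), (2, 102, 15), (3, 101, 20), (4, 105, 60), (5, 104, 25),
    (6, 104, 50), (7, 102, 30), (8, 101, 90), (9, 101, 20), (10, 102, 60),
]

QUERY = """
    SELECT order_id, product_id, sales, avg_sales
    FROM (
        -- inner query: attach the overall average to every row
        SELECT
            order_id,
            product_id,
            sales,
            AVG(sales) OVER () AS avg_sales
        FROM orders
    ) AS t
    -- outer query: the window result is now an ordinary column
    WHERE sales > avg_sales
    ORDER BY order_id
"""


def orders_above_average(rows):
    """Load the rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, product_id INTEGER, sales INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in orders_above_average(ROWS):
    print(row)
```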
225. 4 5 win aggr min max: All right, guys. So what
are the MIN and MAX functions? They are very simple, and yet very powerful functions
for analytics. MIN is simply the function
that returns the minimum, or let's say the lowest value
within a window, whereas MAX is
exactly the opposite.
It finds
the maximum value, or the highest value,
within a window. Now let's go and understand how SQL works with these
functions. All right. So now we have the same data, and we have two tasks. First, we have to find the lowest sales
for each product. Second, side by side, we would like to
find the highest sales for each product. So we're going to
use MIN and MAX. And as you can see, the
syntax is very simple: MIN of the sales, and then the partition is
by the product, and then the same
stuff again but with MAX. Okay. So now let's see how SQL executes
the first query. As usual, first it
prepares the data. It splits the
data into two windows, one for the caps and
another one for the gloves. After that, it
searches for the lowest sales within
each window separately. For the first window, we
have the following values: 20, 10, and 5. And of course, the lowest
value is the 5. SQL
finds it over here, and everywhere in this window the output is the value 5. So we have it as the lowest
sales for the product caps. Now we jump
to the next window for the gloves and start
searching the values. As you can see, we
have 30, 70, and NULL. NULL will be ignored,
so NULL will not be considered as
the lowest value. SQL finds the
lowest sales, the 30. It's actually
the first row within this window, and the output value is 30 for each row. So that's it, it's
very simple, right? Now, let's move to the next one. We have the same
stuff, but using MAX, so the data is partitioned. For the first partition, what is the highest value? It's going to be the
first row, the 20. SQL finds it, and in the output we
get the highest sales, 20, for this window. Then it goes
to the second window and searches for the
highest value. Here we have
two values, 30 and 70, and it's going to
be the 70, right? So it
points to it over here, and in the output we
get 70 everywhere. So, guys, it's
really simple, right? Now, let's go back to our
scenario from the average, where in our business
we understand NULLs as zero in the sales. That means first we have to handle the NULLs and
replace them with zero, and then we
search for the value. So what's going to happen? We
replace NULLs with zero. For the MAX, nothing
changes. The highest value
is still 70, and we
get the same output. But for the MIN, now we
have a new lowest value. It's no longer the 30. It's actually the zero. So SQL goes over here and
replaces the 30 with zero. Zero is the lowest sales
for the product gloves. Again, guys, the NULLs
are very tricky, and those functions are really
sensitive to the NULLs. Understand what the NULLs
mean and handle them correctly so that you get
correct results in the output. That's it. Let's go back
to SQL to have some tasks and use cases in order
to practice SQL. All right, everyone, let's
start with the basic stuff. Find the highest and lowest
sales of all orders, and also find the highest and lowest sales for each product, and we have to provide
additional information. So let's go and solve it. Select
the order ID and the order date, and let's take
the product ID as well. Now, let's find the highest
sales of all orders. It's going to be the MAX
function for the sales, and the window
is going to be empty, since it's the sales of all orders. We call it highest sales. Let's go for the lowest
sales of all orders. It's going to be exactly the
opposite, the MIN function for sales with OVER. Then we
have the lowest sales. I'm just going to
make it capital. The table is Sales.Orders. I think that's it. Let's have
the sales as well, actually. All right. So now let's
go and execute it. Now this is very simple, right? These are all the sales. What is the highest sales? We have the 90 of
order eight. As you can see, we have now
the highest sales, the 90, and the lowest sales is the 10, from the first order.
It's very easy. Now we're going to repeat the same stuff for the products. So we have to partition
the data by the product ID. What I'm going to do is
just
copy and paste. The first one is going
to be PARTITION BY the product ID, so highest
sales by product. And the next one is going to be the same stuff, copy
and paste, by product. That's it. Let's go and execute it. So again, the data is partitioned and divided
by the product. For the first window, what is the highest sales? It's going to be the 90, and the lowest sales is
going to be the 10. So it's exactly like
the overall, right? Now, let's go to the
second window over here. We can see that the highest sales is the 60, the first one, and the
lowest this time is 15. This is great for
seeing that SQL executes each of those functions for each window separately. So let's go to the
last window. It's 41. So the sales is 60, and we have only one row, so it's going to be both the highest and the lowest sales. With that, as you can see, we can define a range
for each product, and the range differs from one product
to another.
For example, for
this product, 101, the range is from 10 to 90. But for the second
product, we have 15 to 60. Okay, guys, let's
move to the next one, which is one of my favorites
in the window functions,
where we filter the data
using the MIN and MAX functions. Let's have the following task. It says: show the employees who have the highest salaries. This sounds very simple, but we can use the help of window functions
to solve it. So now we are working
with the table employees. Let's just select the data: SELECT * FROM Sales.Employees. That's it. Let's go and execute it. Now we have five employees, and we have those
different salaries. Let's go and find
the highest salary: MAX of salary. Let's use the window
function with OVER, but we don't partition
the data at all. So it's going to be like
this, highest salary. Let's go and execute it. Now by checking the results, we got a new column called
highest salary, and inside it we have the 90k. If you
check those five salaries, you can see that the highest is from the employee Michael. But still the task
is not solved. We have to show
only the employees
who have the highest salary, so we have somehow to
filter the data and show only this employee. In order to do that,
we have to use a subquery, since we cannot use the window
function in the WHERE clause. What we're going to
do is SELECT * FROM, and then our first query
is going to be the inner query. Then we have the
following condition:
the salary should be equal to the highest
salary. So it's very simple. With that we are comparing the salaries with the
highest salary, and if there is a match, the
data is going to be presented. So let's go and execute that. And that's it, as you can see, we got the employee with
the highest salary. But if there are
multiple employees with the same salary of 90k, of course, we're going to
get them all in the results. I think Michael is going
to need a new job, right? This is the worst. So this is another use case for the window functions
MIN and MAX. All right. So now we come to the use case of the comparison analysis, where we want to compare
the current sales with the highest and
the lowest value. We have the following task. It says: find the deviation of each sale from the minimum
and the maximum sales amount. As you can see,
this is our sales, this is the highest, and
this is the lowest. So now we just have to
subtract the values from each other in order to get the deviation. So
it's very simple. Let's get the first deviation, where we
subtract the lowest value from the sales.
It's going to be like this. What we are
doing over here is
subtracting
the lowest sales of all records from the sales. We're going to
call it deviation from min. Let's go and execute it. Now we can see
from those values how far the current
value is from the extreme. The extreme here is
the lowest value. This is a really great way to analyze the extremes
in your data. When we are near
the extreme, the value is going to be low. As you can see,
here we have a zero. This is the lowest
because it is exactly the extreme. Actually, this is our
value, the 10. The next one is a little bit
further away from the extreme. It is 15, so we have
a 5 here. This is not far away
from our extreme value. And then if you check this value over here, we have 80. The distance is very far away from our extreme
value, the lowest sales. This is a really nice
analysis for evaluating
the sales in your data. Now, of course, we
can evaluate our data against another extreme, which is the highest sales. In order to do that,
let's get the highest sales
and subtract the sales from it.
We call it deviation
from max. Let's go and execute it. Now we can see in the output that we get exactly
the opposite distances. Order number one is the
farthest from the extreme. As you can see, we
have the value of 80, and order eight
is the identical one, so that's why we have
the distance of zero. Now we can see
very quickly
which data points are the nearest to the extreme,
to the highest sales. As you can see, guys, using the window functions
MIN and MAX is very powerful for
understanding and evaluating your data
points against the extremes.
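The two deviations can be sketched in a single query. This is a minimal sketch run from Python on an in-memory SQLite database; the table name orders, the column names, and the sample rows are assumptions for illustration, chosen so that the lowest sale is 10 (order one) and the highest is 90 (order eight), as in the walkthrough.

```python
import sqlite3

# Hypothetical sample rows (order_id, product_id, sales); not the course database.
ROWS = [
    (1, 101, 10), (2, 102, 15), (3, 101, 20), (4, 105, 60), (5, 104, 25),
    (6, 104, 50), (7, 102, 30), (8, 101, 90), (9, 101, 20), (10, 102, 60),
]

QUERY = """
    SELECT
        order_id,
        sales,
        -- distance above the lowest sale of all orders
        sales - MIN(sales) OVER () AS deviation_from_min,
        -- distance below the highest sale of all orders
        MAX(sales) OVER () - sales AS deviation_from_max
    FROM orders
    ORDER BY order_id
"""


def deviations(rows):
    """Load the rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, product_id INTEGER, sales INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in deviations(ROWS):
    print(row)
```

Both columns are written so that the result is never negative: zero means the row is the extreme itself.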
226. 4 6 win aggr rolling running: All right, everyone. So now we can focus on a very
important use case. One of the must-know use cases
for aggregations is
doing a running total
and a rolling total. These two concepts are
very important for data analysis and
reporting, and you must know them. The key use case for those two concepts
is tracking. For example, we can track the current total sales against the target sales
in our business, and it's also
great for
historical analysis
of trends. Okay, now the question is: what are running
and rolling totals? They are basically very similar. They
aggregate a sequence of members, and the aggregation
gets updated each time we add a new
member to the sequence. A sequence could be
something like a time sequence. That's why we call this type
of analysis "analysis over time". So now we still
have the question: what is the difference between the running and the
rolling totals? The running total
aggregates everything from the beginning until the current data point, without dropping
off any old data. The rolling total, on the other hand,
focuses on a specific time window, like the last 30 days or
the last two months. Each time we add a new member, a new
data point, to the window, we drop off the oldest data
point in the window. With this, we
get the effect of a rolling, or let's
say shifting, window. Okay, I totally understand if
this sounds complicated. Now, let's have a
very simple example in order to understand
this concept and how we can solve it using SQL.
All right, guys. So now we have a very
simple example. We have the months and sales, and we have them twice because
I want to show you side by side how SQL works with the running total and
the rolling total. So now, what is the
task on the left side? We want to find
the running total of sales for each month. And on the right side, we would like to find
the three-month rolling total of the sales for each month. They sound very similar,
but on the right side we have only a fixed window. Now, how can we solve
this using SQL? On the left side, we
can use SUM of sales, so we aggregate all the sales using
the SUM function, and the definition for
the window is going to be like this: ORDER BY month. Of course, you can
use any other aggregate.
You can have
an average here, and if you use an
average with ORDER BY, you will get the
running average, or the running max, or the
running count, and so on. That means
whenever you combine
an aggregate function
with an ORDER BY, you generate the
effect of a running total. Now, on the right side, we
can have the same stuff. We can have an aggregate
function together with ORDER BY: SUM of sales, ORDER BY month. So far, we have everything
like the left side, right? But now you might ask,
why does this
generate the effect of
the running total? We didn't specify
anything crazy, right? It's all about the definition
of the frame clause. Do you remember?
If you use an ORDER BY and you don't
specify a frame clause, you get a hidden, or let's
say default, frame clause, and it
looks like this: ROWS BETWEEN UNBOUNDED
PRECEDING AND CURRENT ROW (strictly speaking, the default is written with RANGE rather than ROWS, but when each month appears only once, as here, both give the same result). And what was the definition
of the running total? It aggregates all the data from the
very beginning, the unbounded preceding, until the current position, the current row, without
dropping off any old members. That means the definition of the running total is exactly the definition of
the default frame clause. That's why SQL generates the effect
of the running total. Now, let's go to the right
side, the rolling total. Here again, we have
the same stuff, right? We
aggregate the data using the SUM function, and we
sort the data with ORDER BY month. So with that we are again generating the effect
of a running total. That happens each time you use
ORDER BY with an aggregate. But in the rolling total, we always want to
specify a fixed frame, in this example
three months. That means if we are
getting a new month, we don't want to keep
the oldest month. We always want
a fixed window. Now, in order to have
this fixed window effect, we have to redefine
the frame clause, because if you leave it as the default like in the running total, the frame is going
to keep extending. You will see this
effect in the example. So we define it like this: ROWS BETWEEN 2 PRECEDING
AND CURRENT ROW. The total number of rows
included in each window is
going to be a maximum of three months. Now I know you might say, Baraa, what
are you talking about? I didn't get anything.
It's totally normal.
only with an example. So in order to do this, let's start with the left side. So first, Qu going to
go and sort the data, so everything is sorted from the smallest month
until the highest one. So from January until
July, everything is good. And now su going to go and
start working with the frame. So the frame says
unbounded preceding. So this is going to be static. It's going to be always
pointing to January. This is the unbounded
proceeding, the first row in the dataset. And now, of course, we are
starting from top to bottom. The current row going to be
pointing as well to January. So the frame going
to look like this. It's going to be only one row, and the total sale of
this row going to be 20. That's why we can have
in the output 20. So now let's move
to the right side, the current row
gonna be January, and what is the two proceeding? We don't have it yet,
so it's going to be pointing maybe somewhere
here before the table. So again, what is the frame? It's going to be
as well, one row. So in the output, we will get
exactly the same result 20. So so far, there's no
differences between the running total and the
rolling total. But let's Now we're going to go to
the next row over here, what can happen to our frame. It's going to go
and extend right, so we're going to have now
two months in this frame. And what is the total
sales over here, it's going to be 30.
We added a new member. You can calculate it like this, either go and calculate all
the cells within the frame, or you can go and say this is the previous aggregated
value plus the new member. The previous one is 20, the new member is
ten, we will get 30. Both of them is correct. Now let's move to
the right side. What's going to happen, we're going to be as
well at February. The tube preceding is still
pointing somewhere outside. And here, the window
going to go and extend like this.
We have two months. And the same aggregation
gonna happen. So we have 30. So so far,
nothing crazy, right? Let's go to the
next month March. The frame going to be extended. So we have now three months. And the aggregation
going to be either here, 60 or 30 plus 30, we will get the
running total of 60. And now on the right side,
what's going to happen, were going to point
as well to March, and this time, the two preceding going to be pointing to January. And this is the
first time we are getting the whole
fixed frame, right? So we have here three
months in this frame. So what is the total of
that it's going to be 60. Okay, so now you
say, we're still getting the same result,
so there's no difference. I'm going to say wait for it. It's going to be the next one. So as we go to April, the effect here that
the frame going to get extended to
four months because always we start from
the first month until the current month without
dropping any member outside. So what is the total of this? It's going to be 65. Sorry? Now on the right side
what's going to happen, we're going to go and add
a new member, the April. But we are at the maximum
sides of the window. We have only three,
and that's because the two preceding going to shift
as well down over here. So the boundary going to be
from February until April. And with that, we are
dropping off January. And now you can see the effect. It is sliding. It is rolling or shifting
from top to bottom. And that's because the
boundaries as well shifting. So you can see now the
effect of the rolling total. The newest member
going to be added, the oldest member
is going to be out. We are allowed only to
have three muscles. So what is the total of this? It's going to be 45. So this times we are not
aggregating this value, the 60, together with the five. We are aggregating the
values within the window. So now let's keep going.
Now, we are at June, What can happen on the side, the frame going to get bigger. And with that, we will
get the result of 135. So the frame is
getting really bigger. But on the right side, it's
going to have a fixed frame. So we are just sliding,
shifting and rolling. So with that, we are
adding new member. Another member is
leaving the oldest one, and the total over
here going to be 105. And now we're going to
go to the last row. We will have everything
for the total. So the whole data set is
going to be aggregated. So this is the maximum
what we're going to get. It's going to be around 175. But on the right side,
it's just going to keep shifting until we
reach the last record, the window, the frame, going to be as well
shifting like this. So the total of
this go to be 105. Okay, guys. So you
see, it's very simple. The running total always
considers everything from the starting position until the current row without
dropping any member. The rolling total always drops the oldest member in order
to add something new, and the window keeps shifting. So the running total is great in order
to do tracking, like, for example,
budget tracking. Or we check, for example, the current total sales against a target or
something like that. So we are always considering
the whole data set. But with the rolling total, we always do a
focused analysis. We are always interested in
the window of three months. So they might seem very similar, but they have completely
different scopes for analysis. But both of them are doing
aggregations over time, so they can help us to
do analysis over time, like checking
whether our business is growing over
time or declining. So, guys, as you can see,
using very simple SQL, using the window functions, we can do really great
analysis on our data. So that stuff is
really fundamental for data analysis or doing
reporting for our business. So window functions are really powerful for
data analytics.
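To make the difference between the running total and the rolling total concrete, here is a minimal sketch. It is only an illustration under stated assumptions: the table `monthly_sales`, its columns, and the monthly values are hypothetical, and it runs the SQL through Python's built-in `sqlite3` module (SQLite 3.25 or newer is needed for window functions) simply so the example is self-contained.

```python
import sqlite3

# Hypothetical monthly sales; the values are made up for illustration.
monthly_sales = [
    ("2025-01", 20),
    ("2025-02", 10),
    ("2025-03", 30),
    ("2025-04", 5),
    ("2025-05", 70),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month TEXT, sales INTEGER)")
conn.executemany("INSERT INTO monthly_sales VALUES (?, ?)", monthly_sales)

query = """
SELECT
    month,
    sales,
    -- Running total: everything from the first row up to the current row.
    SUM(sales) OVER (ORDER BY month) AS running_total,
    -- Rolling total: only the current month and the two months before it.
    SUM(sales) OVER (
        ORDER BY month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS rolling_total
FROM monthly_sales
ORDER BY month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In this sketch the running total never drops a month, while the rolling total drops the oldest month as soon as a fourth one arrives, which is the sliding effect described above.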
227. 4 7 win aggr moving avg: Okay. So now we have the
following task, and it says: calculate the moving average of sales for each product
over time. So now we have here something
called a moving average. It is very similar to
the running total. In the running total we used
count and sum and so on. But here, we're going to go
and use the average function. And instead of calling
it a running average, we call it a moving average. So let's go and solve the task. Let's start as always by
selecting the usual stuff. So let's get the order ID. Let's get the product ID. And I would say, since
it's over time, I will get the
order date as well. And the last one, the
sales, from the table Sales.Orders. So that's it. Let's go and execute it. So now we got our ten
orders with the products, order date, and sales. Let's start building our
window function step by step. Which function do
we need? We need the average. This
is the easiest one. It says moving average of sales, so that means we need the sales. So it's going to be
the average of sales. Let's go and define the window. So now do we have to divide
the data, partition the data? Well, yes, it says
for each product. That means we're
going to go and use the partition by clause
with the product ID. So now I would
say that's it for the first step,
average by product. So let's go and execute it. So now if you check the result, you can see that we
got our windows. So the first one is for
the product 101, and the total average of the
sales is going to be 35. So we have aggregated
one value for each window, the same thing for
the next product, and for the next, and so on. So we don't have any
progress over time or something like a moving
average over time, right? We don't have this
effect. We have just one average
for each window. So now in order to have the
effect of the moving average, it's gonna be like
the running total. We have to use the
aggregate function together with the order by. So I'm just gonna make
it in the new column. I'm just going to copy
everything like here. And now we're going
to do order by. Okay. And since
it's over time, we're going to go and use
the order date, and we're going to have it ascending, because
over time we always start with the earliest dates and end
with the latest dates. So from the lowest
to the highest. We're going to
leave it like this. Let's call it moving average. So now let's go ahead and
execute it, and we got here an extra comma
because of the copy-paste. So let's execute it again. All right. So now let's
check the results. Let's take the first
window over here, and you can see we have in the moving average
some progress. So it goes
10, 15, 40, 35. So there is a moving average. We don't have one solid
number for the average, we have different values. So now, how is SQL
going to solve this? It's really simple. It's
going to go row by row. So the first row, what
is the average of ten? It's going to be ten. Then
moving on to the next one. It's going to be ten plus 20, divided by two, you will get 15. So now moving to the third one. All three values are
going to be summed up, divided by three,
you will get 40. And now to the last
row in the window. It's going to be summing up all those four values,
divided by four, and you will get 35, and this is exactly the same value as in the
previous column. You have here the average by product, where we don't
have order by. You got 35 as well, exactly like the last row. That's because we have
the same calculation. It is summing up all
those four values and dividing by four. But now, the interesting part is
the next value. As you can see, the next value comes from another window. You see here we have 15
for the product 102, and the average is
going to be 15 as well. So SQL is not considering the old values
from the other window. So SQL is going to calculate
each window separately. So again here, this is the
first value of this window, 15, the average is 15, then the same stuff, right? Summing up those values,
divided by two, and so on. In data analysis,
this last field over here, we call it a moving average, and you can implement
it very simply using an average function
together with the order by. Alright, let's move to the
next task, and it says, calculate the moving average of sales for each
product over time, including only the next order. So, as you can see, the first part we've
already done, right? We have the moving
average, and the data is divided with partition by
for the products. But here, we have
more specifications. It says, including
only the next order. That means we're talking about the current order and as
well the next order. So here we have a fixed
frame or fixed window. So we don't need the whole
average of the window. We need a maximum of only two
orders in each calculation. So how are we going to
do that? We can have our custom frame clause
inside our window function. So that means we cannot
leave it as the default. We have to specify it. So let's go and do
that. I will just copy the old definition of the window because we have the exact same stuff. So we have the average
of sales over partition by product ID, order by order date. So this is the first part. So now we would like to
have this fixed window. So we're going to go now
and define our frame clause. I'm just going to zoom
out a little bit. It's going to be rows between. So we have now the
boundaries of the frame. It says, including
the next order. So we're going to go
and use the following keyword. So the first boundary is
going to be the current row. And since it's the next order,
the second one is going to be one following. So that is our frame, including only the next order, and we have it like
this: rows between current row and one following. Let's call it rolling average. So that's it. Let's
go and execute. So now let's go and
check the result. You can see the
moving average has completely different values
than the rolling average. So let's go and understand why. You can do it row by row. Let's take the first
row over here, so the sales here is ten. And the rolling average
is 15. Why is that? Because in the calculation, we are considering
the next value. So ten plus 20, divided by two,
you will get 15. That means SQL defined
the frame like this: those two rows for the
calculation of the first row. Now moving on to the second row, SQL is going to include the third one as well, the next one. But since the window
is only two orders, it's going to go and
drop the first row. The next frame is
going to be like this. As you can see, it's
going to be 20 plus 90, divided by two, you will get 55. We can see the effect
of the rolling average. Now for the next one, it's
going to be exactly the same. We are at the third row. It's going to go and
include the next one, and we're going to get
the same value because 90 plus 20, divided by
two, you will get 55. Now, the interesting part is the last
row in the window over here. It will not go and consider the next value because it
is outside of the window. The sales is 20,
and the average is going to stay 20 as well. That's it. Alright, guys. So with that, we have learned about
the moving average, the rolling average, and those amazing concepts
using the window function.
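The moving average and rolling average tasks from this lesson can be sketched as follows. This is a hedged illustration, not the exact query from the video: the table `orders` and its column names are assumptions, the sales of product 101 follow the walkthrough (10, 20, 90, 20), and the dates and the second row of product 102 are made up. The SQL runs through Python's built-in `sqlite3` module (SQLite 3.25 or newer) only to keep the example self-contained.

```python
import sqlite3

# Illustrative rows: product 101 follows the walkthrough (10, 20, 90, 20);
# the dates and the second row of product 102 are made up.
orders = [
    (1, 101, "2025-01-01", 10),
    (2, 101, "2025-01-05", 20),
    (3, 101, "2025-01-10", 90),
    (4, 101, "2025-01-20", 20),
    (5, 102, "2025-01-03", 15),
    (6, 102, "2025-01-12", 25),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(order_id INTEGER, product_id INTEGER, order_date TEXT, sales INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", orders)

query = """
SELECT
    order_id,
    product_id,
    order_date,
    sales,
    -- One average per product, repeated on every row of the window.
    AVG(sales) OVER (PARTITION BY product_id) AS avg_by_product,
    -- Moving average: first order of the product up to the current order.
    AVG(sales) OVER (PARTITION BY product_id ORDER BY order_date) AS moving_avg,
    -- Rolling average: the current order and only the next order.
    AVG(sales) OVER (
        PARTITION BY product_id
        ORDER BY order_date
        ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
    ) AS rolling_avg
FROM orders
ORDER BY product_id, order_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For product 101 the last row of the rolling average has no following order inside its window, so the frame contains only that row and the average stays at its own sales value.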
228. 4 8 win aggr summary: All right. Now we can
have a quick overview of the different use cases of the aggregate functions and how the definition of the window is going to change the
whole use case. So now, the first use case is
finding the overall total. And here, if you don't define
anything in the window, if you leave it empty, what happens is you are doing
an overall analysis. So you're going to
go and aggregate the whole data set and then provide this aggregation
for each row. This is what happens
if you leave it empty and you don't
define anything. You are aggregating
the whole data set. Now, moving to the next step, we can do an analysis called
total per group. So what we're going
to do, we will add partition by to the
definition of the window. So by adding, for example, here, partition by product,
what happens? The data is going to
be split into two categories or two groups, and the aggregation is going to be done for each window separately. This is, of course, a great
analysis in order to go and compare different products, like here, the caps and gloves. So this is helpful in order
to compare categories. So you can do this total per group analysis if you
use the partition by. Now, if you go and
use the order by, you're going to land
in the third use case. As we learned, we will
be doing a running total. So as you can see
here in the output, we are building a cumulative
value for the sales. And this can help us
to do progress over time analysis in order to understand the performance
of our business. Now moving on to
the last use case, the final phase of the window function with the aggregation. Here you have the aggregate
function together with the order by and a
customized fixed window. Of course, we can use it
to help us track progress over time in a
specific fixed window. Of course, in
all those use cases, you will get the same effect if you use the other functions. Not only the sum, you can
use average, count, max, min, so all aggregate functions. So, guys, as you can see,
the window function in SQL is very important in order
to do data analytics. By just changing
a part of the window, you are generating a
whole new use case for data analytics. All right, friends.
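The four use cases from this overview can be put side by side in one query. The sketch below is an illustration under assumptions: the table `sales`, its columns, and all values are hypothetical (only the product names caps and gloves come from the overview), and the SQL is run through Python's built-in `sqlite3` module (SQLite 3.25 or newer) to keep it self-contained.

```python
import sqlite3

# Hypothetical data: one sale per month, for two products.
sales = [
    ("2025-01", "Caps", 20),
    ("2025-02", "Gloves", 10),
    ("2025-03", "Caps", 30),
    ("2025-04", "Gloves", 5),
    ("2025-05", "Caps", 70),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, product TEXT, sales INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)

query = """
SELECT
    month,
    product,
    sales,
    -- 1. Overall total: empty window definition.
    SUM(sales) OVER () AS overall_total,
    -- 2. Total per group: add PARTITION BY.
    SUM(sales) OVER (PARTITION BY product) AS total_per_product,
    -- 3. Running total: add ORDER BY.
    SUM(sales) OVER (ORDER BY month) AS running_total,
    -- 4. Rolling total: ORDER BY plus a custom fixed frame.
    SUM(sales) OVER (
        ORDER BY month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS rolling_total
FROM sales
ORDER BY month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only the window definition changes between the four columns; swapping `SUM` for `AVG`, `COUNT`, `MIN`, or `MAX` gives the same four patterns for those functions.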
So now let's do a quick recap about the
window aggregate functions. So what do they do? They're going to go and aggregate a set of values and return a single
aggregated value for each row. So it's very similar to the group by, but here we don't lose details. Now, to the next point, what are the rules for the syntax? About the expressions, they all expect a number in
the expression. So you have to pass a number
like sales or any integer. Only for the count, you
can go and use any data type. The rules for the aggregate
functions are very simple. Everything is optional
inside the definition of the over clause, the
definition of the window. So you can go and use
partition by, order by, and frames, or not, or just leave everything empty. Everything is optional. Now as we learned, we have a lot of use cases for the
aggregate functions, and they are really
amazing for analytics. So the first one, the
simplest one: you can do an overall analysis if you just leave the window
definition empty, so you will get one big
number about your business. And the next use case, we can do total per group analysis.
As we learned, we can use partition by in order to
compare categories with each other, like comparing the products or
customers and so on. Moving on to the
next one, we can do part-to-whole analysis. We can go and compare
the performance of each data point
with the overall. So you can, for example,
compare the sales to the total sales in the window
or in the whole data set. And we have many
comparison analyses. We can go and compare
the current value with the average, or we can compare it to the extremes, to the highest sales or the
lowest sales, and so on. Another use case, we can go and identify data quality
issues in our data. We can go, for example, and identify duplicates using
the count function. Moving on to the next use case, we have the outlier detection. We can go and find out which data points are above the average and below
the average and so on. Then the next one, we
have the running total. As we learned, it is a great
tool in order to track the progress or to monitor the performance of our
business over time. Or if you want to
be more specific, you can go and use the rolling
total in order to have a specific window and only track this window, like three months or
something like that. And the last use
case, we can go and calculate the moving
average of our data. So it's really amazing how order by and aggregate functions can open for you a door to amazing, advanced analyses. So, guys, as you can see, we
have a lot of use cases for the window aggregate functions in the world of data analytics. Alright, so with that you have learned how to
aggregate your data using four different SQL window functions and
their use cases. Moving on to the next one, we're going to learn how to rank your data using six different
SQL window functions. So as usual, we're going to
do a deep dive into the syntax, how SQL works, and the
different use cases. They are amazing
for data analytics.
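Three of the use cases from the recap (part-to-whole, comparison with the average, and finding duplicates with count) can be sketched in one query. Everything here is an assumption for illustration: the table `orders`, its columns, and the four rows are hypothetical, with order ID 2 deliberately entered twice. The SQL runs through Python's built-in `sqlite3` module (SQLite 3.25 or newer).

```python
import sqlite3

# Hypothetical rows; order_id 2 appears twice on purpose.
orders = [(1, 10), (2, 20), (2, 30), (3, 40)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, sales INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)

query = """
SELECT
    order_id,
    sales,
    -- Part-to-whole: share of each row in the overall total.
    ROUND(sales * 100.0 / SUM(sales) OVER (), 1) AS pct_of_total,
    -- Comparison: 1 if the row is above the overall average, else 0.
    (sales > AVG(sales) OVER ()) AS above_average,
    -- Data quality: a count above 1 means the order_id is duplicated.
    COUNT(*) OVER (PARTITION BY order_id) AS rows_per_order_id
FROM orders
ORDER BY order_id, sales
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the window functions keep every row, the detail columns and the aggregated values sit next to each other, which is what makes these comparisons possible without a separate group by query.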
229. 5 1 win rank what is: Hey, friends. So now we're
going to learn how to rank your data using six different
window ranking functions. We have the row number,
rank, dense rank, ntile, cume dist, and as
well, the percent rank. As usual, first, we have to understand the
concept behind them. And after that, we're
going to learn the syntax, and we're going to have the
most important use cases for the ranking functions that I collected
from my projects. So now let's start with
the first question. Why do we call them
ranking functions? So let's go. All right. So now, let's say that we
have the following data. We have products
and their sales. If you want now to go and
rank your products, first, you have to sort the data
based on something, like, for example, ranking the
products based on their sales. So that means SQL is first
going to go and sort your data from the
highest to the lowest. So sorting the data is always the first thing SQL has to do before ranking anything. Now, in order to rank our
data, we have two methods. The first method, we call it
the integer-based ranking. So that means SQL is going to go and assign each row an integer, a whole number, based
on the position of the row. So now, looking at the
example, the first row, we have the product
with the sales 70. It's going to be
rank number one. Then the next row, the
product B with the 30 sales, will get the rank number two. Then the next ones are
going to be three,
four, and the last
one is going to be five. So that means SQL here is assigning an integer
to each row, based on its position
in the sorted list. This method, we call it
integer-based ranking. Now, let's go to
the second method. We have the percentage-based
ranking. So in this method, SQL
is going to go first and calculate the
relative position of the row compared to all others and then assign a
percentage to each row. So in the output,
SQL is going to start assigning percentages
instead of integers, and we're going to
have a scale from 0 to 1. So now, if you go and
compare both of the methods, you can see that
on the left side, in the integer-based ranking, we have discrete,
distinct values. It starts from one,
then two, three, and ends in this
example with five, so it really depends on how many rows we
have in the results. It could be five, it could be
500, 5 million, and so on. But on the right side, we always have the
same scale from 0 to 1. Between 0 and 1, we have an infinite
number of data points. This scale, we call
it a normalized scale, or we call it a continuous
scale, continuous values. Now the question is when
to use which method. For example, the
percentage-based ranking
is great for answering
questions such as:
find the top 20% of products
based on their sales. This method is a great way
to understand the contributions of data
values to the overall total. We call this kind of analysis
a distribution analysis. On the other hand, with
the integer-based ranking,
we can answer questions like:
find the top three products. With this question, we
are not interested in the contributions of each
product to the overall total. We are just interested in the position of the
value within a list. So this is a very commonly
used analysis in reporting. We call it top/bottom N analysis. So now let's group
our ranking functions based on those two methods. For the first group,
the integer-based ranking, we have four functions: row number, rank, dense
rank, and ntile. But on the other hand, we
have only two functions that generate a percentage-based
ranking. We have the cume dist and as
well the percent rank. So now that was an
introduction and overview of those methods and how we group up those
ranking functions. Next, we're going to
go and learn about the syntax of the
ranking functions. Most of them follow
the same rules. So, for example, we start
always with the function name, so we have here the rank. But as you can see, we
don't use any expression, so it doesn't allow you to
use any argument inside it. It must be empty. So this is the first rule
of using the ranking functions. Then about the definition
of the window. As usual, the partition by is an optional thing. You can use it or leave it. And now to the second part,
we have the order by. This one is required. So you must order the data, sort your data, in
order to do ranking, so you cannot leave it empty. So that means for the
definition of the window, we should at least
have an order by, for example, here, sales. So we cannot leave it empty. All right. So the
two requirements: you cannot use any expressions
for those functions, and as well, you have to sort
your data using order by. So now let's have an
overview of all functions. So as you can see,
all those functions are ranking functions, and almost all of them don't allow you to use any
expressions inside them. The exception is this function
here, the ntile. It accepts a number inside it. So that means you
cannot leave it empty. You should use a
number inside it. All others must be empty. So now for the partition by, all of them are optional, and for the order by, all
of them are required. So you must use order by.
And the frame clause is not allowed
in the ranking functions,
so you cannot change the definition of the frame
inside the window function. So now what we're
going to do as usual, we're going to go and deep dive into all of those
functions in order to understand when to use them and what
the use cases are, and as well,
practice in SQL. So we're going to
start with the first one, the row number.
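As a quick preview of the syntax rules above, here is a minimal sketch that calls all six ranking functions with the same window. It is an illustration under assumptions: the table `product_sales` and its columns are hypothetical, and only the first two sales values (70 and 30) come from the example; the rest are made up. The SQL runs through Python's built-in `sqlite3` module (SQLite 3.25 or newer).

```python
import sqlite3

# Sales of 70 and 30 follow the example; the other values are made up.
product_sales = [("A", 70), ("B", 30), ("C", 20), ("D", 15), ("E", 10)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_sales (product TEXT, sales INTEGER)")
conn.executemany("INSERT INTO product_sales VALUES (?, ?)", product_sales)

query = """
SELECT
    product,
    sales,
    -- Integer-based ranking: empty parentheses, ORDER BY is required.
    ROW_NUMBER() OVER (ORDER BY sales DESC) AS row_num,
    RANK()       OVER (ORDER BY sales DESC) AS rnk,
    DENSE_RANK() OVER (ORDER BY sales DESC) AS dense_rnk,
    -- NTILE is the exception: it needs a number (here 2 buckets).
    NTILE(2)     OVER (ORDER BY sales DESC) AS bucket,
    -- Percentage-based ranking: values on a scale from 0 to 1.
    CUME_DIST()    OVER (ORDER BY sales DESC) AS cume_dist,
    PERCENT_RANK() OVER (ORDER BY sales DESC) AS pct_rank
FROM product_sales
ORDER BY sales DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With no ties in this data, the first three functions return the same numbers; the differences between them only show up once two rows share the same sales value.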
230. 5 2 win rank row number: All right, so what
is the row number? In SQL, the row number function is going to go and
assign each row a unique number as a rank, and it doesn't care at
all about the ties. That means if you have two
rows sharing the same value, they will not share
the same rank. Okay, so now we have a
very simple example. We have a list of all sales, and we have the following query. So it's going to start
with the ranking function, row number. It doesn't accept any
argument inside it, and the definition of the
window is going to be like this: order by sales desc. That means we're
going to go and sort the data descending, from
the highest to the lowest. SQL is going to go and
do the following: the highest is going to be the 100, the lowest is going to be the 20, and here we have the 80 twice. Now once SQL is done
sorting the data, what's going to happen? It's going to start assigning a rank. So the row number
is going to go and assign a unique
number to each row. So that means it's going to
start with the first one. The 100 is going to be
the rank number one. The next one is going to
be rank number two. The next 80 is going to be rank
number three, the 50 is four, and then the last
one is going to be five. And now, if you
check the output, you can see that all
those numbers are unique. We don't have any repetitions. So one, two, three, four, five, there are no repetitions. They are unique, distinct values. And as well, there is
no skipping of ranks. So that means we have
here one, two, three. There's no jumping to
six, seven or something. There is a clear sequence
of distinct values, and there are no gaps. But still there is something
special in our data. We can see that in the sales,
we have the same value twice. So we have two rows
with the same sales. As you can see in
the row number, they will get distinct values. So they will not share
the same ranking. So that means row number
does not handle the ties. If you have multiple rows
sharing the same values, they will not share
the same rank. They get
distinct, different ranks. So this is how the row
number works in SQL. It generates a unique
rank for each row. It does not handle the
ties, and as well, it doesn't leave any gaps, so there is no
skipping of ranks. So now let's go to SQL in order to have a few
examples and use cases. All right, so now we
have the following task. It's very simple,
rank the orders based on their sales from
the highest to the lowest. So now, this is very easy. We're going to go and
select the data first. So order ID, product ID. Let's take the sales as
well, and select the table. So it's going to
be Sales.Orders. Let's go and execute it. So with that we've
got all our orders. What we're going
to do now is to assign a rank to each row. That means we need a column here that contains the
rank for each row. In order to do that,
we're going to go and use the window function row number. It doesn't accept any
argument inside it, so it should be empty, and then we have to
define the window. As we learned for the
ranking functions, we cannot leave it empty. We have to sort the
data using order by. Order by is a must. We don't have to use
any partition by, so we can rank all the data that we have
inside the table. So how do we sort the data? It says it should be based on their sales, from
highest to lowest. That means we order by sales. Since it's from highest to lowest, we have to use descending. And now we're going to
go and give it a name: sales rank and, let's say, row, since we are using
the row number. So that's it, it's very simple. Let's go and execute it. So now let's have a
look at the results. Before, SQL returned the
data sorted by the order ID, since we didn't define anything. But since now we have
order by sales descending, SQL went and sorted the
data by the sales from the highest to the lowest and
started assigning a rank, or let's say a
unique integer for each row. Now the highest order is going
to be the order ID eight. We have the sales of 90.
This is the highest one. So you can see,
we have one, two, three, four, five, until ten. Now by checking the
result, you can see that the ranking here is unique. There are no
duplicates over here, and as well, there is
no skipping, no gaps. So we have everything from 1 to 10, even though we
have in our data a couple of sales that are
sharing the same value. For example, we have those
two orders. You can see both of them have
60 as the sales, but they don't share the
same ranking, right? So we have here as well
the orders nine and three. They share the same value 20, but they don't share the same rank. So with that, we have solved
the task. It's very simple. We have now a rank based on the sales from
highest to the lowest.
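The row number task can be sketched like this. It is a hedged illustration, not the query from the video: the table `orders` stands in for Sales.Orders, and the ten sales values are assumptions chosen to be consistent with the walkthrough (order 8 has the highest sales of 90, 60 appears twice, and orders 3 and 9 share 20). The SQL runs through Python's built-in `sqlite3` module (SQLite 3.25 or newer).

```python
import sqlite3

# Assumed sales for order IDs 1..10, consistent with the walkthrough.
sales_by_order = [10, 15, 20, 60, 25, 50, 30, 90, 20, 60]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", enumerate(sales_by_order, start=1)
)

query = """
SELECT
    order_id,
    sales,
    -- Unique number per row, highest sales first; ties are not shared.
    ROW_NUMBER() OVER (ORDER BY sales DESC) AS sales_rank_row
FROM orders
ORDER BY sales_rank_row
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Which of two tied orders gets the lower row number is not defined by this query; if that matters, a second sort column such as the order ID can be added to the order by.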
231. 5 3 win rank rank func: All right, so what
is the rank function? In SQL, the rank function is going to go and
assign each row a number, a rank, and this time it's going to
go and handle the ties. So that means if in your data you have two rows
having the same values, they are going to share
the same ranking. One thing about the
rank function is that it's going to go and
leave gaps in the ranking. So there's a possibility
of skipping ranks. In order to understand how the rank function
works in SQL, we're going to have a
very simple example. Alright. So again,
with the same data, but with a different function. So our window looks like this. It starts with the
function rank, which doesn't accept any
argument inside it. Then we have the
window like this: order by sales descending, from the
highest to the lowest, and our data is already
sorted like that. So now, how is SQL going
to go and assign the ranks? The first row is going to
be the highest rank. So the value 100
is going to be one, then the second one
is going to be two, but now for the third
one, as you can see, we have here two values
that are the same. We have a tie, and this
time SQL is going to go and let them
share the same rank. Both of them are going
to be the rank two. It's not like the row number where we have over here three. This time we have two
because we have a tie. Having the same values means they are going to
share the same rank. Now moving to the next
value, it's going to be a tricky one, because if
you check over here, you can see that the next
rank should be the three. We have one, two, and then the next value that is
generated in the rank should be three. But SQL is
going to say, you know what? This value's position is
going to be number four, so you can see, one,
two, three, four. So actually, the position
number here is four, and SQL is going to go and
give it the rank of four. So with that, SQL is going to be leaving a
gap in the ranking. You can see we are skipping
the rank number three. And this always
happens once you have a tie where rows are
sharing the same ranking. So the next one is
going to be easy. It's going to be the
row number five, so the rank five. So now by looking at the
output of the rank function, you can see that we don't
have unique ranking here. We have shared ranking
in case of the ties. So it handles the ties. But here we have
gaps in the ranks. So we are skipping ranks. When I think about
the rank function, I think about the Olympics. If two athletes tie
for the gold medal, the first place, there will be no silver medal
for the second place. The next medal is going to be the bronze, given to
the third place. All right, so now let's
go to SQL in order to practice
the rank function. Alright, now we're going to
go and solve the same task, but using the rank function. So what we're going
to do, we're going to stay with the same
example over here, and we're going to
rank the orders based on the sales from
highest to lowest, but this time using
the rank function. So we use the rank, and everything inside it
is going to be empty. And then our window is going to be exactly the same as before: over, order by sales desc. So let's give it the name sales rank, and, yeah,
let's add rank to it. So that's it. As you can see, the syntax is very simple and very similar to the row number. We just changed the function. So now let's go and execute this in order to
check the results. So now let's go and check the results by looking
at the new rank and comparing
it with the old rank. We can see that we are
sharing some rankings, right? We have here the two twice. So the rank number
two, we have it twice because we have
over here the same value. So 60, 60, we have
here two and two. But if you compare
to the row number, you can see that it is not
sharing the same ranking. So this is one difference, and as well here, the same thing. They have the same value. The sales is 20, so
we have the rank number seven twice. And here in the row number we have
different values. And at the next value, as you can see, we are
skipping a rank. So there is a gap. There is no rank of eight. So
you can see that this is the row number nine, and that's why it gets the nine. The same thing,
I believe, over here. So now if you check
those two ranks, the next one should be three, but since it is in
the row number four, it's going to get the rank four. So by checking the results, we can see that we are sharing the same ranks and as
well we have gaps. So this is how the rank works.
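The same task with the rank function can be sketched as below. As before, this is an illustration under assumptions: the table `orders` stands in for Sales.Orders, and the ten sales values are assumed values consistent with the walkthrough (60 twice, 20 twice). The SQL runs through Python's built-in `sqlite3` module (SQLite 3.25 or newer).

```python
import sqlite3

# Assumed sales for order IDs 1..10, consistent with the walkthrough.
sales_by_order = [10, 15, 20, 60, 25, 50, 30, 90, 20, 60]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", enumerate(sales_by_order, start=1)
)

query = """
SELECT
    order_id,
    sales,
    -- Ties share a rank; the rank after a tie jumps to the row position.
    RANK() OVER (ORDER BY sales DESC) AS sales_rank_rank
FROM orders
ORDER BY sales_rank_rank
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In the output there is no rank 3 and no rank 8, which are exactly the gaps left behind by the two ties.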
232. 5 4 win rank dense rank func: All right, so what is a dense rank? It is very similar to
the rank function. It's going to go and assign
each row a number, a rank, and it handles the ties as well. So rows with the same values are going to
share the same ranking. But this time it doesn't leave any gaps like
the rank function. So the dense rank will not leave any gaps. It will not skip any ranking. So in order to understand this, we're going to have a very
simple example. So let's go. Alright, so again,
the same data, but with a different function. We have this time the
ranking function dense rank, and the window is going to
be the same: order by sales descending, from the
highest to the lowest. So now the data is
already sorted as well. Let's see how SQL is going to
go and assign the ranks. As usual, the first row is going
to be the rank number one, and the second row is going to be rank number two. But again, here, we
have the same values. So we have the same values,
and it's like the rank. They are going to
share the same rank. So both of them are going to have the rank number two.
And now you might say, well, this is very similar
to the rank function. So why do we have dense rank? I'm going to say, wait for it. We're going to see the
difference in the next value. So SQL is going to come over here. This value is exactly
after the tie. With rank, SQL went and
took the position number. So the row number
was four, right? So one, two, three, four. But this time, with
the dense rank, SQL will not leave
gaps in the ranking, so there will be no skipping. The next rank in the
sequence is three. So that's why we're
going to have the rank three for this value. So as you can see,
there is no gap. We have one, we
have two, and three. So we are not skipping, we are not leaving any gaps, and the last one
is going to be four. So this is exactly
the difference between the dense
rank and the rank. So now by checking the output of the dense rank,
you can see that we don't have unique ranks. We have here shared ranks. As you can see, we
have here a repetition. So it handles the
ties, and as well, it doesn't leave any gaps. It doesn't skip anything
in the ranking. Okay, so that's it.
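To see the three integer-based functions side by side on the simple example (sales of 100, 80, 80, 50, 20), here is a minimal sketch. The table name `sales_list` and the column names are assumptions for illustration, and the SQL runs through Python's built-in `sqlite3` module (SQLite 3.25 or newer).

```python
import sqlite3

# The five sales values from the simple example, including the tie at 80.
sales_values = [(100,), (80,), (80,), (50,), (20,)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_list (sales INTEGER)")
conn.executemany("INSERT INTO sales_list VALUES (?)", sales_values)

query = """
SELECT
    sales,
    -- Unique numbers, ties ignored, no gaps.
    ROW_NUMBER() OVER (ORDER BY sales DESC) AS row_num,
    -- Ties share a rank, and a gap follows the tie.
    RANK()       OVER (ORDER BY sales DESC) AS rnk,
    -- Ties share a rank, and no gap follows the tie.
    DENSE_RANK() OVER (ORDER BY sales DESC) AS dense_rnk
FROM sales_list
ORDER BY row_num
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The value after the tie is where the functions differ: the 50 gets 4 from row number and rank, but 3 from dense rank.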
Now, let's go back to SQL to practice
DENSE_RANK. We have the same
task: rank the orders based on their sales
from highest to lowest. We're going to do
the same stuff, but this time using the function DENSE_RANK. The parentheses
are going to be empty, and then we're going
to define the window like all the others: OVER, ORDER BY sales descending. Then we're
going to give it a name for the dense
sales rank, and that's it. As you can see, all
of those functions have the exact
same syntax, right? So let's go and execute it. Okay. So now let's go
and check the results. We got our new rank
using DENSE_RANK. And by just checking
the results, you can see that it
handles the tie. We have two twice, right? So let's check the
example over here. We have the sales 60 twice. That's why they are sharing the same rank in DENSE_RANK and as well in the normal RANK. But now, what is interesting
is the value after the tie. As you can see over here, with DENSE_RANK we have
three. We didn't skip a rank, we don't
have any gap: one, two, and then three. But RANK is just focused on the position number, so it is row number four. That's why it's four,
and with that we have a gap. So as you can see, we don't have any gaps
in DENSE_RANK. We have three, four, five, and now we have over here
the same two values. We have sales of 20 twice, and they share rank six. So as you can see,
there's a difference now between
DENSE_RANK and RANK. With RANK we have seven, seven, but with DENSE_RANK we are at
rank six, six. That's why we have
differences between them:
RANK skipped rank number three earlier. For the rest, you can
see we have seven and eight. So now, if you compare
those three rankings, you can see that they all start
with rank number one, but they don't all end
with the same rank. ROW_NUMBER and RANK
really focus on
the position number, or the row number, of the orders. You can see over here, it is row number ten.
That's why we have ten and ten here. So the scale is 1 to 10 for RANK, and that is exactly the same
for ROW_NUMBER, 1 to 10. But with DENSE_RANK over
here, we have 1 to 8, and that's because rows
shared the same rank, and with that we used, let's say, fewer ranks. So the scale is different
from the two others, and that's because
we have ties twice. This is one tie, and
we have another tie over here. That's why we are missing
two ranks over here. So this is how
DENSE_RANK works, and you can go and compare
all three together in order to understand how
those ranks are working.
233. 5 5 win rank compare ranking: All right. Now let's quickly compare the three
functions side by side. Let's start with the first point, the uniqueness
of the rank. If you compare those three, you can see that
only ROW_NUMBER generates unique, distinct ranks. With the two others, we have duplicates, or let's
say shared ranks. Now the second point: whether the function handles ties. The only one
that doesn't handle ties is ROW_NUMBER. This one doesn't
handle ties, and the two others handle ties since they
offer shared ranks. Now we have the last point, about leaving gaps or
skipping ranks. If you check
ROW_NUMBER and DENSE_RANK, you can see there
is no skipping. There are no gaps for ROW_NUMBER, and
none for DENSE_RANK either. Only the RANK
function, the middle one, skips ranks and leaves gaps. That's it, guys. These are the differences
between those three functions. I usually tend to work with ROW_NUMBER more often
than the two others.
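The three comparison points can be checked mechanically. The sketch below is only an illustration under stated assumptions: the table orders_demo and its ten sales values are hypothetical, chosen so that there are two ties as in the lesson, and the queries run on SQLite via Python's sqlite3 module (SQLite 3.25 or newer) rather than on the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_demo (order_id INTEGER PRIMARY KEY, sales INTEGER)")
# Hypothetical sales with two ties (60 twice, 20 twice).
sales_values = [90, 60, 60, 55, 50, 35, 20, 20, 15, 10]
conn.executemany(
    "INSERT INTO orders_demo (order_id, sales) VALUES (?, ?)",
    list(enumerate(sales_values, start=1)),
)

rows = conn.execute("""
SELECT ROW_NUMBER() OVER (ORDER BY sales DESC) AS row_number_value,
       RANK()       OVER (ORDER BY sales DESC) AS rank_value,
       DENSE_RANK() OVER (ORDER BY sales DESC) AS dense_rank_value
FROM orders_demo
""").fetchall()


def summarize(values):
    """Report uniqueness, gaps, and the last rank of one ranking column."""
    unique = len(set(values)) == len(values)
    has_gaps = sorted(set(values)) != list(range(1, max(values) + 1))
    return {"unique": unique, "has_gaps": has_gaps, "max": max(values)}


names = ["ROW_NUMBER", "RANK", "DENSE_RANK"]
summary = {name: summarize([row[i] for row in rows]) for i, name in enumerate(names)}
for name in names:
    print(name, summary[name])
```

With this data, ROW_NUMBER and RANK both end at ten, while DENSE_RANK ends at eight because each of the two ties costs one rank.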
234. 5 6 win rank top bottom analysis: All right, guys. So now, I had a look at those
three functions, and I checked my
projects, real projects, and I found out that
there are many more use cases for the function ROW_NUMBER compared to the other functions, DENSE_RANK and RANK. So what are we going to
do? I'm going to show you a few use cases for
ROW_NUMBER that I usually use in my real projects, in
order for you to understand how important
the ROW_NUMBER function is. So let's go to
SQL. Alright. So now let's start with
the first use case, and we have the task: find the top highest sales
for each product. This is very classic in reporting or data analysis. We call this top-N analysis. Here, the managers
or decision makers
would like to have
the best performers or the biggest successes in our data. So, for example, the
top five customers, or the
top five products or categories, and so on. This is a very important
analysis in order to focus on the best products or the most important
customers and so on. And this is, as I said,
very classic and very important in order to make
decisions in the business.
decisions in the business. So now let's see how
we can solve this. So we're going to start
with the usual stuff. Let's first select the data. So select order ID. Let's take as well,
the product ID. And the sales from sales orders. So let's go and execute this. Now as we know that
for each product, we have multiple orders, and we have multiple sales. But we are interested only in the high sales
for each product. So we have to go
and create a rank. In order to do that, we
can use the raw function. Raw number, and we have
to define the window now. So do we need partition
by? Check the query. So it says for each product. That means we have to divide
the data by the product ID. So let's go and use the
partition by products ID. And now we must
use the order by. So order by and now how to
solve the data by a sales, and it is from the
highest to the lowest. Let's go sales and we have here. Descending, from
highest to lowest. Let's go and give it a name, you're going to be
ranked by products. Let's go and execute this. Now by looking to the results, you can see that CL did divide the data by the product ID. So we have here
around four windows. The first one over
here, you can see that the rank starts from
one it's with four, the highest rank can be the order number eight
with the sales of 90. And then it goes to the four. Now, as you can see
that the second window, we have a new ranking.
So it resets. The first going to be
the order number ten, and the last one going
to be order number two. So as you can see,
each window has its own ranking as well, the last one, we have
only one row. Of course, in the task, we
have to return the highest, so we are not interested
in the others. We have to return this row, and this
one, and this one, and this one. As you can see, we have to return everything that
has rank one. We are not interested
in rank two, three, four, and so on. We would like to
have the highest. So now, in order to
filter the data,
we're going to
use a subquery. So SELECT star FROM our query, and then we're going to have
the following condition: WHERE rank by products equals one. So we are interested only
in rank number one. Let's go and execute it. And with that, since we have
four products in our data, we're going to have
only four rows, and we have the highest sales. As you can see, we have
only number one over here, and those sales are the
highest for each product. And with that, we have
solved the task by doing a top-N analysis. Okay. Moving on to
the next use case, we have the following task, and it says: find the lowest two customers
based on their total sales. So now we have the exact
opposite use case. We call it bottom-N analysis. In this example,
the decision makers in the business want
to optimize the costs, want to cut costs. And with that, they have to analyze the lowest performers among the products or the
lowest performers among the employees in
order to cut costs. So with this analysis, the decision makers are not focusing on the
most successful stuff. They are focusing on
the lowest performers. So now let's solve this task. If you check the question, we have multiple things, right? We have the total sales, and as well, we have to find
the lowest two customers. So we have ranking and
as well aggregation. Remember, we can do both
together with GROUP BY. So now let's do it step by step. First, let's select
the data, right? So what do we need? The order ID, the customer ID, and we need the
sales from sales orders. So let's go and execute this. Now, if you check the
customers over here, we have four customers, and they have multiple sales. We would like to
have the total sales for each customer in order
to find the lowest two. So let's start first
with the aggregation.
We're going to
aggregate the sales, so the SUM of sales. Let's call it total sales. And now, in order to
do the GROUP BY, we have to keep
only the customer. So GROUP BY, and we have
the customer ID. It is a very simple
GROUP BY statement. Let's go and execute this. Now, by checking the results, we can see that SQL
aggregated the data. We have four rows, and
that's because we have four customers, and we
have their total sales. So we have solved the
first part of the task. We have the total sales
for each customer. Now let's move to
the second part. It says lowest two customers. That means we have to use the ranking functions in order
to rank those customers. We are not interested
in all customers. We are interested only
in the lowest two. So in order to do
that, we're going to use
the window function
ROW_NUMBER, and then OVER. Now, do we have to partition
the data? Well, no. We don't have to
do that. We have to sort the
data, so ORDER BY. This time, we're
going to use the aggregation
in the ORDER BY, so the SUM of sales, and we want to have it sorted from the lowest to the highest, so I'm just going to
use the default, which is ascending. Now, let's call it
rank customers. So that's it. Again,
the rule here is this: if you are using
a window function together with
GROUP BY, the window function can only use columns that are part of the GROUP BY, or aggregates like the SUM of sales. So this should be working. Let's go and execute it. So now, as you can
see in the results, we got an extra
column for the rank. Now the lowest customer is
customer number two, the second one is customer
four with 90 total sales, and the customer with the highest total sales is
the last one, customer number three with 125. Now we have almost everything, but the list should
contain only the lowest two. So in order
to filter the data, we're going to use
a subquery: SELECT star FROM our query, and then we have to define the condition
WHERE rank customers
is smaller than
or equal to two. With that, we will
get the first two. Let's go and execute this. And with that, we got
the lowest two customers based on their total sales, customer ID
two and customer ID four. That's it, we have
solved the task, and now we have done
a bottom-N analysis.
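Both tasks from this section, the highest sale per product and the lowest two customers by total sales, can be sketched as follows. This is a hedged illustration, not the course's own script: the table orders_demo, its column names, the aliases, and all ten rows are hypothetical, and the queries run on SQLite through Python's sqlite3 module (SQLite 3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE orders_demo (
    order_id INTEGER PRIMARY KEY,
    product_id INTEGER,
    customer_id INTEGER,
    sales INTEGER
)""")
# Hypothetical data: four products, four customers, ten orders.
conn.executemany("INSERT INTO orders_demo VALUES (?, ?, ?, ?)", [
    (1, 101, 2, 10), (2, 102, 3, 15), (3, 101, 1, 20), (4, 105, 1, 60),
    (5, 104, 2, 25), (6, 104, 3, 50), (7, 102, 1, 30), (8, 101, 4, 90),
    (9, 101, 2, 20), (10, 102, 3, 60),
])

# Top-N: rank inside each product window, then keep rank 1 in an outer query.
top_per_product = conn.execute("""
SELECT order_id, product_id, sales
FROM (
    SELECT order_id, product_id, sales,
           ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY sales DESC)
               AS rank_by_product
    FROM orders_demo
) AS ranked
WHERE rank_by_product = 1
ORDER BY product_id
""").fetchall()

# Bottom-N: aggregate per customer, rank the totals ascending, keep the first two.
lowest_two_customers = conn.execute("""
SELECT customer_id, total_sales
FROM (
    SELECT customer_id,
           SUM(sales) AS total_sales,
           ROW_NUMBER() OVER (ORDER BY SUM(sales)) AS rank_customers
    FROM orders_demo
    GROUP BY customer_id
) AS ranked
WHERE rank_customers <= 2
ORDER BY rank_customers
""").fetchall()

print(top_per_product)
print(lowest_two_customers)
```

The subquery is needed because a window function cannot be referenced in the WHERE clause of the same SELECT; the filter has to sit one level above the ranking.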
235. 5 7 win rank unique ids: Okay, let's keep moving
to the next use case, and we have the following task. It says: assign unique IDs to the rows of the
table orders archive. So now, guys, we might
be in a situation where we have a table without
any primary key, and we would like to
create an ID for each row. In order to do
that, we can use the function
ROW_NUMBER to generate unique identifiers for each row inside our
table if we don't have one. And generating such
an ID for each row
is very important
for stuff like importing data and exporting data, maybe joining tables
using this ID as well, or let's say optimizing the performance of
a query using the ID. So now let's see how we can
generate that using SQL. Okay. So now let's first select the table orders archive in order to understand
the content. So SELECT star FROM
sales orders archive. Let's go and execute. Now by checking the results, you can see that we
have ten orders, and we have repetitions in
the order ID over here, so it is not really
a primary key. As you can see over
here, we have ID four twice, and here we have
ID six three times. Now what we're going to do is generate a unique identifier for each row. In order to do that,
we're going to go over
here and say ROW_NUMBER, and then we're going to
define the window.
We don't partition
the data at all, but we have to sort
the data by order ID, so ORDER BY order ID. Or you can use
something else as well, like the order date or
something, it doesn't matter. Let's add
the order date as well. Let's call it unique ID. Let's go and execute this. Now, by checking the data, you can see that
we have a new ID over here that comes
from ROW_NUMBER, and we have a unique identifier. As you can see,
we have ten rows, and with that we have as well ten
distinct unique IDs. With this, as you can see,
we have solved the task, and we now have a unique identifier for the table
orders archive. Now, having this ID, we can do many things, like joining tables, or doing something special and important called pagination. Imagine we have
a huge table, and we would like to
retrieve the data. In order to not fetch
all the data in one go, we can go and divide the data by the primary ID or by
a unique identifier. For example, we can make a
page from 1 until 100,000, and then the second page
continues from there up to 200,000. So now, by dividing the data, we can maybe improve
exporting or importing data, or we can have faster
retrieval for the users. We don't want to
have the whole data in one go, in one page. So pagination has a lot of benefits,
and we can do it only if we
have a nice ID like this.
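A small sketch of the two ideas from this section, generating a surrogate ID with ROW_NUMBER and then reading the table page by page. The names are assumptions made for illustration only: the table orders_archive_demo, its columns, the ten rows, and the helper fetch_page are hypothetical, and SQLite (3.25 or newer) via Python's sqlite3 module stands in for the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_archive_demo (order_id INTEGER, creation_time TEXT)")
# Hypothetical archive: order 4 appears twice and order 6 three times, so
# order_id cannot serve as a primary key.
conn.executemany("INSERT INTO orders_archive_demo VALUES (?, ?)", [
    (1, "2025-01-01 10:00:00"), (2, "2025-01-02 10:00:00"),
    (3, "2025-01-03 10:00:00"), (4, "2025-01-04 10:00:00"),
    (4, "2025-01-05 10:00:00"), (5, "2025-01-06 10:00:00"),
    (6, "2025-01-07 10:00:00"), (6, "2025-01-08 10:00:00"),
    (6, "2025-01-09 10:00:00"), (7, "2025-01-10 10:00:00"),
])

# No PARTITION BY: one window over the whole table, so the numbers never repeat.
NUMBERED = """
SELECT ROW_NUMBER() OVER (ORDER BY order_id, creation_time) AS unique_id,
       order_id,
       creation_time
FROM orders_archive_demo
"""

all_ids = [row[0] for row in conn.execute(NUMBERED + " ORDER BY unique_id")]


def fetch_page(page_number, page_size):
    """Return (unique_id, order_id) pairs for one page of the numbered rows."""
    first_id = (page_number - 1) * page_size + 1
    last_id = page_number * page_size
    return conn.execute(
        f"SELECT unique_id, order_id FROM ({NUMBERED}) AS numbered "
        "WHERE unique_id BETWEEN ? AND ? ORDER BY unique_id",
        (first_id, last_id),
    ).fetchall()


print(all_ids)           # ten distinct IDs, 1 through 10
print(fetch_page(1, 4))  # first page: IDs 1 to 4
print(fetch_page(3, 4))  # last page holds the remaining two rows
```

One caveat on this design: an ID computed on the fly is only stable while the data and the sort order stay the same, so for repeated exports it is usually safer to store the generated ID in the table.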
236. 5 8 win rank identify duplicates: All right. I'm going to show
you the last use case for the function ROW_NUMBER that I usually use in
my real projects. Sometimes, if you're
doing data analysis, you're going to
find out that there are data quality issues, especially with duplicates. So what I usually do is use ROW_NUMBER in order to identify
the duplicates. Not only that, I can use it in order to delete the duplicates. So we can use it
to do data cleansing. And this is an essential task
for each data engineer, not only data analysts, in order to prepare and clean up the data before
doing data analysis. So let's have the
following task: identify duplicate rows in the table orders archive and return a clean result without
any duplicates. So not only do we have to
identify the duplicates, we have to return no duplicates in our results. Let's
see how we can do this. Let's first select the data: SELECT star FROM
sales orders archive. Let's go and execute. Now by looking at the
data, you can see that we have duplicates.
We have an issue. The order ID
four is there twice in our data. It
doesn't make sense, right? It should be only one. Which
one is the correct one? If you check the data over here, you can see that this order is shipped and then delivered. So it looks like the last
one is the correct one. So how can we find that out? If you
just scroll to the right, you can see that we
have a creation time, and we usually use such
a timestamp in order to identify what was the
last valid order. And then we can see
immediately that this creation time is higher
than the previous one, which means this is
the more up-to-date row, right? The more current one. So what we're going to do,
we're going to go and rank our data for each order
ID and sort the data by the creation time
in order to find the last inserted or created
row for this order. So let's see how we can do that. We're going to go
over here and say, let's have a ROW_NUMBER, and then OVER, and
we're going to partition
by the primary key: PARTITION BY order ID. And as we said, we
have to order the data by this timestamp, so ORDER BY
creation time, descending. We want the highest first,
then the lowest. That's it. Let's call it
RN and execute the query. Now by checking the data: if everything is clean and
we don't have duplicates, everything should be one, because for
each primary key we should have at most one row. But you can see
we have here a two, and here a three and a two, and that is an indicator that we have duplicates
inside our data. So now let's check them one by one. As you can see, the
first order ID appears only once,
so it has rank one, and the second one
has rank one as well. But
here we have the issue. As you can see, we now have two ranks for order ID four. Now, which one is the correct one? In our logic, we say it is the last row that was
inserted into our data, and this is rank number one. If you scroll to the right side, you can see that
the creation time here is higher than
the second one. With that, we have
identified what we want. We want the last inserted
row for each ID. And now let's check
this one over here. Here we have the ID three times. The first one has
the highest creation time. If you go to the right side and compare those
timestamps, you can see that this record, the first one, is the latest one that was
inserted into our data. So as you can see, this one
is the one that we need. The other two we don't need, because they hold old information. So now, everything
that doesn't have rank number one is not
valid. It's something old, and it's actually
bad data quality, so we want to remove it
or not select it. So now, in order to
have clean data,
what we're going to do is
wrap the query
in a subselect. So SELECT star FROM that query. And now we are interested only
in rank number one. We don't need anything else. So let's go and execute. Now if you check the
results, you can check the order ID over here. It is unique. We don't have
any duplicates, right? One, two, three, four,
five, six, seven. There are no duplicates at all, and we now have only the latest inserted data
for the orders, and we don't have any duplicates
or data quality issues. So now, of course,
we can go on with these results in order
to do further analysis, and this is exactly what
data engineers usually do: clean up the data
and prepare the data before doing any data analysis. Of course, if you
want to communicate those data quality issues
to the source of the data, let's say you are not the
owner of that information, you can generate a list of
all data quality issues, and you can send it
to the source system and tell them to clean
it up at the source. Now in order to
select the bad data, we can just change
the condition here and say, if RN is higher than one, then the row is bad data. Let's go and execute this. Now with this we
have in the results all records that shouldn't exist in the data
in the first place. So we can go and export it and communicate it to the
source and tell them: check here, you have something
wrong in your system, and this information should
not have been inserted into the data. So, everyone, it is very strong, right? It's
very powerful. I use it a lot in my projects. There are many use cases for the ROW_NUMBER function in SQL. We can use it to
do top-N analysis and bottom-N analysis, to find the best performers and the
worst performers. And as well, we can assign
unique IDs to do pagination, or we can use it in
order to discover data quality issues
and clean up our data. So it is an amazing function in SQL, and you're going
to use it a lot. So that's it for the
three functions, ROW_NUMBER, RANK, and DENSE_RANK. Now we're going to
talk about NTILE.
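The duplicate-handling steps from this section (rank per order ID by creation time, keep rank one for the clean result, list everything above one as bad data) can be sketched like this. All names and rows are hypothetical: the table orders_archive_demo, its columns, and the statuses are assumptions for illustration, and the queries run on SQLite (3.25 or newer) through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE orders_archive_demo (
    order_id INTEGER,
    order_status TEXT,
    creation_time TEXT
)""")
# Hypothetical archive: order 4 was inserted twice and order 6 three times.
conn.executemany("INSERT INTO orders_archive_demo VALUES (?, ?, ?)", [
    (1, "Delivered", "2025-01-01 10:00:00"),
    (2, "Shipped",   "2025-01-02 10:00:00"),
    (3, "Delivered", "2025-01-03 10:00:00"),
    (4, "Shipped",   "2025-01-04 10:00:00"),
    (4, "Delivered", "2025-01-05 10:00:00"),
    (5, "Shipped",   "2025-01-06 10:00:00"),
    (6, "Shipped",   "2025-01-07 10:00:00"),
    (6, "Shipped",   "2025-01-08 10:00:00"),
    (6, "Delivered", "2025-01-09 10:00:00"),
    (7, "Delivered", "2025-01-10 10:00:00"),
])

# rn = 1 marks the most recently created row of each order_id.
RANKED = """
SELECT order_id, order_status, creation_time,
       ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY creation_time DESC) AS rn
FROM orders_archive_demo
"""

# Clean result: only the latest row per order.
clean_rows = conn.execute(
    f"SELECT order_id, order_status FROM ({RANKED}) AS ranked "
    "WHERE rn = 1 ORDER BY order_id"
).fetchall()

# Bad data: every older copy, e.g. to report back to the source system.
outdated_rows = conn.execute(
    f"SELECT order_id, order_status FROM ({RANKED}) AS ranked "
    "WHERE rn > 1 ORDER BY order_id, creation_time"
).fetchall()

print(clean_rows)
print(outdated_rows)
```

The only difference between the clean result and the bad-data list is the condition on rn, which is why the same ranked subquery can serve both purposes.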
237. 5 9 ntile: Okay, so what is NTILE? NTILE in SQL is very simple. It's going to go and
divide your rows,
your data, into a
specific number of almost equal groups, or
sometimes we call them buckets. So now in order to
understand this and how
SQL works
with this function, we can have a very
simple example. So let's go. Okay, we have
the following setup. We have four rows, four sales, and we would like
to divide them into two groups, or into two buckets. In order to do that, we
can use the NTILE function. It has a different syntax than
the other ranking functions. It starts with NTILE, then we must define a number, so we cannot leave it empty
like the other ranking functions. Here we have two
buckets, then OVER. And here again, we
have to sort the data,
so an ORDER BY is a must: ORDER BY sales descending, from the
highest to the lowest. So now, as usual, SQL is going
to go and sort the data. We have it already
sorted in this example. Then it can start assigning each of those rows
to the two buckets. But SQL first has to
calculate the bucket size, so how many rows we can
put inside each bucket. So the calculation
is very simple. It says the bucket
size equals the
number of rows divided
by the number of buckets. So what is the number of rows
here? We have four rows, so we have four over here. Then the number of buckets
we define in the
syntax of the query. Here we defined two
buckets. We need two groups. So that means we
are dividing four by two, and the size of the
bucket is going to be two. Now with this, SQL is ready to start assigning
each row to a bucket. It's going to start at the top. The first one is going to be
in bucket number one. Then it goes to the next one. It's going to say,
okay, we still have enough space in the bucket, so it's going to assign
this one to bucket one as well. But with this, we reach the maximum number of
rows within each bucket. The next row is going to be
assigned to another bucket. It's going to be two, and the last one is going
to be two as well. As you can see,
it's very simple. We have just assigned our
sales, based on the sorting, of course, into two buckets. These two sales belong
to bucket number one, and the other two belong
to bucket number two. Very easy. It's
very straightforward because we
are dividing an even number, and we got perfectly
sized buckets. But now, what happens
if we have an odd number?
So we have here five
instead of four. The bucket size is going to be five divided by two,
so we're going to get 2.5. And now, of course,
SQL will not go and put
two and a half rows
in each bucket when we are splitting
this into two buckets. Of course, this will
not work. We should now have a bucket with three and another
bucket with two. So now the rule in SQL
makes it very clear. It says larger groups
come first, then smaller. So that means, if we have
an odd number like this, the larger group is going
to be the first group. So that's going to
look like this. It's going to reset everything. Let's see what's
going to happen. The first one is going to be one, the second one is one as well, and the third one is going
to be one as well. So it is a larger bucket
than the second one. Then the rest are going to be two. As you can see, the
larger group comes first, then the smaller, and this is how SQL is going to work
if you have odd numbers. You don't have
perfectly sized buckets here. You have approximately, or
roughly, equally sized buckets. This is how NTILE works. Now, let's go back to SQL in order to practice this function. So now let's have some fun
working with this function. We're just going to select
something like the order ID and the
sales from sales orders. Let's go and execute it, and with that we
got our ten rows. Now, let's say that
I would like to create only one
bucket from the data. So NTILE with only
one bucket, then OVER. Let's
not use PARTITION BY. Let's take ORDER BY
sales descending, and that's it. I'm going to call it one bucket. Let's go and execute
it. As usual, it's going to go
and sort the data and then calculate the bucket size. It's going to be ten
rows divided by one, so the size of the
bucket is going to be ten. That's why
you're going to see one everywhere here, because all those rows fit into one bucket.
This is very simple. We have only one bucket. Let's go and now
have two buckets. I'm just going to
copy and paste. Instead of one, we're
going to have two, and let's call it two buckets. Let's go and execute this. Now here, again, what is
the size of the buckets? It is 10 divided by 2. We will get perfectly
grouped buckets. The first bucket is
going to be five rows, and the second one is going
to be the next five rows. So it is perfect. Let's go to the next one. Let's have three buckets. So let's go and execute. So now what happens is that SQL is
going to go and divide ten by three in order to
get the size of the bucket, and it's going to be about 3.3, so it's a decimal, and we will not get perfectly sized buckets. So again, the larger group comes first, and
then the smaller. As you can see, we have to fit four rows in the first group in order to get the
others with three. That's why the first bucket is going to
be the biggest one. So four rows go into
the first bucket, then the next three rows
are going to be in bucket two, and the last
three are going to be bucket three. As you can see, the larger group is going to be the
first bucket. So now let's keep
playing with the data. Let's go and take four now. We would like to
have four buckets. Now things are going to
get interesting.
SQL is going to
divide ten by four, and we'll get
2.5. So again, we will not get
perfectly sized groups. SQL has to fit ten
rows into four groups. So the first three rows are going to fit into
bucket number one, and as well, the next
three rows like this are
going to be in
bucket number two. And then you can
see over here we have two buckets with
a size of two. And with that, we can fit
ten rows into four groups. And again, you can
see the larger groups
come first, like this one and the second one, and
the smaller ones come later. Okay, so this is how
NTILE works in SQL. And now you might
say, you know what? Why do I need buckets
in the first place? So what is the use case?
238. 5 10 ntile use case data segmentation: There are two use cases for the NTILE function in my projects. On one hand, if
I'm a data analyst, I'm going to use
the NTILE function in order to segment my data. On the other hand, if
I'm a data engineer, I'm going to use the NTILE
function in order to do ETL processing and as well
to do load balancing. So now let's start
with the first use case, as a data analyst, where you want to do segmentation
with the NTILE function. Segmentation is a very nice way
to
understand your data. You can go and segment your data into different
buckets or groups, like, for example, doing
segmentation for the customers. So you can go and group your customers depending
on their behavior, like the total sales or the total number of orders. With that, you can
make, for example, a high segment, and then
medium, and then low. So now in order to understand
the segmentation use case, let's have the following task. Okay. The task says: segment all orders into
three categories, high, medium, and low sales. In order to solve this,
let's do the basic stuff, right? SELECT the order ID, and let's take the sales from
our table sales orders, and let's go and execute it. As usual, we got our ten sales. So now, if you check the task, it says we need
three categories. That means we
need three buckets, and it says high, medium, and low sales. So that means we are
dividing by the sales. Let's go and do it step by step. We're going to use NTILE since we need to
segment the data. Three categories
means three buckets, and then let's
define the window.
OVER, and we don't have to divide
the data with PARTITION BY. We just need to sort
it by the sales. So it's going to be ORDER BY sales, and let's take descending, since we want to sort it from
the highest to the lowest. So that's it, and let's
call it buckets. Let's go and execute this. Now if you check the data,
you can see that the rows are segmented into three buckets. The first bucket is
going to contain all orders with high sales. Then the second one is going to be all orders with medium sales. And then the last
one is going to be all orders with low sales. So as you can see, we have already categorized our data
into three groups. But now, as you can see,
we have numbers, and maybe the user
is expecting to have those texts,
high, medium, low. So that means what
we're going to do now is
translate those numbers into
text, into words. And of course, we cannot do that inside the window function. We're going to use
data transformation using the CASE WHEN statement. Don't worry about
it, we will have a complete dedicated section
explaining CASE WHEN. So for now just follow me in
order to see how this works.
We're going to
use a subquery. So it's going to be SELECT, and let's take the
star for everything, and then let's have
the following logic: CASE WHEN buckets equals one, then it is high.
The sales are high. We are just mapping the
numbers to text. Otherwise, WHEN the
bucket is equal to two, then we are targeting
the medium. And then the last
group: WHEN buckets equals three, then those sales are low. So let's close it with END, and let's call it
sales segmentations. So that's it. Let
me just make it a little bit smaller in
order for you to see it. Then FROM, and we have our
subquery like this. So as you can see, we just
mapped the numbers to text. We are just doing translations. So let's go and execute it. And now by checking the results, we got our three
categories for the users. The first category is
going to be the high sales, the second one is going to
be the medium sales, and the third one is going
to be the low sales. So, guys, you see, NTILE is very powerful in order to
segment our data. Now you can go and
segment stuff like the customers by
their total sales, or the products by their prices, employees by their
salaries, and so on.
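The segmentation task can be sketched as NTILE(3) in a subquery plus a CASE WHEN mapping in the outer query. As before, this is an illustration under assumptions: the table orders_demo, its ten distinct sales values, and the aliases buckets and sales_segmentation are hypothetical, and SQLite (3.25 or newer) via Python's sqlite3 module stands in for the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_demo (order_id INTEGER PRIMARY KEY, sales INTEGER)")
# Ten hypothetical orders with distinct sales values.
sales_values = [90, 60, 55, 50, 35, 30, 25, 20, 15, 10]
conn.executemany(
    "INSERT INTO orders_demo (order_id, sales) VALUES (?, ?)",
    list(enumerate(sales_values, start=1)),
)

# The inner query assigns bucket numbers; the outer query translates them to text.
segmented = conn.execute("""
SELECT order_id,
       sales,
       CASE WHEN buckets = 1 THEN 'High'
            WHEN buckets = 2 THEN 'Medium'
            WHEN buckets = 3 THEN 'Low'
       END AS sales_segmentation
FROM (
    SELECT order_id,
           sales,
           NTILE(3) OVER (ORDER BY sales DESC) AS buckets
    FROM orders_demo
) AS bucketed
ORDER BY sales DESC
""").fetchall()

labels = [row[2] for row in segmented]
for row in segmented:
    print(row)
```

Because ten rows do not divide evenly by three, the High segment holds four orders and the other two hold three each. Note that NTILE splits by position, not by value, so equal sales values could land in different segments.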
239. 5 11 ntile use case data load: All right, so this is
the first use case for the NTILE function, as a data analyst, where
you go and segment your data in order to
understand the behavior. Now, on the other hand, if you are a data engineer, you can use the NTILE function in order to do load
balancing in your ETL. So now I'm just going to explain it in a very simple sketch. All right, we have the
following scenario, where we have two databases, and we would like to move one big table from
database A to database B. So in this case,
I'm doing something called a full load, which means I'm loading all the rows from
one database to another. If you do it in one go,
what could happen? It could take a long
time, so it could take hours or even
sometimes days. And maybe at the
end you will get
some network
errors because you have stressed the network between those two databases, and
everything is going to break, and you're going
to lose the data, and you have to start again. So now instead of loading
this table in one go, what we can do is
split it
into fractions, or
let's say buckets. We can split this
table, for example,
into four small tables
using the function NTILE. Now after we split this big
table into small tables, we can go and start moving those small tables
one after another, and with that, we are not stressing the network,
and it's going to succeed. After loading everything, at the end, in the target database, we're going to have those
small tables, and of course, we can go and use UNION in order to merge them and
have again
the big table that we had
in the original database. This is a very
common use case for NTILE, in order to split the load and to balance the processing of
extracting data. All right. So now we have
the following SQL task. It says: in order
to export the data, divide the orders
into two groups. So let's go and do that. First, we can select
everything from the table
sales orders in order to see the data. Let's
go and execute it. So now we got our ten orders, and what we have
to do is
split them into two groups. In order to do that, we
can use the NTILE function. Two groups means two buckets. Let's define the window. Here we don't have to partition the data using PARTITION BY, but we have to
specify the ORDER BY. Now which column are we going to use in order
to sort the data? Of course, here
there is no rule. You can go and split
the data by sales, or by the order status, by
date, by anything you want. But we usually go and
use the primary key. It's just more systematic
and cleaner, especially if you
have a sequence of numbers in the order ID, so you can export the
first range of the orders, then you can go to the
next group, and so on. So let's go with the order ID, and let's give it
the name buckets. So that's it. Let's
go and hit execute. Now, as you can see,
it's very simple. We got our two groups, so this is the first
patch of the data, and this is the
second batch of data. So now we can go and select the first patch and export it, imported in the next system. And then after that, we
go with the second batch. And of course, if
you still suffer from the size of those packets, you can go and split it
to more smaller size, so you can go over
here and make it four. So with that, we're going
to get smaller packets, and it might be easier
to export the data. So this is really
great use case for the entire function.
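As a rough sketch of the export task above, the query could look like the following. The ten orders are made-up sample rows in a CTE standing in for the real orders table, and the names order_id, sales, and packets are assumptions for illustration only.

```sql
-- Hypothetical sample data: ten orders standing in for the real orders table.
WITH orders AS (
    SELECT 1 AS order_id, 10 AS sales UNION ALL
    SELECT 2, 15 UNION ALL
    SELECT 3, 20 UNION ALL
    SELECT 4, 60 UNION ALL
    SELECT 5, 25 UNION ALL
    SELECT 6, 50 UNION ALL
    SELECT 7, 30 UNION ALL
    SELECT 8, 90 UNION ALL
    SELECT 9, 20 UNION ALL
    SELECT 10, 60
)
SELECT order_id, sales, packets
FROM (
    SELECT
        order_id,
        sales,
        -- No PARTITION BY needed; ORDER BY on the key decides which rows
        -- land in which packet. Ten rows and two packets give 5 rows each.
        NTILE(2) OVER (ORDER BY order_id) AS packets
    FROM orders
) AS split_orders
WHERE packets = 1        -- first batch: order_id 1 to 5
ORDER BY order_id;
```

Changing the filter to packets = 2 selects the second batch, and changing NTILE(2) to NTILE(4) produces smaller packets.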
Alright, everyone. So with this, you have learned
the two use cases for the NTILE function that I
usually follow in my projects. As a data analyst, you can use it in order
to do segmentation. And as a data engineer, you can use it in order to do
load balancing of the ETL.
240. 5 12 win rank cume dist: Okay, everyone. So with that,
we have covered everything about the integer-based ranking functions. Now we're going to talk
about the second method: the percentage-based ranking functions. Here we have two functions,
CUME_DIST and PERCENT_RANK. So now let's have a quick recap. With percentage-based ranking,
SQL is going to calculate a relative position as a percentage and assign it to each row. The
output is going to be a continuous normalized scale from 0 to 1, and this is really useful
for distribution analysis. Those functions consider in their calculation the overall total,
the whole size of the dataset, which can help us find out the contribution of each value to
the overall total. Now, in SQL, in order to generate the percentage, we have two different
formulas. On one hand, we have the function CUME_DIST, and on the other hand, we have
PERCENT_RANK. That means we have two different functions with different formulas in order to
generate and calculate the percentage. Now let's start with the
first function, CUME_DIST. All right, everyone. So now, let's start with
the first function. We have CUME_DIST, and it stands for cumulative distribution. It
calculates the distribution of your data points within a window. In order to understand what
this means, we're going to have a very simple example to see how SQL works with this function.
So let's go. All right. Again, we have our very simple example of the sales, and we have the
following query: CUME_DIST, and we don't give any argument inside it, so the parentheses stay
empty. The window is going to be, as usual, ORDER BY sales descending, from the highest to the
lowest, and the ORDER BY is a must. The first step is that SQL is going to sort the data; we
have it already sorted from the highest to the lowest. The next step is that SQL is going to
start calculating the percentage for each row. And we have a very simple formula. It says
CUME_DIST equals the position number of the value divided by the number of rows. It's very
simple. Let's
do it step by step. So SQL is going to start with the first value in our list. What is the
position number of the first value? It's going to be one, right? This is the first value in
our list. And what is the total number of rows? We have five rows, right? One, two, three,
four, five. So we're going to divide one by five, and the result is going to be 0.2. This is
the value for the first row. Okay, so now SQL is going to go to the next row, and this time
we get a special case. As you can see, we have the 80 twice, so we have a tie here. First, we
need the position number. As you can see, we are at position number two, right? But since we
have the 80 multiple times, SQL is going to take the last position where we see the value 80,
and that is record number three. That's why SQL is going to say that, for this record, the
position number is three and not two. Then it divides by five, and we get the value 0.6. This
is the most confusing thing about this function. If SQL finds a tie, it completely ignores the
current position number, so we don't use two. It takes the last position number for the same
value, and the last one in our list is record number three. That's why we have three over
here. Okay, now let's keep moving. Let's go to the third row, and as you can see, we are again
in the tie. This is the last time we see 80; after it, we don't have 80 anymore. So we're
going to have the exact same result: 3/5, or 0.6. So, as you can see, if we have a tie, the
rows share the same percentage. That means with CUME_DIST, if you have the same values, they
share the same rank. Let's keep moving to the fourth one. What is the position number of the
50? We are at record four. So position number four divided by five gives 0.8. Okay. Now let's
move to the last one, and it is the easiest one. Which position do we have over here? It is
position number five, the last one, and the number of rows is five. That's why we get one.
So, guys, that's it. This is how the cumulative distribution works. Once you understand the
formula, it's very easy to understand the output. As you can see, calculating the percentage
always depends on the total size of our dataset: you can see here the number of rows. With
that, we get an output that helps us understand the distribution of our data
points within the dataset.
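The walkthrough above can be reproduced with a small query. This is only a sketch: the five sales values live in a CTE, the top value 100 is an assumption (the walkthrough only names 80, 80, 50, and 30), and the names sales_data and dist_rank are made up for illustration.

```sql
-- Sample rows; the highest value (100) is assumed, the rest follow the walkthrough.
WITH sales_data AS (
    SELECT 100 AS sales UNION ALL
    SELECT 80 UNION ALL
    SELECT 80 UNION ALL
    SELECT 50 UNION ALL
    SELECT 30
)
SELECT
    sales,
    -- position of the last row with the same value / total number of rows
    CUME_DIST() OVER (ORDER BY sales DESC) AS dist_rank
FROM sales_data
ORDER BY sales DESC;
-- 100 -> 1/5 = 0.2
--  80 -> 3/5 = 0.6   (tie: the last position of 80 is used)
--  80 -> 3/5 = 0.6
--  50 -> 4/5 = 0.8
--  30 -> 5/5 = 1.0
```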
241. 5 13 win rank percent rank: All right, everyone. So now we're going
to focus on the second function that generates a percentage as a rank: PERCENT_RANK.
PERCENT_RANK focuses on generating the relative position of each row within a window. In
order to understand what this means, we can have a very simple example to see how SQL works
with this function. So let's go. Okay, again, we have those sales, a very simple example, and
the syntax looks like this: PERCENT_RANK, and inside it we don't use any arguments. The window
is going to be ORDER BY, which is a must, sales descending, from the highest to the lowest.
The first step SQL is going to do is sort the data from the highest to the lowest, and we
have it already like this. Next, SQL is going to start calculating the percentage, which is
very similar to the cumulative distribution. But this time the formula is like this: the
position number minus one, divided by the number of rows minus one. So it's almost the same
formula, but we are subtracting one from both numbers. Okay, so now, let's go through all rows
step by step
and see the output. SQL is going to start with the first row, right? What is the position
number of the first row? It's going to be one. Then we have to subtract one from it. That's
why we get zero. Now, what is the total number of rows? We have five rows here, and we
subtract one. That's why we get four. Now, zero divided by any value is zero. That's why for
the first value we get zero. Alright, now let's move to the second row over here, and here we
have our special case where we have a tie. We have two rows sharing the same value, 80. For
PERCENT_RANK, SQL has a different behavior than for CUME_DIST. Remember, for CUME_DIST, SQL
searched for the last position of the shared value. It was position number three, since this
is the last time we see 80. But now, with PERCENT_RANK, SQL sticks with the first occurrence
of the shared value. So, checking those two 80s, what is the first occurrence? It is record
number two. That's why we have position number two; subtracting one, we get one. The same goes
for the total number of rows: we have five, and subtracting one, we have four. Now if you
divide one by four, you get 0.25. This is the percentage of this value. Now let's go to the
third row. Here we have the tie again, so SQL sticks with position number two, the first
occurrence. It's going to be the same: two minus one gives one, and the total number of rows,
five, minus one gives four. That's why we get the exact same result. So as you can see, with
PERCENT_RANK, as with CUME_DIST, rows with a shared value share the same percent rank as well.
Now, let's move to the fourth one; we have the value 50. What is the position of this? It's
record number four. Subtracting one, we get three, and if you divide three by four, you get
0.75. And now, moving to the last value over here, it's going to be easy. What is the position
number of the 30? It is five; five minus one is four. And we have four as well for the total
number of rows minus one. So if you divide four by four, you get one. So that's it, guys.
This is how
PERCENT_RANK works. It always has the scale 0-1.
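The same steps as a runnable sketch. As before, the top value 100 is an assumption (only 80, 80, 50, and 30 are named), and sales_data and pct_rank are illustrative names.

```sql
-- Sample rows; the highest value (100) is assumed, the rest follow the walkthrough.
WITH sales_data AS (
    SELECT 100 AS sales UNION ALL
    SELECT 80 UNION ALL
    SELECT 80 UNION ALL
    SELECT 50 UNION ALL
    SELECT 30
)
SELECT
    sales,
    -- (position of the first row with the same value - 1) / (number of rows - 1)
    PERCENT_RANK() OVER (ORDER BY sales DESC) AS pct_rank
FROM sales_data
ORDER BY sales DESC;
-- 100 -> (1 - 1) / 4 = 0
--  80 -> (2 - 1) / 4 = 0.25   (tie: the first position of 80 is used)
--  80 -> (2 - 1) / 4 = 0.25
--  50 -> (4 - 1) / 4 = 0.75
--  30 -> (5 - 1) / 4 = 1
```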
So it's always like this. It doesn't matter which values we have inside; it's going to have
a continuous scale. And again, if you have a tie, the rows share the same percent rank. Okay,
guys. So now if you compare those two functions, you can see that they are really similar to
each other. As output, both functions generate a percentage-based ranking, and both of them
handle ties cleanly, so tied rows share the same percentage rank. If you check the syntax,
they are very similar. And by checking the formulas of both of them, we are always
considering the overall size of the dataset. The size is considered in the calculation to
help us find the relative position of each value to the overall. This is very important in
analysis in order to measure the contribution of each value to the overall. Now, about the
use cases: if you want to focus on the distribution of your data points, go with the
cumulative distribution. But if you want to focus on the relative position of each row, then
go with PERCENT_RANK. Alright, now, there is
one more difference between CUME_DIST and PERCENT_RANK. If you check the formulas, you can
see that CUME_DIST is more inclusive: we always count the position of the current row. But
with PERCENT_RANK, we don't count the current row; we skip it, or make it exclusive. So we say
PERCENT_RANK is more exclusive, and the cumulative distribution is more inclusive. Now if you
ask me the hard question, which one do you use? I'm going to say: if you want to be more
inclusive, go with the cumulative distribution. If you want to be more exclusive with the
current row, go with PERCENT_RANK. They are very similar to each other. If you want to
calculate the distribution of your data, go with the cumulative distribution. If you want to
find the relative position of each row, then go with PERCENT_RANK. All right. So now we have
the
following task that says: find the products that fall within the highest 40% of the prices.
Let's go and solve this. We are targeting the table products, and I will just select two
columns: product and price from Sales.Products. So that's it. Let's go and execute this. As
you can see, we got five products and their prices, and the task says to find the highest
40%. So we have to generate a percentage rank. In order to do that, we have the two
functions, CUME_DIST and PERCENT_RANK. I will go this time with CUME_DIST. Let's go and do
that. So CUME_DIST, and then let's define the window like this. It's going to be ORDER BY. We
are targeting the prices, right? So ORDER BY the price from the highest to the lowest. And
let's give it the name DistRank. Let's go and execute this. With that, SQL generates for us a
percentage ranking using the formula that we just learned. In the output, we are getting all
the products, but the task says we have to get only the products that are in the highest 40%.
That means the first row and the second row, and that's it. Those rows are in the highest
40%. The rest are below that. So in order to do that,
so to filter the data, we can use a subquery. So SELECT star FROM, then we have our subquery
like this, and then our filter is going to be DistRank smaller than or equal to 0.4. This is
our threshold in order to get the data. Let's go and execute this. Now, as you can see, we got
the top products, the top 40%. Of course, you can also format the percentage. We can do that
like this: let's take the DistRank and multiply it by 100. Let's go and execute this. As you
can see, we got 20 and 40. We can add the percentage character to it as well, right? So we can
say CONCAT and add the character after the value, like this. Let's call it DistRank
percentage. So that's it. Let's go and execute it. With that, you have solved the task: we
have the products that fall within the highest 40%. Now, of course, you can try PERCENT_RANK
too. It's very simple: we just have to switch the cumulative distribution with the function
PERCENT_RANK. Let's go and execute it. As you can see, we get the exact same result: we're
still getting the gloves and caps as the products within the highest 40% of the prices. So,
guys, that's it,
it's very simple, right?
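Put together, the solution to this task might look like the sketch below. The five products are sample rows in a CTE; only Gloves and Caps are named in the walkthrough, so the other product names and all prices are assumptions, as are the column names. CONCAT is assumed to be available, as it is in MySQL and SQL Server.

```sql
-- Hypothetical product list; only Gloves and Caps come from the walkthrough.
WITH products AS (
    SELECT 'Bottle' AS product, 10 AS price UNION ALL
    SELECT 'Tire', 15 UNION ALL
    SELECT 'Socks', 20 UNION ALL
    SELECT 'Caps', 25 UNION ALL
    SELECT 'Gloves', 30
)
SELECT
    product,
    price,
    dist_rank,
    CONCAT(dist_rank * 100, '%') AS dist_rank_perc   -- formatted for display
FROM (
    -- A window function cannot be used directly in WHERE,
    -- so the rank is computed in a subquery and filtered outside.
    SELECT
        product,
        price,
        CUME_DIST() OVER (ORDER BY price DESC) AS dist_rank
    FROM products
) AS ranked
WHERE dist_rank <= 0.4
ORDER BY price DESC;
-- Returns Gloves (dist_rank 0.2) and Caps (dist_rank 0.4).
```

Swapping CUME_DIST() for PERCENT_RANK() gives the values 0 and 0.25 for the same two rows, so the filter returns the same products here.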
242. 5 14 win rank summary: All right, friends. So now
let's have a quick recap of the window ranking functions. What do they do? They assign a rank
to each row within a window. And we have two types of ranking, right? The first one is the
integer-based ranking. It assigns a number, an integer, to each row. Here we have four
functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. The second type of ranking is the
percentage-based ranking. Here SQL calculates a percentage-based rank and then assigns it to
each row. And here we have two formulas, or functions: CUME_DIST, the cumulative
distribution, and PERCENT_RANK. Now, to the next point, the rules of the syntax. The
expression should be empty: we do not pass any argument to the functions, with the exception
of NTILE, which takes the number of buckets. We must use ORDER BY in order to sort our data,
so it is required, and frame clauses are not allowed, so you cannot customize a frame within
the window function. And as we learned, there are many use cases for the
ranking functions. For example, we have the top-N analysis and the bottom-N analysis in order
to identify our top performers or the worst performers in our business. Another use case:
using ROW_NUMBER, we can identify and remove duplicates in our data, so we can use it in
order to find data quality issues and to improve the quality as well. Another use case: if
our table doesn't have a clean primary key, we can generate unique IDs using ROW_NUMBER as
well. One more use case was data segmentation. You can use NTILE in order to segment your
customers, your products, employees, and so on. Another use case is data distribution
analysis. As we learned, we can use CUME_DIST in order to understand the distribution of our
data points compared to the overall. And the last use case is more for data engineering. We
can use the NTILE function in order to equalize the loading process of our ETLs. So as you
can see, there are many use cases for the ranking functions. Alright, so with that, you have
learned how to rank your data using six different SQL window functions and their use cases;
they are amazing for data analytics. Now, moving on to the next one, we have the last group
of window functions: the value functions. They are, my friends, the most important group for
data analytics compared to the other two. Here we're going to focus on four functions. We're
going to learn how SQL works with them, the syntax, and as
well, the use cases.
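To see the six ranking functions from this recap side by side, here is a small sketch on five sample sales values. The values, the CTE name sales_data, and the column aliases are assumptions for illustration; the aliases deliberately avoid the function names themselves, since some databases reserve them as keywords.

```sql
-- Sample data with one tie (80 appears twice).
WITH sales_data AS (
    SELECT 100 AS sales UNION ALL
    SELECT 80 UNION ALL
    SELECT 80 UNION ALL
    SELECT 50 UNION ALL
    SELECT 30
)
SELECT
    sales,
    ROW_NUMBER()   OVER (ORDER BY sales DESC) AS row_num,     -- 1, 2, 3, 4, 5
    RANK()         OVER (ORDER BY sales DESC) AS rnk,         -- 1, 2, 2, 4, 5
    DENSE_RANK()   OVER (ORDER BY sales DESC) AS dense_rnk,   -- 1, 2, 2, 3, 4
    NTILE(2)       OVER (ORDER BY sales DESC) AS bucket,      -- 1, 1, 1, 2, 2
    CUME_DIST()    OVER (ORDER BY sales DESC) AS dist_rank,   -- 0.2, 0.6, 0.6, 0.8, 1
    PERCENT_RANK() OVER (ORDER BY sales DESC) AS pct_rank     -- 0, 0.25, 0.25, 0.75, 1
FROM sales_data
ORDER BY sales DESC;
```

Every function uses the same ORDER BY window and no frame clause; only NTILE takes an argument.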
243. 6 1 win value what is: Hey, friends. So now
we're going to talk about the most important category of window functions for data analytics:
the value functions, or as we sometimes call them, window analytical functions. Here we're
going to cover four different functions: LEAD, LAG, FIRST_VALUE, and LAST_VALUE. As usual,
we're going to learn the concept behind them and how SQL executes them behind the scenes,
then we'll learn the syntax, and we're going to cover the most important use cases for the
value functions that I collected from my projects. So now let's start with the first
question: why do we call them value functions? So let's go. All right, everyone. So now we
have this very simple example. We have the months and the sales. We can use the value
functions in order to access a value from another row. In order to understand it, let's say
that SQL is now processing the months, and we are currently at the month of March. Now, for
example, I would like to access the value from the previous month, from February. In order to
do that, we can use the LAG function to get the value of ten. With that, we have in the same
row the current sales of the month March and also the sales from the previous month,
February. Maybe in other cases,
I would like to get the sales of the next month, from April. In order to do that, we can use
the function LEAD, and we get in the same row the value five. So now I can very quickly
compare the current month with the previous month and with the next month as well. In other
cases, you might be interested in the first month of your list, which is January here. In
order to get the sales of the first month, you can use the function FIRST_VALUE. So we get 20
in the same row. And now for the last option, I think you already get it: we can get the
sales of the last month. Here that is July. For that, we use the function LAST_VALUE, and we
get the value of 40. So this is exactly the purpose of the value functions, or analytical
functions: we can access a value from another row. And it's very important to note as well
that the value functions are like the ranking functions: we have to use ORDER BY in order to
sort the data, so that SQL understands what the first row and the last row are. In this
example, the data is sorted by the month. So, guys, the value functions are really important
for analytics. You can use them to access a value from other rows in order to do comparisons.
Alright, now let's have
a quick overview of the syntax and the rules for the value functions. Here we have four
functions: LEAD, LAG, FIRST_VALUE, and LAST_VALUE. As you can see, we can put them into two
groups. We have LEAD and LAG; they are very similar to each other. Especially with the
syntax: we can use three arguments inside them, expression, offset, and default, for both of
them. For FIRST_VALUE and LAST_VALUE, we can use only an expression. That means we have to
pass a value to all of those functions. You cannot leave the parentheses empty. Now, about the
expression data type: you can use any field with any data type. There is no restriction to,
for example, only numbers. Any data type is allowed. Now, about the definition of the window.
PARTITION BY, as usual, is optional, like in any other group. ORDER BY here is a must. You
must define an ORDER BY. It's like the ranking functions: you cannot leave it empty. Now we
come to the last one, the frame clause. Things are really different here. For the first two
functions, LEAD and LAG, you are not allowed to define any frame, so you are not allowed to
define any subset of data. It's very similar to the ranking functions: you must use ORDER BY,
but you cannot define the frame of the window. But for the other two functions, FIRST_VALUE
and LAST_VALUE, the frame is optional. You can use it, and for LAST_VALUE it is recommended
to define a frame clause. Don't worry about it. We're going to have enough examples in order
to understand it. You can see those functions have different requirements, so there's no
generic rule for all of them. But one thing that they all agree on is that you must use ORDER
BY. Now, as usual, what are we going to do? We're going to deep dive into those functions.
We're going to address first the two functions LEAD and LAG, because they are very similar to
each other. We'll understand the use cases and when to use them, and of course, we're going
to practice
in SQL. Let's go.
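The overview above can be tried out with a small sketch. The monthly figures are sample rows in a CTE: January, February, April, and July follow the values mentioned in the example, while March, May, and June are assumptions. All table and column names are made up for illustration.

```sql
-- Sample monthly sales; values marked "assumed" are not from the walkthrough.
WITH monthly_sales AS (
    SELECT 1 AS month_no, 'Jan' AS month_name, 20 AS sales UNION ALL
    SELECT 2, 'Feb', 10 UNION ALL
    SELECT 3, 'Mar', 30 UNION ALL   -- assumed
    SELECT 4, 'Apr', 5  UNION ALL
    SELECT 5, 'May', 15 UNION ALL   -- assumed
    SELECT 6, 'Jun', 25 UNION ALL   -- assumed
    SELECT 7, 'Jul', 40
)
SELECT
    month_name,
    sales,
    LAG(sales)         OVER (ORDER BY month_no) AS prev_month_sales,
    LEAD(sales)        OVER (ORDER BY month_no) AS next_month_sales,
    FIRST_VALUE(sales) OVER (ORDER BY month_no) AS first_month_sales,
    -- With ORDER BY and no frame clause, the default frame ends at the
    -- current row, so LAST_VALUE would just return the current row's sales.
    -- An explicit frame over the whole window gives the real last value.
    LAST_VALUE(sales)  OVER (
        ORDER BY month_no
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_month_sales
FROM monthly_sales
ORDER BY month_no;
-- Row for Mar: prev_month_sales = 10, next_month_sales = 5,
--              first_month_sales = 20, last_month_sales = 40
```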
244. 6 2 win value min max: Lead functions. The LEAD function allows
you to access a value from the next row within a window, whereas the LAG function is exactly
the opposite: it allows you to access a value from a previous row within a window. It sounds
very easy, right? So let's understand how SQL executes those functions. Okay. So now let's
have a quick overview of the syntax for both functions, LEAD and LAG. We have here a very
simple example for the LEAD function. As usual, we start with the function name, LEAD. After
that, we pass the arguments, and as you can see, we have multiple things here. So let's do it
step by step. The first thing we specify is an expression, and the data type could be any
data type. It could be a number, like the sales here; it could be a character, like names, or
dates, or anything. This is required: we have to specify an expression, we cannot leave it
empty, and we can use any data type. Now, moving on to the next one, we have a number here. So
what is it? This is the offset, and the offset is optional, so you can skip it. So what does
offset mean? We are specifying for SQL the number of rows forward or backward from the
current row. In this example, we are specifying the offset as two using LEAD, and with that we
are telling SQL: jump two rows ahead and get me the value. And if you're using LAG, you're
telling SQL: go back two rows and get me the value. So here you are telling SQL how many rows
it needs to jump. If you don't specify anything and leave it empty, SQL is going to use one.
So the default of the offset is one if you don't specify it. All right. Moving on to the last
one
and to the third one; this is optional as well. You can leave it empty. This one is the
default value. What happens with those functions is that sometimes SQL jumps two rows ahead
or something like that and doesn't find anything. There are no more rows available to access,
and with that, SQL returns a NULL. That means if SQL goes to the next rows or to the previous
rows and doesn't find anything, SQL, as a default, returns a NULL. So if you don't specify
anything over here, in those scenarios you get a NULL as the return value of the whole
function. But in some scenarios, you don't want to have a NULL. You would like to have a
value. So here you are defining the default value. It should not be a NULL; it should be a
ten. So, SQL, if you don't find anything, return a ten. Don't return a NULL. So again, guys,
the default value and the offset are optional for you to configure, but you should know the
defaults if you don't specify anything: for the offset it's going to be one, and for the
default value it's going to be NULL. But you must specify an expression; you cannot leave it
empty. All right. So that's all about the arguments that you can pass to the LEAD or LAG
functions. The next parts are the standard ones. We have the OVER clause. Then we have
PARTITION BY. As usual, PARTITION BY is optional. And then the ORDER BY. Those functions are
like the ranking functions: they require you to sort the data. It is a must to sort the data.
Otherwise, SQL will not know what the next row is or what the previous rows are. So we have
to sort the data. It is required. You cannot skip this, so it is not optional. Alright, so the
syntax is not crazy, right? We have the usual parts, and in addition we can configure the
default value and the offset. Okay, guys, now we have
a very simple example. We have months and sales, and we're going to understand how SQL works
for both functions, LEAD and LAG, side by side. In the first example, we are interested in the
sales of the next month. In order to do that, we use the LEAD function. So LEAD, and then we
specify the argument. It is the sales; we want the value of sales. Then we define the window
like this: ORDER BY month, so it's going to be ascending. Now, on the right side, we are
interested in the sales of the previous month. In order to do that, we use the LAG function.
It's going to be very similar to LEAD: we have LAG and then the sales, since we are interested
in the sales, and we sort the data by the month. So now let's see how SQL processes this side
by side and row by row. It's going to start with the first row over here. What is the next
month after January? It is February, and we are interested in the sales of this row. So SQL
takes the value from the next row, and we get the value of ten. Now, by looking at January, we
can see the sales of the next month, February, in the same row. Now let's check the right side
over here. We are interested in the previous month. What is the previous month of the first
row? There is nothing, right? So we cannot point to anything. That's why SQL says this is
NULL: there is no previous month for the current row, and we're going to have it as NULL.
Okay, so now SQL goes to the next row. We are at February. What is the next month? It's March,
and SQL points to it. So we get 30 as the sales of the next month, March. And on the right
side, what is the previous month of February? It's January, right? So SQL gets the sales of
the previous month, and here we get 20. As you can see, it is very simple. With LEAD, we are
always checking the next value. With LAG, we are always checking the previous value. Let's
keep going. We
are currently at March. What is the next month? It's April. SQL points to it like this, and we
get the sales of the next month, April, which is five. For March on the right side, what is
the previous month? It is February, right? So SQL points to February, and we get the sales of
ten. Now, on to the last row over here; you can see that we are at April. What is the next
month after April? There is nothing, because we are at the end of our table, right? Since
there's no month after that, we get a NULL in the output. But for LAG, we still have a
previous month for April. What is the previous month? It is March, and we get the sales of
March, so it's going to be 30. So that's it, guys. It's really simple, right? They are just
doing opposite things. Now, if you check those values side by side, you can see that with LEAD
we always get a value for the first row. But for the last row, it's always going to be NULL,
because there is no next value: we are at the end of the table. If you check LAG, for the
first row we always get a NULL, because there is no previous value or previous record before
the first row. And for the last record, as you can see, we always get a value, because we
have a previous value. Okay, let's move on in order to understand how SQL works this time
with the offset and the default value. So now we have the same data,
but we have a different task. On the left side, we would like to get the sales of two months
ahead. So it's not the next month; it's two months ahead. And we would like to tell SQL: if
you don't find any value, don't return NULL; return a zero for us. This is going to be our
default. Now, if you check the syntax, it's exactly like before, but we are adding an offset
of two, because we are interested in two months ahead, and we are specifying a default value
of zero. So if you don't find anything, put zero; don't put NULL. Now, on the right side, we
have the exact opposite. We are interested in the sales of two months ago. So we are not
interested in the directly previous month; we need the sales of two months ago. And here, the
same thing: if you don't find anything, don't return NULL; give us a zero. So you can see we
have the same syntax, but using the function LAG. Now let's understand how SQL executes this
step by step and side by side. SQL is going to start with the first month, January. SQL asks:
what is the sales of two months ahead? We are at January. It will not be February; it's going
to be the month of March. So it points to it like this, and we get the value of 30; 30 is the
sales of two months ahead. Now, on the right side, we are at January as well. SQL asks the
question: what is the sales of two months ago? We don't have any previous data, right? So we
will not get anything. SQL would return NULL, but it checks: do we have a default value? Well,
yes. So this time SQL does not return NULL. It returns the default value, and this time it's
going to be zero. All right. Now let's
go to the next value. We are currently at February. What is the sales of two months ahead? It
will not be March. It's going to be April. So SQL points to it like this, and we get the value
of five. Now, on the right side, we are currently at February. The question is: what is the
sales of two months ago? We have history, we have the previous month, but we don't have two
months of history. That's why we still get zero in the output, the default value. Okay, so now
let's keep going to the next value. We are currently at March. SQL asks what the sales of two
months ahead is. We have only one month after that, but we don't have two months. That's why
SQL will not find anything, and it would return NULL. But it uses the default, so here we get
the value of zero. There is no more data available in the table. But now, on the right side,
we are currently at March, and we are asking what the sales of two months ago is. Now we have
enough history in the past, and SQL gets the value of 20. All right. So now let's go to the
last month over here in our table, April. What is the sales of two months ahead? We don't have
any data, so it's going to be zero as well. But now on the right side, we are currently at
April. What is the sales of two months ago? We have enough history. That's why SQL points to
it like this, and we get the sales of February, which is ten. That's it. This is how SQL works
with LEAD and LAG using offsets and a default value. Let's go back to SQL in order to practice
those two functions.
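Both walkthroughs in this section, the plain LEAD and LAG and the versions with an offset of two and a default of zero, can be checked with one query. This is a sketch: the four months and their sales follow the example, while the names monthly_sales, month_no, and the column aliases are assumptions for illustration.

```sql
-- The four months from the walkthrough.
WITH monthly_sales AS (
    SELECT 1 AS month_no, 'Jan' AS month_name, 20 AS sales UNION ALL
    SELECT 2, 'Feb', 10 UNION ALL
    SELECT 3, 'Mar', 30 UNION ALL
    SELECT 4, 'Apr', 5
)
SELECT
    month_name,
    sales,
    -- Offset defaults to 1 and the default value to NULL.
    LEAD(sales)       OVER (ORDER BY month_no) AS next_month,
    LAG(sales)        OVER (ORDER BY month_no) AS prev_month,
    -- Offset 2, and 0 instead of NULL when no row is found.
    LEAD(sales, 2, 0) OVER (ORDER BY month_no) AS two_months_ahead,
    LAG(sales, 2, 0)  OVER (ORDER BY month_no) AS two_months_ago
FROM monthly_sales
ORDER BY month_no;
-- month | sales | next_month | prev_month | two_months_ahead | two_months_ago
-- Jan   | 20    | 10         | NULL       | 30               | 0
-- Feb   | 10    | 30         | 20         | 5                | 0
-- Mar   | 30    | 5          | 10         | 0                | 20
-- Apr   | 5     | NULL       | 30         | 0                | 10
```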
245. 6 3 win value MoM: Okay. So now we have the
following task, and it says: analyze the month-over-month performance by finding the
percentage change in sales between the current and the previous month. That means we have to
compare the current month with the previous month. The main use case for LEAD and LAG is
comparison analysis, and we have a very common use case called time series analysis. It is
the method of analyzing our business data in order to understand the patterns and trends over
time. One of the most important and classical questions that you're going to get from the
decision makers or the business is to do a year-over-year or month-over-month analysis. The
year-over-year analysis helps us understand the overall growth or decline in the performance
of our business over the years. On the other hand, we have the month-over-month analysis for
short-term trend analysis and to discover patterns in the seasonality as well. So the main
focus is to understand the performance of our business over time. So now let's go back to SQL
in order to
solve the task. Okay, guys. So now let's
go and do it step by step. Now, what is the first step? Before we go and compare
things together, we have to collect the data. We have to do the
calculations first. So we have to find out first the total sales for
the current month, and then the total sales
for the previous month. And after that, we can
go and compare them. So now let's start
with the easy stuff. We have to find out
the current sales for the current month. So in order to do that, let's
just do very simple select. So what do we need?
Let's take the order ID. Let's take the order date, because inside it we have the month. And let's go and collect the sales.
So that's it for now, FROM Sales.Orders. Let's go and execute this.
In the result, we got the usual stuff. We have ten orders with sales and order date, but the order date is on the level of days, and we are not interested in the whole date.
We would like to get only the month in order to calculate the total sales for the month.
So we're going to use a function in order to extract the month from a date. Don't worry about it. We will have a dedicated chapter to show you how to deal with date formats in SQL.
For now, we will use a very simple function called MONTH on the order date, and let's call it order month. That's it. Let's go and execute it.
As you can see, we've got a new field where we have only the month information. Here we have January, February, and March.
The next step is that we want to find the total sales for each month. So what we're going to do is add a GROUP BY. Let's do that.
We're going to say we want the SUM of sales, and I'm just going to call it current month sales. Let's get rid of all the other information.
We're going to group by the month, right? So GROUP BY, and let's have the month. That's it. Let's go and execute it.
It's very simple, right? We now have the three months and the total sales of each month.
With that, we got the first piece of information that we need in order to do the comparison. We have for each row the total sales for the current month.
The next thing that we're going to do is find the total sales for the previous month, side by side in the same row.
In order to do that, as we have learned, we can use the LAG function. So we're going to integrate the LAG window function in the same GROUP BY query.
We're going to do it like this: LAG, since we are now interested in the previous month. We're going to put the SUM of sales as an expression inside it.
After that, we're going to define the window. It's going to be like this: OVER, and ORDER BY is a must, so we're going to sort the data by the month. Let's go and do it.
With that we've defined the previous month sales, so AS previous month sales. Now let's go and execute it in order to see the results.
All right. Let's check the results. The first row. What is the previous month?
There is no previous month. We are at the first record, the first month. That's why we have NULL.
Now, let's go to February. What are the sales of the previous month, January? It is 105. So this is correct.
And now to the last value, March: what are the sales of February, the previous month? It is 195.
So with that we got the two pieces of information: the current month and the previous month.
So, guys, as you can see, it's magic, right? It's very simple. We can use the LEAD and LAG functions in order to access values from other rows without doing any complicated joins and so on.
Okay, so now, what is the next step? We're going to subtract the total sales of the previous month from the current month.
In order to do that, we're going to use a subquery like this: SELECT * FROM, and we're going to have the whole query as a subquery.
Now the calculation is very simple. Let me just move this a little bit down. It is the current month minus the previous month.
Let's call it month-over-month change. So that's it. Let's go and execute this.
Let's check the results. For the first month, you can see that we don't have any value, and that is correct, because the previous month is empty, so there is no change.
Moving on to February, you can see over here we got plus 90. That means we have an improvement in the performance of our sales.
Moving on to the last one, it's really bad. We have a decline in our performance. We can see that we have -115.
That means the current month is doing really badly compared to the previous month. So March is a really bad month.
Okay. As you can see in the output, we got the absolute numbers, but the task says: find the percentage change.
So we have to convert this to a percentage, and we can do it like this. It's very simple. Let's do it in a new column. I'm just going to zoom out a little bit.
It's going to be the change, the difference, divided by the previous month sales. And then let's multiply it by 100 in order to get the percentage, like this.
Now as you can see we got zeros, and that's because those numbers are integers. So we have to cast one of those values.
I'm just going to do it for the first one, so CAST, and then AS FLOAT. That's it. Let's go and execute it again.
Now the result looks better. We have the percentages, but we have a lot of decimals. Let's go and round the number to, let's say, one decimal only.
Let's give it a name, so AS month-over-month percentage. Let's execute.
Now you can see things get better, and with that we've calculated the percentage change in sales between the current and the previous month.
And this is how we do month-over-month analysis.
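The steps of this task can be put together roughly like this. It is a sketch, assuming SQL Server syntax. The individual order rows and all column names are invented for illustration; only the monthly totals (105, 195, and 80) come from the walkthrough. In the course database the inner query would read from Sales.Orders instead of the inline sample.

```sql
-- Hypothetical sample orders; only the monthly totals match the walkthrough.
WITH Orders AS (
    SELECT *
    FROM (VALUES
        (1,  CAST('2025-01-01' AS DATE), 10),
        (2,  CAST('2025-01-05' AS DATE), 15),
        (3,  CAST('2025-01-10' AS DATE), 20),
        (4,  CAST('2025-01-20' AS DATE), 60),
        (5,  CAST('2025-02-01' AS DATE), 25),
        (6,  CAST('2025-02-05' AS DATE), 30),
        (7,  CAST('2025-02-15' AS DATE), 50),
        (8,  CAST('2025-02-18' AS DATE), 90),
        (9,  CAST('2025-03-10' AS DATE), 20),
        (10, CAST('2025-03-15' AS DATE), 60)
    ) AS v(OrderID, OrderDate, Sales)
)
SELECT
    *,
    -- absolute change: current month minus previous month
    CurrentMonthSales - PreviousMonthSales AS MoM_Change,
    -- cast to FLOAT so the division is not done on integers, then round to one decimal
    ROUND(CAST(CurrentMonthSales - PreviousMonthSales AS FLOAT)
          / PreviousMonthSales * 100, 1) AS MoM_Percentage
FROM (
    SELECT
        MONTH(OrderDate) AS OrderMonth,
        SUM(Sales) AS CurrentMonthSales,
        -- LAG over the aggregated value gives the previous month's total
        LAG(SUM(Sales)) OVER (ORDER BY MONTH(OrderDate)) AS PreviousMonthSales
    FROM Orders
    GROUP BY MONTH(OrderDate)
) AS MonthlySales
ORDER BY OrderMonth;
```

With these totals, February should come out at +90 (roughly 85.7 percent) and March at -115 (roughly -59 percent), while January stays NULL. Note that grouping by MONTH alone only makes sense while the data covers a single year.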
246. 6 4 win value customer retention: Alright, so now we have another use case for the LEAD and LAG functions. We can use them in order to do customer retention analysis.
It's all about measuring customer behavior and loyalty. So we are helping the business and decision makers to build strong relationships with loyal customers and to focus on their needs.
Now let's see how we can use the LEAD and LAG functions in order to do customer retention analysis. So, let's go.
Alright, now we have the following task, and it says: in order to analyze customer loyalty, rank customers based on the average days between their orders.
There are a lot of things going on here, so let's do it step by step. I always like to start with a very simple SELECT.
So let's select information like the order ID. Let's get the customer ID. And since we want the days, we would like to have the date as well.
So order date, from the table Sales.Orders, and let's sort the data with ORDER BY customer ID and order date. So that's it, let's go and execute.
As usual, we got our ten orders, the customers, and when they ordered. Now let's check the task. Let's solve this part over here: days between the orders.
We have to find how many days are between two orders. For example, if we check customer number one over here, the first order was on 10 January, and the second order is ten days later, on 20 January.
So we have to subtract those two dates. Now, in order to subtract those values and do calculations, we have to have everything in the same row.
So, for example, if we are at the first row over here, I would like to have one more column with the date of the next order.
So we have to access a value from another row. Of course, we could go and do joins, but we have the LEAD and LAG functions.
And for this scenario, we're going to use the LEAD window function.
So let's go and do that. I'm going to call the order date over here the current order, and let's calculate the LEAD.
I would like to get the next order date, this value over here, in the same row. That's why this time we're going to use the order date as the expression.
Now let's define the window. We have to partition the data, because we are analyzing each customer separately, right? That's why we have to partition by the customer ID.
And of course, in order to do the LEAD, we have to use ORDER BY. Let's define that as well: ORDER BY, and it's going to be by the order date.
Now we have to give it a name. The order date here is the current order, and this is going to be the next order, so next order.
Let me zoom out a little bit and make this smaller. Let's go and execute it.
As you can see in the output, we got a new column called next order. With that, we have the current order from the current row and as well the value from the next row.
So what is the next row? It's going to be 20 January. The same thing, of course, for the next row over here: we have the current order date and the next order date.
This value is going to be exactly the one from the next row over here, 15 February.
And then, since we are working with windows, and this is the whole window over here, the last order for this customer is 15 February. There is no next order, so this is NULL.
The same thing if you check the other customers: you can see the last order never has a next order. So it looks like everything is fine. And the last customer has only one order.
With this, we got all the information for our calculations. We have the current order and the next order in the same row.
Now we can subtract them in order to get the days between those two orders. In order to subtract dates, we have to use the function DATEDIFF.
Don't worry about those functions. We will explain all that stuff in the next chapters. Now, just follow me
with those steps. What we're going to do is take the difference between this order date and the whole thing over here. The whole thing here is the next order.
Let's do it in a new line, and it's going to be very simple. So DATEDIFF: we are finding the difference between two dates.
The syntax is going to be like this. First, we have to define what we are talking about: days, months, years, and so on. So we have to tell SQL: find me the difference in days.
Then we have to specify two dates. The first one is going to be the order date, which is the current date, and the second date is going to be the whole thing from here.
Let's take it and put it side by side. This calculation is going to give us a number of days. We're going to call this days until next order.
All right. So now let's go and execute the whole thing. Let's check the result. As you can see over here, we got ten.
So there are ten days between those two dates, and for the next one we have 26 days. Here we have a NULL because we don't have a date here.
And for the next one we have 31 days, so we have a whole month over here. So everything is working perfectly.
With that, we have solved only this part: days between the orders. So, guys, you see, right? This is the magic of the LEAD and LAG functions.
We can very easily access any information we need in the same row in order to do such an important analysis, and with a very simple query.
We are not doing any crazy stuff like joins. We are just specifying the LEAD function.
Now that we have all the information that we need, next we're going to calculate the average of those days. In order to do that, we have to use a subquery.
Let me just zoom out. Let's go and SELECT *, just to prepare the subquery. So the whole thing is going to be a subquery.
I'm just going to get rid of the ORDER BY. It's not necessary now. Let me just put it like this and shift it.
So now, what do we need? We need the average of the days, so we need the average of this value. What can we do? We can use a GROUP BY.
So customer ID, since we have to find the average for each customer, and we're going to take this value and say AVG of days until next order, and we're going to call it average days.
And we have to group here: GROUP BY customer ID. So like this. Let me make this a little bit smaller and zoom in here.
So that's it. Now we are just doing a very simple average
and GROUP BY statement. So let's go and execute it. You can see SQL aggregated the data, so we now have only four customers, and for each customer we have the average days between the orders.
So what is missing in our task? If you check over here, it says: rank the customers based on this average. So we have to use the RANK function.
Here again, it's another window function, and we're going to use it together with the GROUP BY. Let me just make this a little bit smaller, and then let's do it over here.
I'm just going to go with the RANK function. Then we're going to define the window like this: OVER, ORDER BY, and then we're going to sort the data by the average days.
That means we're going to take this calculation over here and put it in the ORDER BY. It's going to be ascending, since we are focusing on the lowest average days.
So that's it. Let's call it rank average. Now, let's go and execute this. Checking the result, you can see we now have a ranking for the average.
And here SQL says that the number one customer, the most loyal customer, is customer number four, which is not really correct, because we don't have a lot of information about this customer.
He or she ordered only once, so the average is NULL.
Either you go and filter the data and remove this customer, where you say: if the average is NULL, then don't put it in the ranking. Or we can replace this value with a very huge value in order to push it to the end of our list.
For example, we can go over here and replace the NULL with COALESCE like this, and we say: if the average is NULL, then give me a crazy number like this, a very huge one.
So that's it. Let's go and execute. Now, as you can see, this customer is at the end of our list, and we can see that the most loyal customer is number one, and then the other two customers are both at rank two.
Here they are sharing the same rank since they have the same average.
So guys, with that we have solved the task, and we have ranked the customers based on the average days between the orders.
We now have a really nice ranking, and we can understand the behavior of the customers, and maybe we have to focus on customer number one and understand his or her needs.
And of course, the function that helped us do such a customer retention analysis is the LEAD function, in order to find the next order and calculate the days. So this is how you use the LEAD function for such a use case.
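Put together, the whole retention task might look like the sketch below, assuming SQL Server syntax. The three orders of customer 1 (10 January, 20 January, 15 February) are from the walkthrough; the year, the rows for customers 2 to 4, and all column names are invented for illustration. In the course database the inner query would read from Sales.Orders.

```sql
-- Hypothetical sample orders. Only customer 1's dates are from the walkthrough.
WITH Orders AS (
    SELECT *
    FROM (VALUES
        (1, 1, CAST('2025-01-10' AS DATE)),
        (2, 1, CAST('2025-01-20' AS DATE)),
        (3, 1, CAST('2025-02-15' AS DATE)),
        (4, 2, CAST('2025-01-05' AS DATE)),
        (5, 2, CAST('2025-02-05' AS DATE)),
        (6, 3, CAST('2025-01-15' AS DATE)),
        (7, 3, CAST('2025-02-15' AS DATE)),
        (8, 4, CAST('2025-03-10' AS DATE))
    ) AS v(OrderID, CustomerID, OrderDate)
)
SELECT
    CustomerID,
    AVG(DaysUntilNextOrder) AS AvgDays,
    -- COALESCE pushes customers without any gap (NULL average) to the end
    RANK() OVER (ORDER BY COALESCE(AVG(DaysUntilNextOrder), 999999)) AS RankAvg
FROM (
    SELECT
        OrderID,
        CustomerID,
        OrderDate AS CurrentOrder,
        -- next order of the same customer, NULL for the customer's last order
        LEAD(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS NextOrder,
        DATEDIFF(day, OrderDate,
                 LEAD(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate))
            AS DaysUntilNextOrder
    FROM Orders
) AS OrderGaps
GROUP BY CustomerID;
```

With this sample, customer 1 has gaps of 10 and 26 days, so an average of 18 and rank 1. Customers 2 and 3 share an average of 31 and both get rank 2, and customer 4, with a single order, lands at the end. The value 999999 is just an arbitrary large placeholder.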
247. 6 5 win value first last: The FIRST_VALUE and the LAST_VALUE functions. I think the name says everything, right?
FIRST_VALUE allows you to access a value from the first row within a window, whereas LAST_VALUE is exactly the opposite: it allows you to access a value from the last row within a window. Easy, right?
So now let's understand how SQL executes those functions. As usual, we have this very simple example.
We have the months and sales, and we have it twice because we would like to compare the two functions, FIRST_VALUE and LAST_VALUE, side by side.
On the left side, we would like to get the sales of the first month. And on the right side, we would like to get the sales of the last month.
For the first task, we can use FIRST_VALUE. It's very simple. So the FIRST_VALUE function, then the argument is going to be sales, since we want the sales.
And then the window is going to be defined like this: ORDER BY month, because we want to get the first month. So, as usual, we must use ORDER BY.
Now, on the right side, in order to get the sales of the last month, we can use LAST_VALUE, right? So the same thing: LAST_VALUE of sales, OVER, ORDER BY month.
As you can see, on the left and the right we don't use any frame definition, so the default frame is going to be used, which runs from the unbounded preceding to the current row.
All right. Now, let's see how SQL is going to process both of those queries side by side.
In the first step, SQL is going to sort the data. They are already sorted from the lowest to the highest. Then in the next step it's going to go row by row, starting with finding the first value on the left side.
So what is the unbounded preceding? It's going to be static and always pointing to January. So this is always going to be the unbounded preceding. We have it on both sides like this.
And what is the current row? At the start, it's going to be the first row. And on the right side, the same thing over here.
So the window frame is going to be only one row, right? What is the first value in this window? It is 20, right?
The same thing on the right side: what is the last value in this window? It is 20 as well. So we will get exactly the same result. Now, let's move to
the second row. The current row is going to be pointing to February, and the frame is going to be extended like this.
So what is the first value in this frame? It's going to be 20 as well. So in the output, we're going to get 20.
Now on the right side, the current row is going to be pointing to February as well, and the frame is going to get extended. So what is the last value of this frame? It's going to be ten.
Now, let's keep going. We're going to go to March, and the frame is going to get extended. What is the first value? It's always going to be the same, 20.
On the right side, the frame is going to get extended. What is the last value? It's going to be 30.
So as you can see, the default definition always has a static start, always the same start of the subset, and as we are moving with the current row, the frame gets extended.
Now moving to the last one: with that, we will get the whole dataset inside the frame, and the first value is going to be 20.
On the right side, the same thing, the frame gets extended like this, and this time the last one is going to be April, with five.
Now if you go and compare them side by side, you see that on the left side the task is solved and everything is working correctly, right?
We have for each row the sales of the first row. And what is the first row? It is January. So we have 20 everywhere, which is correct.
But now, if you check the right side, you can see there is something wrong, right? We are not getting the last value.
We should always get April, so we should have five everywhere here. Instead, we have exactly the same result as the sales column. So it's really useless to use it like this, right?
And that's, of course, because SQL is using the default definition of the window frame.
LAST_VALUE is the only one of all the window functions that you cannot use with the default frame definition. You have to customize the frame definition in order to get the effect of the last value.
For FIRST_VALUE, everything is working if you're using the default frame, if you're not specifying anything. But for LAST_VALUE, you will not get the correct effect without customizing the window frame.
So my friends, you can use the FIRST_VALUE function like all other window functions. Without defining a frame, you can go with the default, and you will get the effect of the first value.
But for LAST_VALUE, you have to go and define a frame. So let's see how
we can solve that. Alright, now in order to solve this, we're going to define the frame like this.
It's going to be ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING. So we just switch things around. Now let's see how this works.
Of course, SQL is going to sort the data and so on. Now SQL is going to have a pointer to the unbounded following, so it always points to the last row in our dataset.
Then it's going to proceed step by step. For the first row, it's going to be like this, and the frame is going to be the whole thing, right? From the current row until the unbounded following.
So what is the last value? The last row, it's going to be the five from April. So we'll get five in the output.
Now, let's proceed to the next value. The frame is going to be shorter and smaller. And what is the last value? It's going to be five as well, right?
Now we jump to the next one, and the frame is going to be like this. What is the last value? Five as well.
And then we get to the last row, like this. The current row is equal to the unbounded following. We have only one row, and it's going to be five as well.
So as you can see, it's very simple: fix the frame clause, and you will get LAST_VALUE working as expected.
So this is how SQL is going to do it. Now, let's go back to
SQL and start practicing. Alright, now we have the following task. It says: find the lowest and highest sales for each product.
Let's see how we can do this. As usual, we're going to start with a very simple SELECT statement.
So select the order ID, we need the product ID, and as well the sales. Let's select from the table Sales.Orders. That's it. Let's go and execute this.
In the output, we got our orders, products, and sales. Now let's start with the first part of the task: find the lowest sales for each product.
In order to do that, we can use the FIRST_VALUE function. Let's go and do that: FIRST_VALUE. Then we have to give an expression.
We need the lowest and highest sales, so let's have the sales inside it. Now we have to define the window with OVER.
Since we are saying for each product, that means we have to make windows. So we have to divide the data using PARTITION BY product ID, and then we must use an ORDER BY.
We have to sort the data by the sales. Since the first value should be the lowest value, we have to sort ascending, from the lowest sales to the highest sales.
So we're just going to leave it like this with the default, and we're going to call it lowest sales. Let's go and execute this.
Now let's check our results. First, SQL is going to partition the data by the product ID. As you can see, we now have four windows here. Then it sorts the data by the sales.
So the data are sorted from the lowest to the highest, from 10 to 90. Now, what is the first value of the sales? It is the first row, right? So it's going to be ten.
That's why we have ten everywhere. Let's check another one, let's take this one here.
This window has two rows, and it is sorted, so the lowest sales, or let's say the first value, is 25. So with that, we have solved
the first part of the task, finding the lowest sales for each product. Let's go to the next one. We have to find the highest sales for each product.
So let's use LAST_VALUE for this. Let's have a new line. We're going to have LAST_VALUE, and again the sales.
Then we're going to define the window. It's going to be the exact same window: we have to partition the data by the product ID and order the data by sales.
Let's just copy the previous one. Let's call it highest sales for now. Let's go and execute it.
Now if you check the results, you will see our issue over here again. We are not getting the highest sales for this window.
The highest sales is 90, but as you can see, we are getting exactly the same values as the sales column, and we have explained that in the previous example.
In order to fix this, we're going to add the frame to it: ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING. Now, let's go and execute this.
Let's check the result. As you can see over here, we got the highest sales correctly. For this window, the highest one is 90, for this window it's 60, and so on.
With that, you have solved both parts of the task, the lowest and the highest sales.
But now, I would like to give you my honest opinion about this task. I would not use LAST_VALUE to find the highest sales. Let me show you how I usually do it.
I'm going to use FIRST_VALUE in order to find the last value. Let me show you what I mean. Let's go and add a new line.
I will just take the whole thing from the lowest sales. But what I'm going to do is just change the order.
That means we will not sort the data ascending like this, from the lowest sales to the highest sales. We're going to switch it.
So we're going to sort the data from the highest sales to the lowest sales. And with that, the first value is going to be the highest sales.
Let me just rename it: highest sales, say number two. Let's go and execute this.
Now you can see over here, we got the exact same results, because we sort the data differently and we take the first value. This gives you exactly the same effect as LAST_VALUE.
As you can see, I don't have to define any window frame or something like that. I can stick with the default frame and just flip the ORDER BY.
This is how you can do it as well, using only FIRST_VALUE. Now, just for the
sake of this task, there's as well another possibility for how to solve this. You can use the MIN and MAX functions.
Let me just take the same window as the lowest sales. We can go and say: you know what? Let's get the MAX.
We are saying find me the maximum sales, and we don't have to sort anything, so we can define the window with just the partition, like this. Let's give it another name. Let's go and execute it.
As you can see, we got the exact same results as the other two highest sales columns. So as you can see, we can solve this task using three different functions.
Either use LAST_VALUE, but then you have to define the frame, or use FIRST_VALUE where you switch or flip the ORDER BY, or simply use the MAX function in order to get the highest sales.
So, guys, as you can see, we can use FIRST_VALUE and LAST_VALUE in order to find the extremes, like here in this example, the lowest and the highest sales.
So there is a similarity between those two functions and MIN and MAX.
Of course, what we can do with this value over here is compare it with the current sales.
For example, we can extend our task and say: find the difference in sales between the current and the lowest sales.
In order to do that, let me just clean up all that stuff. Let's stick with the FIRST_VALUE and the LAST_VALUE columns, like this.
We now have to compare the current sales, which is this field over here, the sales, the original one, with the lowest sales, the whole thing from here. So let's go and do that.
We're going to have a new line, and we're going to simply subtract the lowest sales from the sales, like this, and let's give it a name: sales difference. So that's it. Let's go and execute it.
As you can see in the results, in one row I'm comparing the current sales, which is 90, with the lowest sales for this product, which is ten.
With that we get the distance, let's say, between those two values, and it's going to be 80.
For the next one, the distance between this value and the lowest value is shorter, so we are near the lowest value.
So as you can see over here, we can now compare the current sales with one extreme in order to find the distance between the two values.
So this is, again, a very important analysis for comparison purposes.
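The columns built up in this task can be lined up in one query. This is a sketch, assuming SQL Server syntax. The product IDs, the order rows, and the column names are invented for illustration; in the course database the query would read from Sales.Orders.

```sql
-- Hypothetical sample orders for two products.
WITH Orders AS (
    SELECT *
    FROM (VALUES
        (1, 101, 10),
        (2, 101, 15),
        (3, 101, 20),
        (4, 101, 90),
        (5, 102, 25),
        (6, 102, 60)
    ) AS v(OrderID, ProductID, Sales)
)
SELECT
    OrderID,
    ProductID,
    Sales,
    -- lowest sales: first row when sorted ascending, default frame is enough
    FIRST_VALUE(Sales) OVER (PARTITION BY ProductID ORDER BY Sales) AS LowestSales,
    -- option 1: LAST_VALUE needs an explicit frame that reaches the end of the window
    LAST_VALUE(Sales) OVER (PARTITION BY ProductID ORDER BY Sales
                            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS HighestSales,
    -- option 2: FIRST_VALUE with the sort order flipped
    FIRST_VALUE(Sales) OVER (PARTITION BY ProductID ORDER BY Sales DESC) AS HighestSales2,
    -- option 3: MAX over the partition, no sorting needed
    MAX(Sales) OVER (PARTITION BY ProductID) AS HighestSales3,
    -- distance of the current sales from the lowest sales of the product
    Sales - FIRST_VALUE(Sales) OVER (PARTITION BY ProductID ORDER BY Sales) AS SalesDifference
FROM Orders
ORDER BY ProductID, Sales;
```

With this sample, every row of product 101 shows 10 as the lowest and 90 as the highest sales in all three variants, and the row with sales 90 gets a difference of 80.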
248. 6 6 win value summary: All right, friends. So now let's do a quick recap of the value functions, which we sometimes call analytical functions.
What they do is allow you to access a specific value from another row. This helps you do complex calculations with very simple SQL, without joining tables together or doing self joins.
For the value functions, we have four types, or let's say four functions.
The first one allows you to access the previous value, like the previous month, using the LAG function. The next one allows you to access the next value, like the next month, using the LEAD function.
Then we have another one that allows you to access the first value in a subset, using the FIRST_VALUE function. And as another option, we can access the last value in a subset, using the LAST_VALUE function.
Moving on to the next one, we have the rules of the syntax. The first point is the expression. We can use any data type. It could be a number, a string, a date, anything.
Now, in order to perform those functions, we have to sort the data with ORDER BY. So ORDER BY is required. It is a must.
Then for the frame, you are allowed to use it, so it is optional. I would say always leave the frame empty, except for LAST_VALUE, where you have to customize it. Otherwise, it will not work as expected.
Now, to the next point, the use cases. We have very important use cases for the value functions in data analytics.
We can do time series analysis. As we learned, we can do month-over-month and year-over-year analyses.
Those analyses are classic, and they are always the first questions in order to measure: are we growing with the business or are we declining? How is the performance of the current year compared to the previous year?
So you can see we are always doing comparisons using those window functions.
The next use case is about time as well: we can do time gap analysis, as when we analyzed the customer behavior, the customer retention, where we calculated the average days between two orders.
The last use case is as well about comparison analysis. We can use the value functions in order to compare the current value with an extreme, like comparing the current sales with the highest or the lowest sales.
So my friends, those analyses are essential in data analysis. You will be encountering them in each company. In each business, you have to answer those questions, and you can do that very easily using the SQL window functions.
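As a small illustration of the frame rule from this recap, here is a sketch assuming SQL Server syntax. The sales values (20, 10, 30, 5) are the ones from the month example; the column names are made up for illustration.

```sql
-- Month example from the walkthrough; column names are assumptions.
WITH MonthlySales AS (
    SELECT *
    FROM (VALUES (1, 'Jan', 20),
                 (2, 'Feb', 10),
                 (3, 'Mar', 30),
                 (4, 'Apr', 5)) AS v(MonthNo, MonthName, Sales)
)
SELECT
    MonthName,
    Sales,
    -- default frame works for FIRST_VALUE: January's 20 on every row
    FIRST_VALUE(Sales) OVER (ORDER BY MonthNo) AS FirstMonthSales,
    -- default frame ends at the current row, so this just repeats Sales
    LAST_VALUE(Sales) OVER (ORDER BY MonthNo) AS LastValueDefaultFrame,
    -- customized frame reaches the last row: April's 5 on every row
    LAST_VALUE(Sales) OVER (ORDER BY MonthNo
                            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS LastMonthSales
FROM MonthlySales
ORDER BY MonthNo;
```

The middle column is the one to watch: it returns 20, 10, 30, 5, exactly the Sales column, which is the effect described for LAST_VALUE without a customized frame.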
249. 8 1 intro case : Friends, now we're going to learn how to build conditional logic in SQL using the CASE statement.
We're going to start with the basics, like understanding how it works, the syntax, and how SQL executes the CASE statement behind the scenes.
After that, I'm going to show you many use cases for the CASE statement that I use in my projects.
So now let's start with the first question: what is a CASE statement?
The CASE statement allows you to build conditional logic in your SQL query by evaluating a list of conditions one by one and returning a value when the first condition is met.
So now let's understand the syntax of the CASE statement and what this means.
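As a quick preview of that definition, here is a minimal sketch, assuming SQL Server syntax. The sample rows, the thresholds, and the labels High, Medium, and Low are invented for illustration.

```sql
-- Hypothetical sample rows; thresholds and labels are made up.
SELECT
    OrderID,
    Sales,
    CASE
        WHEN Sales > 50 THEN 'High'    -- checked first
        WHEN Sales > 20 THEN 'Medium'  -- only checked if the first condition is not true
        ELSE 'Low'                     -- default when no condition is true
    END AS SalesCategory
FROM (VALUES (1, 10), (2, 25), (3, 60)) AS v(OrderID, Sales);
```

The row with sales 60 satisfies both conditions, but it gets 'High' because only the first matching condition returns a value. The rows with 25 and 10 come out as 'Medium' and 'Low'.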
250. 8 2 syntax case: Okay. Now let's see the
syntax step by step. It starts with
the keyword CASE. This CASE indicates that
we are starting a conditional logic in SQL. It's like in programming languages,
where you start with IF; the IF is the keyword of the logic. The whole logic also
ends with another keyword, called END. Once SQL sees the END, this is the end of
the conditional logic. The CASE is the start
and the END is the end. Now what we have in between
is the conditional logic. The conditional logic
starts with the keyword WHEN. Now we are telling SQL that we have a condition
to be evaluated, and then we specify the condition. We also have to tell SQL what happens if this
condition is fulfilled. For that we use another
keyword called THEN. Now we are telling SQL: show this result if
the condition is true. As you can see,
it's very simple. It's like natural
language, like in English: when condition one is met, then show the result.
It's very logical. Now of course, we can add a second condition inside
our case statement. We have the same setup: when condition two
is true, then show result number two. We specify the keyword WHEN, then we have the second condition, and if this condition is true, we tell SQL to show
another result. Of course, it's
very important to understand
that SQL processes the conditions from
top to bottom. So the first,
most important condition should be at the start. SQL first
checks this condition. If it fails and it's not true, then it
jumps to the second condition. The order of the conditions is very important in your logic. Now of course we can add multiple conditions,
depending on the logic, using the keyword
WHEN. And once we are done defining
all the conditions, we can specify
an ELSE keyword. The ELSE introduces the default value,
and it is optional. You can skip it. The value of the ELSE,
the default, is used only if all the
conditions failed. That means
all our conditions are not true and
nothing is fulfilled; then SQL uses
the value from the ELSE. So it is the default
value that is used if all
conditions are false. So those are the
keywords that you must use inside each
case statement: CASE,
WHEN, THEN, and END. Only the ELSE is optional, so you can
use it or skip it. This is the main structure and the syntax of each
case statement.
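To make the structure concrete, here is a minimal sketch of the full syntax. It runs the SQL through Python's built-in sqlite3 module only so that it can be executed anywhere; the CASE expression itself is standard SQL. The literal 75 and the result labels are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used here only as a convenient way to run SQL.
conn = sqlite3.connect(":memory:")

query = """
SELECT
    CASE                                -- start of the conditional logic
        WHEN 75 > 50 THEN 'result 1'    -- condition 1, checked first
        WHEN 75 > 20 THEN 'result 2'    -- condition 2, checked only if condition 1 is not true
        ELSE 'default'                  -- optional default if no condition is true
    END AS label                        -- end of the conditional logic
"""

label = conn.execute(query).fetchone()[0]
print(label)
```

Both conditions would be true for 75, but because the conditions are processed from top to bottom, the value of the first one is returned.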
251. 8 3 howitworks: Now, let's have a
very simple example in order to understand how SQL executes the case
statement behind the scenes. All right. Let's have
this very simple example where we have only
one condition. As you can see in the syntax, it starts with CASE and ends with END, and in between we have
only one condition, where we are evaluating
the sales. The condition says: if the
sales is higher than 50, then show in the result
the value High. It's very simple,
only one condition, and on the right side, we have a flow
chart in order to understand how the
logic is executed. Now, what we're going to
do is evaluate those four sales values through this logic and see what the output is going to be
with the case statement. Let's do it one by one. Let's start with the
first sales. It is 60. So here we
check: is 60 higher than 50? Well, yes. That means the sales is
meeting this condition, we get true, and we
get in the output the value High.
That means the first sales is
fulfilling the condition, and SQL gives us the value
from this condition. All right. Now SQL goes
to the next value, and we start
evaluating the 30. We ask
the same question: is 30
higher than 50? Well, no. That means
for this condition we get false, so we take the
path of the false. Now, if we take the
path of the false, we don't get any value, right? That means the output
is going to be a null. So the output for
the 30 is null. And that's because
we didn't define in our logic anything about
the default option. We don't have an ELSE here. And this is what
happens if you don't use ELSE: you get a null in the
output of the case statement. Now let's move to the next one. It's the same thing. 15 is smaller than 50, so it's not fulfilling
the condition, and again we
get a null. And for the last one,
since it's null, we also get a null, since it cannot
fulfill the condition. Now after evaluating
all those sales, only the first sales is
fulfilling the condition, and that's why we have
only one value, the High. All right. So now
let's keep moving and adding stuff to our
case statement. Now we are adding a
second condition. It says: after
checking whether the sales is higher
than 50, and if that fails, check the sales again,
whether it's higher than 20. If yes, then show
the value Medium. In our workflow, we are adding a second
condition to be checked if the first one is false. Now let's evaluate our sales again and
check the output. The first one, the 60. As you can see,
60 is higher than 50, so we are fulfilling
the first condition. That's why we get
the value High, the same as before.
Here we get High in the output. Now it's very important to
understand one thing: SQL didn't evaluate the
second condition in this scenario. SQL didn't waste any time checking the other condition. It skipped everything once it got a true from
one condition. This is exactly how
SQL processes the case. It checks the
conditions from top to bottom, and once it finds a true one, it stops
immediately and shows the value
from this condition, and it does not evaluate
any other conditions. Now SQL
jumps to the next value, the 30. Let's evaluate the conditions:
is 30 higher than 50? Well, it's not, so it's
false. Now what happens? SQL jumps to the next condition and starts
evaluating the second one. We check:
is 30 higher than 20? Well, yes. The condition is fulfilled and we get the
value Medium. SQL stops
and shows Medium in the output for this value.
In this scenario,
we have evaluated both of the conditions that we have in the
case statement. Now it goes
to the third one, the 15. Is 15 higher
than 50? Well, no.
We get false for the first condition. Then we
jump to the second condition and check: is 15 higher than
20? Again, no. Now what happens?
We end up on the false path and we don't get
any value in return.
We get a null in the output. Now for the last
one, we have null, and we again get null
because it does not fulfill any of those conditions, and that's because
we didn't define an ELSE in the case statement. If we define the
conditions like this, we get the category
Medium for the 30. This is how SQL evaluates multiple conditions in
the case statement. Now we're going
to go to the final form of our case statement, and we're going to
add an ELSE, so that we
have a default value. We are saying here:
if the sales is neither higher than 50 nor
higher than 20, then show the default
value Low. That means any sales value
that is equal to or smaller than 20 is going to
get the value Low. Now, very interesting:
if you check the workflow over here,
you can see that we have a value
for each path. For the first condition,
we get High, for the second one Medium, and if nothing is fulfilled, we always get
the value Low. So there is no way in this
chart to get any nulls, right? So let's evaluate our values again. I think
you already get it. The 60 is fulfilling
the first condition, and SQL stops immediately and just
shows the value High. So on the right side over here, nothing is evaluated because the first
condition is true. Here in the output, we get the value High. Nothing changed compared with
the two previous examples. Now SQL goes
to the next value, the 30. We evaluate the first condition:
it's false. The next one, is it higher
than 20? It is true, and that's why SQL
shows the value Medium, just
as in the previous example. Now SQL
moves to the next one, and here things
get interesting. The value is 15. We evaluate the first condition: is it higher than 50? Well, no. Is it higher than 20? Well, no. Now we
are in a scenario where none of the
conditions are true. That's why SQL
executes the ELSE. If you check our chart, we follow the false path and
get the value Low. So in the output, this time we
do not get a null; because we have an ELSE, we get the value Low. The same thing now for the null. Null does not fulfill the first condition or
the second condition, and that's why we again get the value from the ELSE. So here in the output, we
get the value Low. So as you can
see, if you use an ELSE inside the
case statement, you make sure that there are no nulls in the output. So now you have learned
the different options that we have inside
the case statement, and how SQL executes the
case behind the scenes.
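The three variants walked through above can be put side by side in one query. This is a small sketch that uses the four values from the walkthrough (60, 30, 15, and null) and runs them through Python's sqlite3 module; the table name demo and the column aliases are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (sales INTEGER)")
# The four values from the walkthrough; None is stored as NULL.
conn.executemany("INSERT INTO demo (sales) VALUES (?)", [(60,), (30,), (15,), (None,)])

query = """
SELECT
    sales,
    -- one condition, no ELSE: everything that is not > 50 becomes NULL
    CASE WHEN sales > 50 THEN 'High' END AS one_condition,
    -- two conditions, no ELSE: values that match neither become NULL
    CASE WHEN sales > 50 THEN 'High'
         WHEN sales > 20 THEN 'Medium' END AS two_conditions,
    -- with ELSE: rows that match no condition (including NULL sales) get 'Low'
    CASE WHEN sales > 50 THEN 'High'
         WHEN sales > 20 THEN 'Medium'
         ELSE 'Low' END AS with_else
FROM demo
ORDER BY rowid
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only the last column is free of nulls, because the ELSE catches every row for which no condition is true.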
252. 8 4 usecase 1: All right, friends. So now we come to the part where
I'm going to show you the most useful use cases of the case statement that I usually use in my
projects. So let's start. The main purpose of
the case statement is to do data transformations. Data transformation is
a very important process in every data project. And one very important task in data transformation
is that we can generate
new information. We can create
new columns based on the existing data
that we have in the database using
the case statement. This, of course, helps us
derive new information for our analyses without modifying
the source database, only for analytics. So my friends, the main purpose of the
case statement is to do data transformations by creating and generating new columns. Now let's start with
the first use case, and the most important
and famous one: we use the case statement in
order to categorize the data. This means we
group the data into different categories based
on certain conditions. Now you might ask why this
use case is important. Well, classifying and grouping data is fundamental
in data analysis and reporting because it makes the data easier to understand
and to track. But what's more important, it helps us aggregate the data based on the
categories. All right. Now let's have the following
task, and it says: generate a report
showing the total sales for each of the
following categories. Category High if the sales
is over 50, category Medium if the sales is over 20 and up to 50, and Low if the sales is 20 or less, and sort
the categories from the highest
sales to the lowest. Let's do it step by step, and now before we do
any data aggregations, we have to go and create
a new column called categories because we don't
have it in the database. Now let's start with a very
simple select statement. What do we need?
Let's take the order ID and the sales, and
that's it for now, from sales orders. Let's execute it. Now we have our ten orders, and we have to create a new column
called category, and we're going to do that
using the case statement. So let's take a new line, and we start with
CASE, and then again a new line in order to define the first
condition using WHEN. The first condition
is for High, where sales is over 50, so
it's very simple:
when the sales
is higher than 50. What happens if this is true? We want to show the value High. So this is the first condition, and then let's move
to the second one. If the sales is higher than 20, which means at this point it's at most
50 and higher than 20, then we want to see
the value Medium. Now for the last
category, the Low, we don't have to
create a condition,
because if those two fail, then that means the sales is
equal to 20 or less. What we're going to do is
just add a simple ELSE and show
the value Low. Like this; let me make
this a little bit smaller. Now what is missing
in our case is, of course, the END. Without it, you're
going to get an error. So END, and let's give
it the name category. We are ready. Let's
execute it. Now let's check a few rows randomly. As you can see here, we
have the sales of 15. It is Low, which is correct, and then we have here 60; it's above 50, and we
have the category High. Now if you check the
order number six, we have the sales of 50.
It's Medium because it is not higher than
50; it is between 20 and 50. Now as you can see, we have now classified our orders
using the category. In the next step,
we're going to aggregate the data. How
are we going to do that?
We will use a subquery.
Let's do it like this. We
select, and of course we group
the data by the category, so we
take the category,
and we need the total sales. That means we
use the function
SUM for the sales, and we
call it total sales. Now we have to nest the
queries together: FROM, then our previous query in parentheses like this, and then we
close it and add GROUP BY, so we are grouping
by the category. With that, we are
aggregating the sales by category. It's very simple. Let's
execute it. Now in the result, we have
only three categories. We don't have the
ten orders because now we are doing
data aggregation. The granularity is now
on the level of the category. We can see the total
sales for High is 210, for Low we have 65, and
for Medium we have 105. Of course, we are not done yet because the task says: sort the categories from the
highest sales to the lowest. That means we
have to use an ORDER BY
at the end, and we
sort the data by the total sales from the
highest to the lowest, which means descending.
So that's it, let's execute. Now with that, we
have our report. We are showing
the total sales by category, and the data is sorted from the
highest to the lowest. The highest category is High, then Medium, and then
the last one is Low. My friends, as you can see,
with the help of the case statement, we have created new information from our data, the category, and then we have
created a report based on this
new information, where we have
aggregated our data. The use case of
categorizing data using case statements is fundamental and very important
in every data project.
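The whole task can be sketched as one query: the inner query creates the category with CASE, and the outer query aggregates and sorts. This sketch runs in Python's sqlite3 module; the table name orders, the column names, and the ten sales values are assumptions made up for this illustration and are not taken from the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, sales INTEGER)")
# Hypothetical sample data: ten orders with made-up sales values.
sample_sales = [10, 15, 20, 60, 25, 50, 30, 90, 20, 60]
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i + 1, s) for i, s in enumerate(sample_sales)],
)

query = """
SELECT
    category,
    SUM(sales) AS total_sales
FROM (
    -- Step 1: derive the new column without touching the source table
    SELECT
        order_id,
        sales,
        CASE
            WHEN sales > 50 THEN 'High'
            WHEN sales > 20 THEN 'Medium'
            ELSE 'Low'
        END AS category
    FROM orders
) AS t
GROUP BY category            -- Step 2: aggregate on the level of the category
ORDER BY total_sales DESC    -- Step 3: highest total first
"""

report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

With this sample data the report has three rows, one per category, instead of the ten original orders.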
253. 8 5 Rules: Okay. So now, one
more thing before we jump to the next use case: there is one rule to follow if you are
using case statements, and that is that the data types of the results must match. What does this mean? If we check our
example over here again, we can see that the result
of each condition is a string. As you can see, we have
High, Medium, and Low, and all of those
values have the same data
type, so it is correct. Now let's break
this rule. For example, after this THEN, let's
have the value 2. So now we have a number, and we have characters. Let's execute it. Now, of course, we're
going to get an error because SQL is trying to convert the value Low to an
integer, which is not possible. So the data types of the results
must match, and that includes not only
the values after the THEN, but also the value after the ELSE, because this value is part of the output as well. So let's put Medium back here. Now, let's change
the ELSE to, let's say, 1. Let's execute it. Again, SQL is going
to throw an error because this is an
integer, and the others are
strings. So this is the rule of
using the case statement: the data types after THEN and after ELSE must match. And if you ask me whether
there are restrictions on where you can use the case
statement, in which clauses, you can use it
everywhere: in SELECT, in joins, WHERE, GROUP BY, ORDER BY, everywhere. So there are no restrictions, and we have only this one rule.
254. 8 6 usecase2: Okay, friends. Another use
case for the case statement:
we can use it in
order to map values. We can use the case
statement in order to transform the data
from one form to another in order to make it more readable and more
usable for analytics. One scenario of
mapping values is that
sometimes the database
developers store the values inside the database
as codes and flags. For example, the
status of an order could be stored as 1 and 0 instead of
active and inactive. This is one
technique to optimize the performance
of the database for the application,
because working with 1 and 0 is way faster than storing
the whole string. But in data analysis, we usually generate reports to be read by humans. So instead of showing
the data as 0 and 1, it's
nicer and more readable if you show the data as
active and inactive. For these scenarios,
we use the case
statement in order to translate those cryptic
and technical values into readable terms. Otherwise, everyone who
consumes your report is going to ask you what you
mean with the 0 and 1. Let's have the following
task, and it says: retrieve employee details with gender displayed as full text. Now let's solve it. First, we're going to
explore some information. Let's show
the employee ID,
the
first name, last name, and we need the gender
information, so gender, from sales employees. That's it. Let's
execute it. Now, as you can see
in the results, we've got our five employees, and the gender information is stored as only
one character, F and M. Of course, it's easy to
understand that F is female and M is male, but we would like to show it in the report as full text, Female and Male, instead
of those abbreviations. In order to do that,
we use the case
statement to do the mapping between the
old value and the new value. Let's create a new
column using CASE. We're going to have
two conditions because we have two values. Let's start with the first one:
a new line and WHEN. When the
gender is equal to F, then Female. For the second value, it's
exactly the same: when gender is equal to M, then
Male. Be careful with the case
sensitivity of the values. Of course, we will not finish
this without an ELSE,
where we can have
the default value. We can use the default
value Not Available. It's better than having nulls. What we are
missing is the END. So we have
an END over here, and we
call it gender full text. That's it, let's
execute it. Now, if you check the results, we have done
the mapping between the old format of the
value and the new format. Instead of F and M, we have
Male and Female. Of course, we don't
have any nulls here.
That's why we don't have
Not Available in the data, but if you have a lot of
data, you will probably have
a null somewhere, and then you get
this default value. This is how you can
do mapping between values very easily using
the case statement. Let's have another task
for the mapping use case, and the task says: retrieve customer details with
an abbreviated country code. Sometimes when we are
generating reports, maybe using Power BI or Tableau, we don't have enough
space to use the full name of the values.
What do we need? We need abbreviations, a
short form of the values, and in SQL we can use
the case statement
in order to map the full value to an
abbreviated value. It's like the previous example, but the other way around. All right. So now let's solve it. We
select a few details like the customer ID, the first
name, last name. And what do we need? We need
the country information, from sales customers. That's it. Let's execute it. As you can see, we get
our five customers and we have the country
information as a full name. Now, of course, for the reports we need abbreviated
values. So we're going to map those full names of the
countries to a short form. But in real projects, you
might get big tables where you have thousands
or millions of records, so you cannot just
check the values by eye like this. How I usually do it:
I retrieve a distinct list of all
values from the column. I usually have a
separate query for that. So we
select distinct country
from the table sales customers, just for me to see all the possible values
inside the database. Now you see in the
second result over here, we have only two values,
Germany and USA, and then I can
map the data correctly. If you are mapping
data using CASE WHEN, you always have to understand
all the possible values that you have inside the table. So let's generate
this new information. It starts with CASE,
and then a new line: when country is equal to the first value, which
is Germany. Make sure you write it
exactly as in the database: the first character is capital, and the rest is lowercase.
So what happens then? We get the
abbreviation of Germany, which is DE. So this is for the first value, and then let's move
to the second one. It's going to be
country equal to USA. It's already abbreviated, but maybe we want
only two characters, so US, like this. Now let's add an ELSE. It's optional, but
in case we have nulls or
new values in the data, we get a default: ELSE Not Available. That's it, and never forget about the END. So END, and the name is going to be
country abbreviation. That's it. Let me just get
rid of the other query. The mapping is correct.
Let's execute it. Now if you check the results, we got a new column called
country abbreviation, and as you can see,
the mapping is working. Here we have Germany
and we have DE, and for the USA, we have US. With that, we have
solved the task and done the mapping correctly between the old
value and the new value.
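Here is a small sketch of the mapping workflow: first list the distinct values, then map each of them with CASE and catch everything else with ELSE. It runs in Python's sqlite3 module. The table name customers, the column names, and the rows are assumptions for this illustration; the row with a missing country is added on purpose to show the default value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, country TEXT)")
# Hypothetical sample rows; customer 5 has no country (NULL) to trigger the ELSE.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Germany"), (2, "USA"), (3, "USA"), (4, "Germany"), (5, None)],
)

# Step 1: look at every distinct value before writing the mapping.
distinct_countries = conn.execute("SELECT DISTINCT country FROM customers").fetchall()
print(distinct_countries)

# Step 2: map each known value; string comparison is case sensitive here,
# so the values must be written exactly as they are stored.
mapping_query = """
SELECT
    customer_id,
    country,
    CASE
        WHEN country = 'Germany' THEN 'DE'
        WHEN country = 'USA'     THEN 'US'
        ELSE 'n/a'               -- NULLs and unexpected values end up here
    END AS country_abbr
FROM customers
ORDER BY customer_id
"""

mapped = conn.execute(mapping_query).fetchall()
for row in mapped:
    print(row)
```

The gender task from this lesson follows the same pattern, with F and M on the WHEN side and the full text on the THEN side.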
255. 8 7 quickform: All right, friends, now
there is a special form of the syntax of
the case statement
if you are using it
for mapping values. Let's check it. Let's say that
we have a lot of distinct values
inside the country column, not only two values. If you are mapping the
values using CASE WHEN, you're going to end up always writing the same thing: country equal Germany,
country equal India, country equal United
States, and so on. We are always using
the column country. The conditions over here
always use one column, and the
operator is always equal. Now only for this scenario, we have another syntax for the case statement,
and it looks like this. We start with the keyword
CASE, but immediately after that, we put the column that we
want to evaluate. Here you can use
only one column. You cannot use multiple columns. We are telling SQL that we are now evaluating
one column, the country. Then for each condition, we
have the following. We say WHEN Germany, which means when country is
equal to Germany, THEN DE. As you can see, we don't have the whole condition here. We have only a possible value
of the country column. We are asking: is
the value of country Germany? If it's true, then show DE. The next one: is it India? Then IN. United
States, US, and so on. We call this syntax the quick
form of the case statement, and the one on the left side we call the full form of the
case statement. Of course, the limitation of the
quick form is that you can use only one column, and only
the equal operator. Only for
these scenarios
can you use
the quick form. If things get a little
bit more complicated, where you have to
build complex logic, you cannot use the quick form. I would say, if you are sure
that the logic will not get complicated and you will always stay with
the same column, you can go with
the quick form, but I would recommend
always going with the full form, for
one simple reason:
if you add one small piece of logic, you have to rewrite
the whole case statement back to the full form. But of course, there
is nothing wrong with using the quick form if
the logic stays static, you are sure you are using only one column, and you
are just doing mapping with no extra logic. Now let's write the
quick form of the case statement for
the previous example. I will just copy
everything to a new column. I'm just going to rename it with a 2 at the end. How are we
going to do it? It's going to be CASE, but this time we
write country right after it, and then inside the WHEN, we have only the values, so no need for the full condition. It's going to be like
this. That's it. As you can
see, it's shorter and quicker than writing the
whole condition each time. Now let's execute this. As you can see in the result, we get
identical values. Now we know one more trick
for the case statement.
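To compare the two forms directly, this sketch puts the full form and the quick form next to each other in one query and checks that they return the same values. It runs in Python's sqlite3 module; the table, the rows, and the abbreviations (including the unmapped value France) are assumptions for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, country TEXT)")
# Hypothetical sample rows; 'France' is deliberately not part of the mapping.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Germany"), (2, "India"), (3, "United States"), (4, "France")],
)

query = """
SELECT
    country,
    -- Full form: a complete condition after every WHEN
    CASE
        WHEN country = 'Germany'       THEN 'DE'
        WHEN country = 'India'         THEN 'IN'
        WHEN country = 'United States' THEN 'US'
        ELSE 'n/a'
    END AS full_form,
    -- Quick form: the column once after CASE, then only the values to compare with
    CASE country
        WHEN 'Germany'       THEN 'DE'
        WHEN 'India'         THEN 'IN'
        WHEN 'United States' THEN 'US'
        ELSE 'n/a'
    END AS quick_form
FROM customers
ORDER BY customer_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

# Both forms are equivalent for pure equality mapping on one column.
identical = all(full == quick for _, full, quick in rows)
print(identical)
```

As soon as a condition needs a second column or an operator other than equal, only the full form can express it.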
256. 8 8 usecase3: All right. Moving on to the next use
case for the case statement:
we can use it in order
to handle nulls. Handling nulls means replacing
a null with a value. As we learned before with the window aggregate functions, nulls sometimes lead to incorrect calculations
and results, which leads to wrong
decision making. We're going to have
a dedicated chapter later on how to handle nulls in SQL, but now we're going to
learn how to handle nulls using case statements. So let's have the
following task, and it says: find the average
score of customers and treat nulls as zero,
and additionally provide details such as
customer ID and last name. Okay, now let's solve
it step by step. We have details here, and we also have
to do aggregations. That means we have to
use the window functions, and we must not
forget that we have to treat the nulls, so
we have to handle them. So let's start
with a very simple select. Select customer ID, we need the last name, and we also need the score, from sales customers. Let's execute it. As usual, we have
our five customers and the scores, and
here we have a null. Now, we're going to
write the window function, but without handling the nulls, just in order to see
the difference. We need the average function. For what? For the score. Do we have to
partition the data? Well, no, so we
leave the OVER clause empty. We need the average
score of all customers. That's it, let's
give it a name and then execute it. I think I have a mistake: the column is score, not scores. So now, as you can see, we have the average of 625. And as you learned before,
SQL ignores the null: it sums the four
non-null values and divides by four. But our business understands
the nulls as zero, not as missing information. So we have to go and
handle the null. Let's create a new
column for the score, but this time we use the case statement. It's going to be very simple:
we say
when the score is null, then 0. In SQL, we don't
write equal null; we say IS NULL. With that, we are replacing
the nulls with zero. Now, otherwise,
what happens? If it's not null, we
need the score as it is. We should not
manipulate anything. So the default value, in the ELSE,
is the score itself. Now, let's end it and call it score clean. Let's execute it. Now, if you check the
result over here, it's almost
identical to the score. We don't have
new values for the scores; only the nulls are now zero. All other values are not affected,
we didn't touch them or transform them at all. This is what we
mean by handling nulls: replacing nulls
with another value. Now in order to finish the task, we have to take the average of the score clean and not
of the original score. How are we going to
do it? Let's copy the whole case statement. I'm just going to do
it in another column. Let's have an AVG,
and inside it, we put the case
statement like this. What is missing now is the OVER, and it's going to be empty. Let's call it average customer clean.
This is the logic. Let me
just make everything smaller. As you can see, it's
exactly like the previous one,
but instead of using
the original score, we are using the column
that we have created. Of course, we don't need the AS inside the function, so
we have to remove it. So it starts with CASE and ends with END.
Let's execute it,
and now you can
see in the output
we got a new value
for the average, and it is more accurate
for the business. Now we have 500;
previously, we had 625. As you can see, you have to
understand what the nulls mean in your business
and handle them correctly. Otherwise, you will
get wrong results. That's it: we use
case statements in order to handle the
nulls inside our data.
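The difference between the two averages can be reproduced with a short sketch in Python's sqlite3 module (window functions need SQLite 3.25 or newer, which current Python builds ship with). The table name, the last names, and the individual scores are assumptions; the scores are chosen so that the two averages come out as the 625 and 500 mentioned in this lesson.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, last_name TEXT, score INTEGER)")
# Hypothetical rows: four scores that sum to 2500, plus one missing score (NULL).
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "A", 350), (2, "B", 900), (3, "C", 750), (4, "D", 500), (5, "E", None)],
)

query = """
SELECT
    customer_id,
    last_name,
    score,
    -- AVG skips NULLs: 2500 / 4 customers
    AVG(score) OVER () AS avg_customer,
    -- Replace NULL with 0, keep every other score as it is
    CASE WHEN score IS NULL THEN 0 ELSE score END AS score_clean,
    -- Same CASE inside the aggregate (no alias inside the function): 2500 / 5 customers
    AVG(CASE WHEN score IS NULL THEN 0 ELSE score END) OVER () AS avg_customer_clean
FROM customers
ORDER BY customer_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Which of the two averages is the right one depends on what a missing score means for the business, which is exactly the point of this use case.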
256. 8 9 usecase4: Conditional
aggregation means we can apply an
aggregate function in SQL, like SUM,
AVG, or COUNT, but this time only
on a subset of the data that meets
specific conditions. This technique is great for doing deep dive analysis or targeted analysis on a
specific subset of the data. So let's have the
following SQL task in order to understand
this use case. The task says: count how
many times each customer has made an order with sales
greater than 30. All right. As usual, we do it step
by step. What do we need? We need the orders,
so let's get the order ID, and
the customer ID,
and the sales,
from sales orders. Let's execute it. What else am I
going to do? I'm going to order
the data by customer ID. Let's execute it again. Okay. So the
task sounds easy, but it's a little bit tricky. We have to count the
number of orders for each customer where the
sales is higher than 30. Let's have an example:
customer number one. The total number of orders
is three, right,
but we have to count
only the orders where the sales is
higher than 30. In this example, we have only one order where
the sales is higher than 30,
the
order number four. So the count for the customer
ID number one should be one. Now, let's check another
customer, for example, number two. As you can see, we
have three orders, but none of them has
sales higher than 30. So the count should be zero here. How are
we going to do that? We have to flag each row, whether it's
higher than 30 or not. If it's higher than 30, it gets the flag 1. If it's less than or equal to 30, it
gets 0, and then we
sum up
all those flags in
order to get the count. So let's do it step by step. Let's first create the flag. So we're going to
go and use case, and then our condition
is very easy. We're going to say when?
What is the condition? Sales greater than 30. Sales is higher than 30. Then what can happen? We're going to flag
it with the one? Because later we're going to
go and summarize the one. Now, else, if it's
not higher than 30, equal to 30 or less, so it's going to get zero. All right. So now
let's go and end it. So let's say sales flag. Now let's go and execute
it and check the results. So now, if you
check the results, we got now a very nice
flag in order to see which orders has
sales higher than 30? Now, for example, let's take
that customer ID number one. As you can see, only
the order number four has sales higher than 30
and it's flagged with one, and all others are zero. Now let's take that
customer ID number three, and as you can see, we have now two orders where the
sales is higher than 30. And as you can see, we
have the one twice. We can use this flag in
order to do the aggregation. Now, if you go and
sum up the flag for customer ID number
three, we will get two. This is the count of orders where the sales is
higher than 30, right? Let's take another example, customer ID number two:
we have zero everywhere, and if we sum up those
values, we will get zero, which is the count
of orders where the sales is higher than
30, which is correct. Now as you can see
first, we have built an extra column in order to help us doing the aggregation, and now in the next step, we're going to go and aggregate this column. Let's
go and do that. We don't need all
that information. We drop the order ID, but we need
the customer ID because it is the granularity
for the aggregation, and let's remove the ORDER BY. Now let's go and group up
the data by customer ID. But, of course, we need the aggregate function.
How are we going to do it? We're going to go and
sum up the whole flag with SUM. Now, of course, we're
going to go rename this since now it is an
aggregated column, so we're going to
call it total orders. Now let's go and execute it. Now let's go and
check the result. As you can see now, we
have our four customers. For customer ID number one, we count only one
order higher than 30. The second one has no
orders higher than 30. For the third we have two, and for the fourth, one. And with that, we
have solved the task. Now I would like to add one
more thing to our query in order to see the
normal aggregations, not the conditional
aggregations. Usually we go and
count, for example, the star, COUNT(*), in order to
get the total orders, and let's rename the
previous one to high sales. So let's go and execute it. So we are just now doing aggregations without
any conditions, and now we can see how many
orders each customer placed. So we can see that
customer ID number one ordered three times, but only one order is
higher than 30. This is a normal aggregation, and this is a
conditional aggregation using the CASE statement.
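
The steps above can be sketched end to end as follows. This is only a minimal sketch, not the course's own script: it uses Python's built-in sqlite3 module so the SQL can run without a server, and the table name `orders`, its columns, and every sales value are assumptions made up to match the walkthrough (customer 1 has three orders with only order 4 above 30, customer 2 has none above 30, customer 3 has two, customer 4 has one).

```python
import sqlite3

# Hypothetical sample data: (order_id, customer_id, sales).
# The values are invented to mirror the walkthrough, not taken from the course database.
rows = [
    (1, 2, 10), (2, 3, 15), (3, 4, 20), (4, 1, 60),
    (5, 2, 25), (6, 1, 25), (7, 3, 50), (8, 3, 90),
    (9, 1, 20), (10, 2, 30), (11, 4, 40),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, sales INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# Step 1: flag each row. 1 if sales is higher than 30, otherwise 0.
flag_query = """
SELECT order_id,
       customer_id,
       sales,
       CASE WHEN sales > 30 THEN 1 ELSE 0 END AS sales_flag
FROM orders
ORDER BY customer_id, order_id
"""
flagged = conn.execute(flag_query).fetchall()

# Step 2: sum the flag per customer (conditional aggregation)
# next to COUNT(*) (normal aggregation).
agg_query = """
SELECT customer_id,
       SUM(CASE WHEN sales > 30 THEN 1 ELSE 0 END) AS high_sales,
       COUNT(*) AS total_orders
FROM orders
GROUP BY customer_id
ORDER BY customer_id
"""
summary = conn.execute(agg_query).fetchall()

for customer_id, high_sales, total_orders in summary:
    print(customer_id, high_sales, total_orders)
```

Note that order 10 in this made-up data has sales of exactly 30, so it gets the flag zero, because the condition is strictly greater than 30. Customer 2 therefore ends up with a high sales count of zero even though it has three orders in total.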
256. 8 10 summary: All right, friends, now let's do a recap about the
CASE statement. The CASE statement evaluates a list of
conditions one by one and returns a value as soon as
the first condition is met. And if we are talking
about the rules of using the CASE statement, we have only one:
the data types of the results after THEN
and ELSE must match. And now, if we talk about the use cases of the
CASE statement, the main use case is to do
data transformations, especially by
creating new columns and deriving new information. As we saw, there are amazing use cases for the
case statements. For example, we can use it in order to categorize our data. As we learned, we can go
and create new groups of data then to be aggregated
for our reports. Then we saw another use
case is mapping values. We can use the case statement
in order to help us map the cryptic
technical values that are stored in databases to new values that are more readable and
friendlier to use. The next use case that we have learned is handling nulls. We can use the CASE statement
in order to replace nulls with a value to make our aggregations
more accurate. The last use case
that we have learned, and I think the most
used one in my projects, is doing conditional
aggregations, where we can aggregate a
subset of data that meets specific conditions in order to do focused, targeted analyses. All right, so basically, the CASE statement is
a very powerful tool for creating conditional
logic, and it's great for deriving and generating new information
for analysis. And now in the next
chapter, we're going to learn all
the functions and all the techniques on how
to handle nulls in SQL. It's very important to clean up our data before doing
any data analysis.
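
The other use cases from this recap (categorizing, mapping values, and handling nulls) can be sketched in one small example. Again, this is a hedged illustration using Python's built-in sqlite3 module. The table `customers`, its columns, the country codes, and the score thresholds are all hypothetical. SQLite is lenient about data types, so the rule that the results after THEN and ELSE must match is not enforced here, although stricter databases may reject mixed types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, country_code TEXT, score INTEGER)"
)
# Hypothetical rows; customer 3 has a NULL score on purpose.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "DE", 350), (2, "US", 900), (3, "DE", None), (4, "US", 500)],
)

recap_query = """
SELECT customer_id,
       -- Categorizing: conditions are checked top to bottom, first match wins.
       CASE WHEN score > 800 THEN 'High'
            WHEN score > 400 THEN 'Medium'
            ELSE 'Low'
       END AS score_category,
       -- Mapping: turn technical codes into readable values.
       CASE WHEN country_code = 'DE' THEN 'Germany'
            WHEN country_code = 'US' THEN 'United States'
            ELSE 'Unknown'
       END AS country_name,
       -- Handling nulls: replace NULL with 0 before aggregating.
       CASE WHEN score IS NULL THEN 0 ELSE score END AS score_clean
FROM customers
ORDER BY customer_id
"""
recap = conn.execute(recap_query).fetchall()

# AVG skips NULLs, so the two averages differ once the NULL is replaced by 0.
avg_raw, avg_clean = conn.execute(
    """
    SELECT AVG(score),
           AVG(CASE WHEN score IS NULL THEN 0 ELSE score END)
    FROM customers
    """
).fetchone()

for row in recap:
    print(row)
print(avg_raw, avg_clean)
```

In this sketch the customer with the NULL score falls into the ELSE branch of the category, because a comparison with NULL is never true. Whether zero is the right replacement value depends on the analysis.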