Transcripts
1. Introduction: Hello. My name is Andrew, and I welcome you to the Master Data
visualization course. In this course, you'll
learn how to choose the right chart and when
it makes sense, how to build charts,
how to label them, how to annotate, how to
estimate, and how to use color. And I'll also show
you what to avoid, so your data is always honest. You'll learn how
to, for example, declutter a chart
from a mess like this into something
more readable, a well-executed design that is easy to explain and use
for Data Storytelling. After this course, you'll never look at charts the same way. You will pay attention to the little things like the axes, how forecasts are showcased, what colors were used, what design choices
were made here, and it is a beautiful process. You don't need to be a designer. You don't need any
advanced tools. You just need to
be ready to learn. If you're ready, it
will be a lot of fun. I'm waiting inside.
Let us start.
2. 01.01 - What is Data Visualization?: Hello and welcome to the
beginning of the course. We have to start somewhere, so let's explain what
data visualization is. If I had just
one tiny sentence, I would tell you,
data visualization is turning data into meaning. It's not just about charts. The chart itself is
only the tool we use to move an idea
from a spreadsheet, from a table, from
data that we have into someone else's head in a way that makes
them understand it. And this is crucial. You want people to understand
your statement. I like to teach with examples, I like to be practical
with my courses, so let's go over it with
this extreme example. This is a table. This is
a table with raw data. This is Skystream Analytics, a company I've
obviously made up. It's the revenue for 2060
expressed in US dollars. Now, you see raw data and you basically come up
with no conclusion. Unless you know the company and unless you know what
I'm about to say, you basically aren't
able to come up with a more sophisticated
conclusion than just saying, Hey, month A was better
or worse than Month B. If I ask you how
the summer went, then you could start
to look at it. But what do I mean by summer? Do I mean just June and July? Do I mean warm
months of the year? I would need to be a tiny
bit more precise here. Do I mean astronomical summer? Previously,
we had a table. Now let's do the exact opposite. Now we have just a statement. Skystream growth accelerated
significantly in the last three
months of the year. This is the conclusion I
wanted you to come up with. But what do I mean by
accelerated significantly? Do I mean 1%, 10%, 30%? This can vary. Okay. Now the same exact data we had in the table and the
same data that I gave you in this statement is put on a line chart because
I thought it would be the most accurate and the most suitable type
of chart to use here. I'm using an action
title: Revenue surged 42% in Q4
following the October pivot. If you read this sentence, you already know what I want
to say with this chart. I want to show you, Hey, we had a slight slump
in the summer months, but later on at the end, look how our revenue exploded. Take a look at the
end of the chart. If I were the designer, I would probably draw a
chart similar to this. I would draw your attention to the last three
months of the year. I would make some kind
of annotation like this, and I would give you
a brief statement: Skystream growth
accelerated significantly in the last three
months of the year. This is what data
visualization is all about. The key takeaways from this very first lecture:
communication first. The chart is only the
vehicle for the message. Evidence and insight. Data proves the
claim, like the data we had inside of the
table. Visuals explain it. This time, we used a line chart to explain the data that
we had in the table. If you need a definition, there are many ways to define data visualization itself, but we could say data
visualization is a functional tool used to reduce the time it takes to
understand information, and this is beautifully put: understand
information. There is one bonus piece of information
I want to give you. Data visualization
is not just charts. I will expand this
in the next lecture. For now, the goal of this
course is completely set. After this course, you
will be able to turn raw numbers into a visual
story, into a narrative. This is the goal we are both working towards
and striving for. So let us continue.
3. 01.02 - When to use charts: Are charts always better? A common mistake in data
visualization is assuming that every piece of data
always needs a chart. In reality, the best visualization
is simply the one that communicates the point fastest
and most understandably. Sometimes, obviously,
this will be a graph, but sometimes just a simple
sentence or even a table. Let's go over that as
always on an example, on an actual example
that you can comprehend. Yesterday, our checkout
success rate was 98%. A sentence for that
seems completely fine. Let's compare the
same data in a table. This type of data doesn't make much sense to
present in a table. Now a chart. Look at this comparison. The same information put in
a sentence, table and chart. The table and chart
look almost ridiculous. On the chart, you need to read
the axis, read the title, how big the bar is, the
value, it's overkill. When the data is simple, the sentence is
the winner because it has the lowest
cognitive load for you. Your brain processes
it instantly without having to navigate
through an entire graph. But is a sentence data
visualization? Of course, it is. A sentence, if
presented correctly, sure, you can consider
this data visualization. Now let's go over
the second example. The checkout rate
for desktop was 99%, while mobile was 94, tablets were 91%,
smart watches were 75, and overall, new
users average 88%. As you see, text has
a breaking point. As soon as we have
five or so different
categories to compare, a sentence simply fails. It was exhausting, just
reading this for me, let alone you understanding
and comprehending that. This is why in this situation, I would maybe use a table. If you need to see the
precise values like 99%, 94, 91, a table would be great. It's organized and it gives
you a quick overview. The same on a chart, it
would be just as good. Maybe to make it easier to read, I would put the data
points inside the bars. To recap this entire lecture, depending on the data you have, you might want to
use a sentence, a table, a chart, or
something else, a drawing. As long as it conveys the
information in the most simple, quick, and understandable way, it will be proper
data visualization. Let's now move forward.
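To make the sentence, table, and chart comparison concrete, here is a minimal Python sketch using the checkout rates quoted in this lecture. The function names and the text-based bar rendering are assumptions made for illustration only; they are not the tool used in the course.

```python
# Checkout success rates quoted in the lecture (percent).
rates = {
    "Desktop": 99,
    "Mobile": 94,
    "Tablet": 91,
    "Smartwatch": 75,
    "New users (avg)": 88,
}


def as_sentence(subject: str, value: int) -> str:
    """One value: a plain sentence has the lowest cognitive load."""
    return f"{subject} checkout success rate was {value}%."


def as_table(data: dict[str, int]) -> str:
    """Several values: an aligned table keeps the exact numbers readable."""
    width = max(len(name) for name in data)
    return "\n".join(f"{name:<{width}}  {value:>3}%" for name, value in data.items())


def as_bars(data: dict[str, int], scale: int = 2) -> str:
    """Several values: bars on a common scale, each labeled with its value."""
    width = max(len(name) for name in data)
    return "\n".join(
        f"{name:<{width}}  {'#' * (value // scale)} {value}%"
        for name, value in data.items()
    )


print(as_sentence("Yesterday's", 98))
print(as_table(rates))
print(as_bars(rates))
```

With a single number the sentence is enough. With five categories the table and the bars stay readable, while a sentence carrying all five values would not.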
4. 01.03 - Data-Ink Ratio: In this lecture, we are becoming
more and more practical. We will talk about the data-ink
ratio after Edward Tufte. It is a foundational
concept introduced by Edward Tufte in his 1983 book, The Visual Display of
Quantitative Information. The equation here is that
the data-ink ratio equals data ink divided by total
ink used in the graphic. Nowadays, we don't
use ink as much anymore, so let's translate that
into pixels because most data visualization nowadays
is created on a screen. As always, I will explain
this with an example. Data ink is the actual ink, the actual pixels used
to represent the data. The blue background,
the thick guidelines are also part of the chart. The data ink ratio will measure
how much of your chart is actually doing work versus
how much is just decoration. Like the blue background here, you would still understand this chart without
this background, so this is just decoration. Your goal is to get that
ratio as high as possible. If I take a
look at this chart, this green part represents data, those labels represent
data, and, of course, the titles of the actual bars represent data, and
maybe the title. If a graph has too much noise
and distracting elements, it is considered to
have a low data-ink ratio. Okay, let's go
over this example. In the below example, the
background, the grid lines, the shadows and other
unnecessary aesthetics distract from the data
being represented. By contrast, take a look at the second,
more simplified chart. Removing distractions
makes the visualization far easier to understand, and the person viewing
this is able to focus more on the data itself. I think this is pretty clear. However, we need to make
sure that the diagram is not simplified so much that the ability to understand
the data is reduced. Like for example
here, I'll remove the data next to the
YouTube, the red bar. It becomes difficult
to understand. Yes, you may see that
the bar is twice as long as Messenger, the blue one,
or maybe more than twice. So to summarize
this, try to reduce clutter without compromising
the chart's message. I think this is
pretty clear and TITA actually laid out five
laws of data ink. Above all else, show the data, maximize the data ink ratio, erase non data ink, of course, within reason, not like I did before. I erased a little bit too much, erase redundant data ink
and revise and edit. In the next lecture, we will go over an example where we do all those tasks and overall take a look at a chart and
try to adjust it.
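The ratio from this lecture, data ink divided by total ink, can be sketched in a few lines of Python. The pixel counts below are hypothetical and only illustrate the arithmetic, and which elements count as data (here the bars and their value labels) is an assumption you would adjust per chart.

```python
# Elements treated as data ink in this sketch (an assumption, adjust per chart).
DATA_ELEMENTS = {"bars", "value_labels"}


def data_ink_ratio(pixels_by_element: dict[str, int]) -> float:
    """Data-ink ratio = pixels that encode data / all pixels drawn."""
    total = sum(pixels_by_element.values())
    if total <= 0:
        raise ValueError("the chart has no ink at all")
    data = sum(px for name, px in pixels_by_element.items() if name in DATA_ELEMENTS)
    return data / total


# Hypothetical pixel counts for a decorated chart and a cleaned-up version.
cluttered = {
    "bars": 12_000,
    "value_labels": 3_000,
    "background": 40_000,
    "gridlines": 6_000,
    "shadows": 4_000,
}
decluttered = {"bars": 12_000, "value_labels": 3_000, "gridlines": 1_000}

print(f"cluttered:   {data_ink_ratio(cluttered):.2f}")
print(f"decluttered: {data_ink_ratio(decluttered):.2f}")
```

Removing the background and shadows and thinning the gridlines raises the ratio without touching the data itself, which is the point of the exercise.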
5. 01.04 - Data-Ink Examples: In this lecture,
I want to go over a practical example
regarding the data-ink ratio. We have a chart here. There are many changes that
we need to apply to it. Let's take a look
at it. Above all else, we should show the data. But if we use a three D chart, the data is skewed, the bars appear a little
longer than they really are. So first off, I'll get
rid of the three D data. Now the grid lines, the grid lines
obscure my vision. They actually make those
bars difficult to read, so I'll remove this and the second guides
as well because we actually don't
need them as much. If you want them, you can
make them thinner and lighter, but definitely not as thick. Now, because we have a shadow on the text on the left side, I want to remove this
shadow because it makes the entire chart
difficult to read. Now I want to remove the shadow around the bars themselves. I think the colors
are perfectly clear. Now some colors are
not perfectly visible. We will address that soon. Currently, we have
plenty of colors, but for example, the green
one, the yellow one. In my opinion, they are a bit hard to see on
this background. I will make the
background white because I will not tell
you that you never can use gradient backgrounds, but in the majority of cases, it isn't a great idea, especially if it obstructs the
vision of the actual data. Now, take a look at
the bottom axis. The bottom axis goes from
zero points to 100 points. Do we need 50, 60, 70, 80 when
there is no data? Let's reduce the
axis to stop at 50. Currently, it is
better adjusted. The bars are actually
better visible. They became longer, but
they start at zero, so that is no problem. Okay, what else can we do here? Actually, it's a bit difficult
now to compare the data. For example, can you tell
me how much orange is? Is it four? Is it three? Is it three and a half? And how much is the dark green one? Is it 18, 19? So let's put the data
directly over the bars. Now I can see precisely.
That's beautiful. 4.2 for orange, 18.5
for the green one. Okay? Let's move forward. If you take a look
at point number four, erase redundant data ink.
What is redundant here? On the left side, we have all the names
of the categories, and on the right side, we
have a legend with colors and with the categories written
again. Do we need this? I think it's
perfectly clear when you see yellow, email newsletter. You don't have to
put it twice. So I'll just remove the legend. Now everything is cleaner, and we are at point five
now: revise and edit. I like this design, but I dislike the
colors that we have. Let's make everything uniform, and let's focus on the
message we want to give. What is the message here? It depends on what you
want to showcase. In my case, I wanted to
showcase the two highest bars. Those are retargeting ads
and referral program. So let's change the color
of just those two bars. Now I don't have to look at
yellow, at orange, at blue, and I don't need as much
brainpower to see what is here. I would personally prefer if the highest bars were
at the top. This is now revising
and editing the chart. Because those are the biggest, the highest and the
most prominent ones that I want to show you, I'll put them first,
so I'll order everything descending from them. Okay? Right now, I can
put the data inside. I think it looks just
a little bit cleaner. This is a personal
design choice, but I think this
will look better. Okay, I'm editing the chart, and now I can decide
whether I want to stay with the bars. With respect
to the data-ink ratio, the bar is a little bit big. Why do I need that much ink if only the last
bit of information, if only the data
point is important? I can change the actual
entire bar into just a line. Here, I'm using a lollipop
chart to display the data. This is how we went from
an, I don't want to say ugly, but improper, overdesigned three D chart
into a clean, simple, and understandable
chart like here. And this is exactly
what the data-ink ratio is meant to represent and what
you, going forward, will take into
consideration when creating charts and data
visualization overall.
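The revise-and-edit steps from this lecture (sort descending, highlight only the bars that carry the message, trim the axis) can be sketched in Python. The numbers and the two placeholder channel names below are hypothetical, since the lecture only mentions a few categories and values, and the function name `declutter` is an assumption for illustration.

```python
import math

# Hypothetical values; "Channel D" and "Channel E" are placeholder names.
channels = {
    "Email newsletter": 12.0,
    "Retargeting ads": 41.5,
    "Referral program": 36.0,
    "Channel D": 18.5,
    "Channel E": 4.2,
}


def declutter(values: dict[str, float], highlight: int = 2, axis_step: int = 10):
    """Sort bars descending, accent the top `highlight` bars, trim the axis.

    Returns (rows, axis_max): rows are (name, value, color) tuples and the
    axis runs from zero to axis_max, the next multiple of axis_step above
    the largest value.
    """
    if not values:
        raise ValueError("need at least one category")
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    axis_max = math.ceil(ordered[0][1] / axis_step) * axis_step
    rows = [
        (name, value, "accent" if rank < highlight else "gray")
        for rank, (name, value) in enumerate(ordered)
    ]
    return rows, axis_max


rows, axis_max = declutter(channels)
for name, value, color in rows:
    print(f"{name:<18} {value:>5.1f}  [{color}]")
print("axis: 0 to", axis_max)
```

The axis still starts at zero, so the bars stay honest. Only the empty range above the largest value is cut away.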
6. 01.05 - Encoding and Decoding: In this lecture,
we will talk about encoding and decoding, and I'm very excited to show
this to you because it's a very important part that is often missing when talking
about data visualization. And this is the reason
why you will soon be a better designer than
anyone else you know, hopefully. Data visualization is a two-way translation process. As a creator, you take
a number, for example, from a table like
a revenue figure, and you encode it into
a visual property, like the length of the
column in a chart. This is called encoding. Then your audience
looks at the bar in this chart and must decode it back into a
number or conclusion, and this is the most
difficult part. This part is called decoding. So you need to make sure that the chart you designed will be understandable for
people and they will draw exactly the conclusion
that you want them to. Nothing should get
lost in translation. This is your entire job here. Encoding is simply the act of turning a number into a shape. These are just examples here in a table depending on
what you want to use. You have the same
value here, $100, but on different charts, those $100 will be
represented differently. On a line chart,
it will be a line. On a bar chart, it will
be a column or a bar. In a bubble chart,
it will be a circle. On a pie chart, this will be half a circle or a part of
the circle or an angle. On a map, this might be a color, a darker color, a
brighter color. It all depends on what chart type you pick.
How do you select? What will be simpler?
What will be more difficult? What
will be appropriate? I will teach you
everything about that in this course in the upcoming
lectures, so don't worry. Now, decoding is where your
audience's brain tries to turn your shapes back into numbers and
just look at it. You've probably not thought
about charts like that. The problem here is
that the human eye is not a perfect sensor. We are naturally better at understanding some
shapes than others. For example, we have
an easier time understanding the length of a bar than the
area of a circle, and that is a fact. You need to memorize
this very clearly. Let's take a look
at the circles. Circle A is our baseline. Circle B is exactly 30% smaller. But looking at them, can
you really say that? Can you judge it? Just
by looking at them? It's very difficult. Now,
let's compare those bars here. Those are the very same numbers, but encoded as length, only length, not entire area. Bar B is, again, exactly 30% shorter than bar A. It's not that easy to see, but if you put them on a common scale
next to each other, it becomes much simpler. This is why graphs, charts, and especially the bar
chart are so understandable. If you want your audience
to get the right answer, you have to choose
the right shape. Here we would only
have to decode length and we would be done. To recap this entire
lecture, you, as the designer, take the data and want to extract
a message out of it. This will be your encoding. You choose a way
to visualize it, for example, with a chart, and Walla, we have
data visualization, but this is only
half of the story. The tricky part is your audience needs to have an easy
time decoding it. So it's not enough
to create a chart. You also need to always be mindful of the people who will see, read, and hopefully
understand it. In the next lecture, we will
actually go over what is simpler and what is more difficult to
comprehend for people.
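A little arithmetic shows why the circles in this lecture are harder to decode than the bars. This sketch assumes the 30% difference is encoded in the circle's area, as is common for area-encoded charts. The diameter you actually see then shrinks by much less than 30%.

```python
import math

value_ratio = 0.70  # B is 30% smaller than A

# Encoded as bar length: the visible length shrinks by the full 30%.
bar_length_ratio = value_ratio

# Encoded as circle area: area scales with the square of the diameter,
# so the visible diameter shrinks only by the square root of the ratio.
circle_diameter_ratio = math.sqrt(value_ratio)

print(f"bar B is {1 - bar_length_ratio:.0%} shorter than bar A")
print(f"circle B is only {1 - circle_diameter_ratio:.0%} narrower than circle A")
```

The same 30% difference shows up as roughly a 16% change in diameter, which is one reason length is easier to judge than area.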
7. 01.06 - Perceptual Tasks: In this lecture, we will be talking about perceptual tasks. I'll present to you the
accuracy hierarchy. You will learn which
shapes are easier and which are more
difficult to understand. If you want to be
a professional, you have to stop picking
charts because they look nice. You have to pick them based
on this accuracy hierarchy. This exact list here
above is based on a famous study by
Cleveland and McGill from 1984. It shows what the human brain
decodes most accurately, and it is still perfectly fresh and up to
date today. I'm teaching data visualization, and I'm showing you a big table with plenty of columns and rows, and it's difficult to
comprehend and understand. Let me help to decode
it by changing the first column into simple
numbers. That's much easier. Now, the perceptual tasks, instead of reading them,
I'll change them into icons. I'll, of course,
give you some labels because probably you see
them for the first time. Don't worry. They are simple, and I'll show you every one of those tasks on actual examples. Let's take some data. We will use this data for almost all the examples
that we see here, 100, 95, and 50. Those are our three data points. Let us start with position
on a common scale. The most simple and easy to
understand perceptual task. And this is why you see those bar charts and
column charts so often. Because they are positioned
over a common scale, they all start at zero here. It's very simple to decode
the length of the bars. Not only that, we have a beautiful axis on the
bottom, so it's very simple. Okay, this is
example number one. Example number two,
the same data, but I encoded the data into rectangles with little
dots inside of it. The higher the dot, the
bigger the data point. Now, the problem here is we have the same data, 100, 95, and 50. You kind of see the difference, but it's very
difficult to decode. It would be a bit
simpler if I put them on a common
scale. Almost there. We are almost at a common scale, but they are still spread apart and it's
difficult to judge. If I position them together, it's much simpler and easier
to understand and see. And this is why you
need to always try to reach for the highest
possible perceptual task. You want to use task number
one whenever possible, instead of task
number two, three, four, five or six
for that reason. Here we have position
on a common scale, and it is perfectly
understandable what is what. Now, the same data presented on a bar chart would be also
very understandable. Okay, let me move forward. What if we have a
stacked bar chart? So the first part starts again, aligned at the same zero point. But the light purple
bars start here. This one starts here and
this one starts here. Can you tell me which
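one is the longest? You probably can't, because they are all the same. All represent a value of 74, but they are unaligned. They don't have a common scale. So this is where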
because they are all the same. All represent a value of 74, but they are unaligned. They don't have a common scale. So this is where
you have to be very careful when using
stacked bar charts. Okay, let's move forward to not make this lecture too long. The same goes for length. Basically, when you use bar charts or different
types of charts, you often decode length, but the same principle applies. If you have length
dispersed on a screen, it's very difficult to judge. It would be much easier if you put them on a common scale. And now, no matter the design, it will be much easier to
judge what is simpler, what is less simple. So the common scale is
really a lifesaver here. Now something
difficult: angle. Angle is very difficult
for our brain. This is why you have to avoid the pie chart as
much as possible. Can you judge what this angle is? What is
the second angle? You can see it's less
than 90, but is it 89? Is it 88? And now the third. Can you tell me how much it is? Is it 120, 130, 140? It's 125, and this is the exact reason why pie
charts are so difficult. Now we have the same data
that we had before, 100, 95, and 50, encoded
inside of a pie chart. So it equals roughly
41, 39, and 20 percent. While you can see
those two big points, if you didn't
have the data here, could you really tell
exactly what is what? Now the same encoded
on a bar chart. Look how simple it is, and it also respects
the data-ink ratio because a pie chart itself
is a lot of pixels, a lot of information that
your brain tries to decode. A bar chart is simply
a superior choice, and in that case, you simply should use it. Okay, let us continue with the next perceptual tasks
in the next lecture.
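The pie chart numbers in this lecture can be reproduced with a short Python sketch. It converts the three data points into shares of the whole and into the slice angles a viewer would have to decode. The variable names are illustrative.

```python
values = [100, 95, 50]  # the three data points used throughout the lecture
total = sum(values)

# Share of the whole pie, in percent, and the matching slice angle in degrees.
shares = [v / total * 100 for v in values]
angles = [v / total * 360 for v in values]

for v, share, angle in zip(values, shares, angles):
    print(f"value {v:>3}: {share:4.1f}% of the pie, slice angle {angle:5.1f} degrees")
```

The first two slices differ by only about seven degrees, which is very hard to judge by eye, while on a bar chart the same two values sit next to each other on a common scale.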
8. 01.07 - Perceptual Tasks #2: In this lecture, we continue
our perceptual tasks. So we learn what is simpler and what is more
difficult to understand. Here, we have circles. A and B look identical. The 5% difference has completely
disappeared visually. Maybe you can see that B
is a tiny bit smaller. Area is very difficult
to decode for our brain. We have lost all
precision here and we cannot decode the values
without having them. If I put one over the other, yes, we can now see it,
but can you tell it? Is it 5%, 6%, 7% or 3% smaller? And if I put this here, do you know how
much smaller it is? It is actually 50% smaller because I'm
representing the same data. But instead of using charts, I'm using area circles. Even worse, we have now reached the territory
of chart junk. And yes, this is a term
specifically chart junk. When we move from two D shapes
to volume and curvature, we are asking the
human brain to do three D calculus
just to understand, for example, a sales figure
or any type of data. Take a look at the cubes because they are
tilted in space. Your eye can no longer find
the flat line to measure. Is cube B 95% of cube A or is it 80% of cube
A? What do you think? You cannot really say because
the depth itself confuses your perception. In professional
data visualization, three D is almost always a mistake because it prioritizes decoration
over documentation. Take a look. Now, it is
so difficult to judge. See even worse. Can you really tell that
this is half the size of it? You can approximately
eyeball it, but you cannot tell me
that you see it precisely. Now, the very same data
presented on a flat surface, it's still not perfect
because it's area. If I would have the possibility, I would try to rank up
in the visual hierarchy. Okay, let's move forward. This is not to tell you the bar chart is the
best and you need to use it always for every
single case and scenario. But if the data allows it and the perceptual tasks
do not restrict it, then you should. The same
with this pie chart. Part B now looks much bigger than Part A because
it is closer to us. So be very careful when using three D. Now something
different, curvature. This gauge is a favorite
in executive dashboards, and I have to say, they look cool, but look
at the needle itself. The scale the needle is moving
on is bent into a curve. So your brain has to work twice as hard to figure
out the exact value. Of course, you have a car,
so we are used to it. It's very easy for us to decode. But because the needle is
just slightly above 40, can you tell me what the
exact value here is? You will probably assume
it is 41 or maybe 40.5. It is hard to tell on an arc. This is precisely why
you want to avoid those donut charts and
pie charts altogether. Because here, if you
put it on a simple bar, you can precisely see it's 40.5. Okay. Let's go into the
more difficult category. Color shading. I'll put this
in one group as number six. Our eyes cannot reliably and systematically quantify
color in the same way. Not to mention some
color vision impairments, deuteranopia or
protanopia. On Windows, you have a shortcut,
Control, Windows key, and C, where you can turn color filters for different
vision impairments on and off, and it's very important, especially as a data
visualization specialist. If you work with color,
you need to take into account people with
vision impairments might have difficulties
reading this. But going back into the topic, I will put two arrows here. Do you see this color
being the same or not? I will tell you
it's not the same. Those colors are the same.
Maps, colors, heat maps, and so on, of course, are usable in data
visualization, but be very mindful. It might sometimes
get difficult, especially if there is another color next to it
that is very similar. Here we have a table. The darker the color, the
greater the value. And here on the opposite, I think it's a
beautiful usage of color to enhance the
message of a simple table. The table, no matter
the topic of the table, is very easy to understand
because of color. And here, the color did enhance the message,
and I love this. Here we have population
change over time by region, and this is also
beautiful because we see that in
Hawaii, for example, the population has not
changed since 1920, and for South Atlantic, the population has increased and color depicts it beautifully. Here is another perceptual task, shading. Shading is
the most difficult. Can you tell me the
boxes in the middle? Which box is the lightest,
which the darkest? This is a very popular
example when talking about data visualization, and voilà,
they are all the same, and this is exactly
why you need to be very cautious about color, about shading and about using colorful backgrounds because
colorful backgrounds can distort what you see. If you need to use gray shading, for example, do it like that. A chart like that screams: Hey, look at bar number one, look at 4.3. But a chart like that doesn't
give you that information. You use colors just
to use colors, not to guide the viewer
or to encode information. Pretty, yes, but it gives
me no tangible information. And without you
as the presenter, I wouldn't know what you
want to say with this chart. To recap, as you move
down the hierarchy, you are trading accuracy
for aesthetics or overview. This is a cheat sheet here
to the accuracy hierarchy. We will use this and
what we've learned here across all the charts you are drawing from now
on. Please memorize this or give it a good look to start to
engrave this in your brain, and it will make
such a difference in your data
visualization story.
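The reason the 5% and 50% differences are so hard to see in circles and cubes can be checked with a small calculation. This sketch assumes the value is encoded in the shape's area (circles) or volume (cubes). The helper name `visible_scale` is made up for illustration.

```python
def visible_scale(value_ratio: float, dimensions: int) -> float:
    """Linear size ratio the eye sees when a value ratio is encoded
    as length (1 dimension), area (2), or volume (3)."""
    if dimensions not in (1, 2, 3):
        raise ValueError("dimensions must be 1, 2, or 3")
    return value_ratio ** (1 / dimensions)


encodings = {1: "bar length", 2: "circle diameter", 3: "cube edge"}
for value_ratio in (0.95, 0.50):
    print(f"B is {value_ratio:.0%} of A")
    for dims, label in encodings.items():
        scale = visible_scale(value_ratio, dims)
        print(f"  {label:<16} shrinks to {scale:.1%} of A")
```

For the 95% case the circle's diameter shrinks by only about 2.5% and the cube's edge by less than 2%, so the difference practically disappears. Even a value that is half as large still shows up as a cube edge of roughly 79%.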
9. 01.08 - Remember this: Let's make a quick recap. What to remember if you want to create charts that
communicate properly. Take a look at the orientation. Orientation will naturally guide the viewer. That kind of orientation
will tell the viewer, Hey, watch this chart top
to bottom or bottom to top. A landscape orientation will tell me watch the
chart left to right. And a square orientation
gives freedom, but depending on the data, you will give hints
to the viewer how they should read this. Sizing your chart properly.
the ratio of one to 1.6 is a perfect
size to start, but it all depends
on your design. For a 16 by 9 screen, as you can see, two
charts fit perfectly. If you want to fit three charts, you will most likely
have to go for squares. And if you want to have more
charts, then be careful. It might get a little crowded. When using colors in presentations, use
them intentionally. Don't make them random. Also remember that
colors like green naturally are interpreted
by viewers as positive, and colors like red, especially if they are drawn together, symbolize
negative values. You are free to use
your own color scheme as long as it isn't a
rainbow and it makes sense, and it doesn't give additional cognitive load to the viewer. Here, I would most likely
talk about one data point, so I would give it
a separate color. I would maybe enhance my
message with an annotation. You don't have to
always use annotations, but in that situation,
this would be okay. When it comes to the axes on the bottom and
on the left side: 1K, 2K, 3K, but of what? Potatoes, percentage,
revenue in millions? You always need to
label properly. If you've done or read any
business presentation, you clearly know that
there is always proper labeling to what
you are displaying. If you have any
kind of estimates, communicate, for
example, like that, put an E on the bottom, use a different color, make it transparent, give
it a dashed line. Now the user would clearly
see that quarter two, three, and four are
only estimates. There is a general rule to use transparency and some
dashes because this would also be perfectly
understandable when printed out and when
presented in gray scale. Thank you very much.
Let us continue now.
10. 02.01 - Proportion: Here, I would like to talk
about proportion and scale. Let's go over
different examples. The default way to display
charts is landscape because it's natural for our eyes
to flow from left to right. This is how you most
often see charts. You may as well
see them vertical. It looks acceptable here, but it's very cramped together. I would most likely
go for the same data, but in a bar chart here. This is vertical,
but we of course, can also have charts that are
displayed in square format. Let us display this
on some examples. However, orientation suggests
the natural reading flow. Here we have reading from top to bottom or
from bottom to top. Here we have from left to right, and here we could read them both ways. So
you need to decide. Always be mindful when creating those charts and when
orientating them, how people will read them? How big do I make my chart? How long, how tall? You could start out with a
golden ratio of one to 1.6. Of course, this is
not always the case, but it is simply a ratio that works well for any
designs you create. Let's see that now
in some examples. Here's a slide from an investor relations
presentation, and the most important thing I wanted to highlight here is that this slide uses
two different charts, one on the left and
one on the right. On the top side, we have
some key takeaways, and then we have the
supporting data for it,
which are those charts. If you take a look
at those charts, if I put the golden
ratio above it, you can see we approximately
are right here. This is not rocket science, but the designer deliberately wanted to fit two charts here, and he tried to make them
visually pleasing as well. However, don't try to force it. Here's another slide from
the same presentation. But here we have two charts
and one informational box. If we put the golden ratio
on top of this chart, you can clearly see it
doesn't fit, of course, because this is a
square, this is a square, and this is a square. That's completely fine because the designer wanted to show three different
insights here. To recap this entire lecture,
the most important thing, you may use the golden
ratio as a starting point, but don't obsess over it. The orientation of your
chart will influence the direction in which your data is being read and compared.
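As a starting point for sizing, the 1 to 1.6 ratio from this lecture can be turned into a small Python helper. The 1920 by 1080 pixel slide is an assumed example of a 16 by 9 screen, and the function names are illustrative only.

```python
def chart_size(width: float, ratio: float = 1.6) -> tuple[float, float]:
    """Return (width, height) for a landscape chart with width:height = ratio:1."""
    if width <= 0 or ratio <= 0:
        raise ValueError("width and ratio must be positive")
    return width, width / ratio


def side_by_side(screen_w: float, screen_h: float, n_charts: int, ratio: float = 1.6):
    """Split the screen width evenly and report each chart's size and whether it fits."""
    width, height = chart_size(screen_w / n_charts, ratio)
    return width, height, height <= screen_h


# Assumed 16 by 9 slide of 1920 x 1080 pixels.
for n in (2, 3):
    w, h, fits = side_by_side(1920, 1080, n)
    print(f"{n} charts: {w:.0f} x {h:.0f} px each, fits: {fits}")
```

With two charts each one is about 960 by 600 pixels and leaves room above for a title and key takeaways. With three charts the 1 to 1.6 ratio makes each chart quite short, which is one way to see why squares can be the more practical choice there.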
11. 02.02 - Color selection: Let's talk about color
and data visualization. Color is a powerful tool, but it is also the easiest
way to confuse your audience, and from a designer's
perspective, it is the easiest thing to change
within a chart or anything. To choose the right palette,
you first have to ask: what is the relationship
between my data points? We have three most common
categories when using color. We have categorical,
meaning different colors. We have sequential. Those are used when data goes
from low to high. And third, diverging. Those are used
when your data has a meaningful middle
or zero point, use two different colors that
meet in a neutral center. Let's go over some examples. Categorical, there's not much
to talk about other than different categories can have different colors from
your color palette. Each distinct category
has a different color. There is no sequence or
no meaning behind it. You need to be
deliberate with that, and the viewer needs to know
and needs to understand that there is no correlation between the colors here, only the data. As an example, I have
a slide from Visa's investor
relations presentation, and here they simply
use their brand colors, which are blue, from their main logo, and yellow,
the supporting color. Here they display two
different quarters, and they use two
different colors, simply categorical, because these are the colors that they use in
their presentations. We move to sequential palettes. This is all about
intensity. Because these tiers have a natural order, from a new sign-up
to an elite member, the color follows the same path. Darker colors feel heavier to the human eye. By making
elite members the darkest, the audience knows that this group has the
most weight and value. Okay, here is a different example, and it goes from
a bright green into a
darker blue color. From 0 to 8, it goes from
green to this dark blue. Okay, now diverging palettes. Those are used when you have a critical middle
or zero point. Here, for zero, the neutral point, we have a very light color. For everything positive,
we have green, and everything negative,
a loss, is red. Those are the two extremes here. The closer the
city is to zero, the more the color fades
to a neutral color. Deep red, of course, is
here, the scariest. The most classic example of them all will be, of
course, temperatures. We have a clear midpoint
at 50 degrees Fahrenheit. Everything below is marked
in blue, down to a darker blue. Everything above is
marked in deeper and deeper red. Just note that this color scale
is a continuous gradient. This is all about colors.
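The three palette types can be sketched in a few lines of Python. This is only an illustration of the idea, assuming simple linear blending between RGB colors. The color values, category names, and function names are hypothetical and are not taken from the slides.

```python
def blend(start, end, t):
    """Linearly blend two RGB colors; t=0 returns start, t=1 returns end."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def sequential(value, low, high, light, dark):
    """Low values get the light color, high values the dark one."""
    t = min(1.0, max(0.0, (value - low) / (high - low)))
    return blend(light, dark, t)


def diverging(value, midpoint, extent, negative, neutral, positive):
    """Values fade to the neutral color as they approach the midpoint."""
    t = max(-1.0, min(1.0, (value - midpoint) / extent))
    if t >= 0:
        return blend(neutral, positive, t)
    return blend(neutral, negative, -t)


# Categorical: distinct colors, no order implied (hypothetical brand colors).
CATEGORICAL = {"Q1": (20, 52, 203), "Q2": (247, 182, 0)}

LIGHT, DARK_BLUE = (230, 245, 230), (0, 40, 120)
RED, NEUTRAL, GREEN = (200, 30, 30), (240, 240, 240), (30, 150, 60)

print(sequential(4, 0, 8, LIGHT, DARK_BLUE))       # somewhere between light and dark
print(diverging(-2, 0, 10, RED, NEUTRAL, GREEN))   # a faint red for a small loss
```

The design choice mirrors the lecture: sequential uses one ramp, while diverging uses two ramps that meet at a neutral midpoint.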
12. 02.03 - Accessibility: Here, I would like to talk about color misuse, accessibility, and not confusing people with colors that we use in
our data visualization. This is a simple example. Just because you have
1 million colors in your software doesn't
mean you should use them all. Of course, if you have a
beautiful color palette, that's in sync with
your brand and with your presentation in
general, go for it. Like I did here for the
entire presentation, I'm using a purple color and a purple color scheme when I'm
teaching you this lecture. Okay, but don't just
use color for fun, like the different colors here, because the viewer's eye
will try to decode them: Okay, we have
different quarters, we have different revenue. Why is something red?
Why is something blue? Why is something orange?
Does this have a meaning? I'm wasting
time decoding the colors. If I would like to
talk about one quarter, why don't I give it
a separate color? If I want to talk
about all the data, why don't I make it a consistent
color across the board? However, be very mindful to also be colorblind friendly
with your designs. Let's say, here we have purple colors that are very
similar to each other. I can also open
my color filters. I can select deuteranopia, protanopia, or tritanopia,
and also gray scale. Why not test if they work
well on a gray scale? Okay. To make it simpler, I would maybe spread the
colors a little apart. Another example. Here we have a bubble chart with marketing
reach against investments, and the bigger the
bubble, the greater the investment must be to
achieve those results. Now, why is Tokyo green
and why is London pink? Is there any
reasoning behind it? Or did I just use
colors to use colors? Yes, I just used colors for fun, but I gave your
brain another task: to decode the colors. And if you remember
perceptual tasks, color is on the very
low side on this scale, so the easier your
chart is to read, the higher we can go on the perceptual task
scale, the better. I would choose a
consistent color. I would maybe gray out
the background lines to make everything stand out
to me a little bit more. This is what I want
you to remember when thinking about
color choices.
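The grayscale test mentioned above can be approximated numerically. The sketch below uses the relative luminance and contrast ratio definitions from WCAG (with the sRGB linearization threshold of 0.04045). It checks brightness only, so it is a stand-in for the grayscale filter and not a simulation of deuteranopia, protanopia, or tritanopia. The `too_similar` helper and its threshold of 1.5 are assumptions made for illustration.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to linear light."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Perceived brightness of an sRGB color, from 0 (black) to 1 (white)."""
    r, g, b = (_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(a, b):
    """Contrast between two colors, from 1 (identical brightness) to 21."""
    la, lb = relative_luminance(a), relative_luminance(b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def too_similar(a, b, min_ratio=1.5):
    """Hypothetical rule of thumb: flag colors that would blur together in grayscale."""
    return contrast_ratio(a, b) < min_ratio


purple_a, purple_b = (128, 0, 128), (148, 0, 148)  # two similar purples
print(contrast_ratio(purple_a, purple_b))
print(too_similar(purple_a, purple_b))
```

If two series are flagged, spreading the colors further apart in lightness, as suggested in the lecture, is usually the simplest fix.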
13. 02.04 - Annotations: Let us talk about
additional annotations on our chart designs. Let us look at a chart that shows the impact of social
media campaigns. Okay? We have the chart. This chart shows what happened. However, an annotation would
explain why it happened. So what happened in week number three
and week number six? Are those spikes just
random data points? Your audience
will either guess what happened or ignore it. Unless you are speaking to them, you can help yourself
with annotations. Here we had a viral
TikTok mention, and here we had an
influencer partnership. Memorize this: annotations can be your additional voice on a slide when you
aren't speaking. Here is a way to
communicate forecasts: we have a dashed line
that clearly shows that this is just a trajectory based on the previous estimates. If you are using annotations,
you can, for example, go for a simple
text speech bubble and plot it above
the chart itself. This gives beautiful insight. Here, a very, very exaggerated example of
what I've just said. Let's say that I want to just tell you one story
from this slide. This is an overuse of
those annotations, but this is still proper
usage of annotations. Of course, I would
prefer there being less, but this was the intention
of this chart to give a wide overview of everything
important that happened. From the beginning, let's
say, for example, here, in 2025, tariffs were imposed
by Donald Trump, and this caused the stock
market to go rapidly down. If I didn't use annotations
for myself right now, if I didn't use arrows, you
wouldn't really see what's happening here because there is so much information. Those tariffs were
paused for 90 days, and this created a sharp
rebound in the stock market. With my arrows here, with those yellow arrows,
I'm helping myself. Okay, here's a better
example, and I think this is beautiful chart design. No matter the data here, we have beautiful annotations showing only the most
important points. Then we have beautiful usage of arrows to point to
those annotations, meaning those are important
points on this entire chart. At the end, we have
the one insight that we communicate through
this entire chart, and this insight also
has an arrow pointing to an annotated data point
at the end of this chart. I could show you
different examples like here, it's
something simpler. We simply have year
over year growth marked by this dotted line, with the information in the middle. We also have the
same information between different
quarters of a year. I think this is also beautiful
usage of annotations. Here is another presentation,
from Verizon. Here we have growth of
2.1% over a year, and we don't have to
calculate this manually because someone gave us
those beautiful annotations. This is proper usage
of annotations: to enhance the message. Not only charts can
have annotations. Here is another presentation
where on the left, we have a table,
and on the right, we have some insights, and
there's an arrow between them. Arrows are the best way
of giving annotations. This arrow instantly
screams to me, Hey, this bottom left side of the table, this is
the insight of it. And this is exactly how you
should treat it and how you should work with annotations within your visualizations.
14. 02.05 - Labels: Here, I would like to
talk about labels and axes, something
extremely important. Now, remember always to
label without clutter. We often over-label our
charts because we are afraid the audience
will miss something. But this is what
you are here for. Take a look at the bottom: how many percentages there are. Let's reduce them. Okay? Now we have
fewer percentages. We could also put them directly on the chart itself to
clean it up a little. Okay, now the percentages
are directly on the chart. But as you can see,
on the right side, we have a legend called
requested feature. On the left side, we also
have requested feature, and then we actually
have the feature names. I think everyone
would understand that we are talking
about features. I would remove the
legend itself, and probably in this case, if it's not that important, if this is obvious that
those are features, I would also remove the requested features
on the left side. This chart would be much
easier to understand. Now, let us go
over another topic: when to show the exact values. Do you have to label
every single data point? Usually, the answer is no, depending on what
you want to show. If your goal is to show a trend, let the line do the talking. For example, you could label only the most important points, like the start and
end that we did here. By only labeling
January and June, you would say to your audience, Hey, look how far we've come. If you label every
month in between, you're giving them a bit
more mental math to do. I also think this
chart is too wide, so let's narrow
it down a little. Of course, putting
the data points here would be acceptable, especially if they
are important. Be mindful of your labels. Here, I selected users, and I specifically said in thousands because on the
left side, we have an axis. On the bottom, you can
clearly see those are months. I don't need to write
months below it or date. I think this is pretty
understandable, but on the left side,
you wouldn't really know unless I
specifically said it. Now let's go back to the
example we had before. Previously, I was talking about the arrows and the
beautiful annotations. But did you see the left axis, the 100, 200, 300, 400, 500K? The designer didn't repeat K, meaning thousands, for
every single data point. He only put the K
once, on the top side, to communicate that those
are values in thousands. I think this is a
beautiful design choice. For the dates,
also, the timeline wasn't as important here. He wanted to show ten years, but the most important message
was on the chart itself, so he didn't bother to put many years here, only
three different years. I really love this
because this was the key message that
everyone should be drawn to. Here is a different example
of beautiful labeling and actually focusing on
a part of the chart. Here, with a blue line
on the main chart, a part of the chart is
selected, a 5.7% increase. Then on the right side, there is a separate chart that showcases this change in more detail. I really like the
design choice here, and the blue color beautifully
communicates what is what.
15. 02.06 - Estimates: It's very important
to be capable of showing forecasts and
estimates on your charts. I'll show you a couple
of ways to do this, and we'll also go over examples. How do we show growth
that hasn't happened yet? You should never use a
solid line for a forecast. It implies the same level of certainty as your
historical data. Here in this example, I switched to a dotted line. Of course, on the bottom, if you look at the
axis, on the left side, we have actual data, and
at the very right side, 2055, we have estimates. This is also clear communication of which part is the estimate. You could also use a dark
color, and you could even add an annotation for the
estimated 10% growth. This is one of the
examples where you could use both annotations
and lines. Another way of showing
forecasts would be giving a transparent pattern to one of the columns in your bar chart
or any given chart. There are different
ways to do this. For example, you could maybe
reduce the opacity, but if you only
reduce the opacity, you might confuse someone, who just sees one data point
and another data point. So I highly recommend that you enhance this with
dashed lines over it. There is a general consensus in data visualization that
forecasts should use dashed lines. Here is a practical example. Here are some estimates of where the stock market might go in the future, from
different companies. What I like here is that we have several companies, we have their
colors and branding used, and we have the most positive, the bull case, and
the most negative, the bear case, marked with a red color on the top
side and on the bottom. Perhaps the designer didn't want to give a green color for the highest rating because this would create
too many colors. We have a blue line, we have gray lines, we have a red line, so giving another color here
might be a bit too much. I think the design
choices are perfect here. Notice that the
lines are dashed and there's a dot
at the very end. Then we have the
companies displayed. Here's another way estimates were shown: with a
dashed outline that is empty inside because
this quarter or this part of this year
hasn't happened yet. I think this is another
beautiful way to showcase it. Here we have the entire bar
and some estimates on it. However, always be careful when looking at such charts. Here we
have different data. This is biofuel
energy production, and there is a map
attached to it. We have dashed lines
across half of the globe. So is this data or
isn't this data? Take a look at the
legend underneath. If you click on some
of the countries, you can see on the
legend on the bottom, there is simply no data. This is indicated
here in the legend, but someone might
initially think, Hey, this is a forecast, but
why is there no color? There is simply no
data for this dataset, and this is clearly shown
in the labeling below.
16. 02.07 - Decluttering: Let's talk about
decluttering your charts. Decluttering is
simply the process of identifying and removing non-informative visual
elements to reduce cognitive load and
highlight the actual data. Edward Tufte, whom we
have already talked about, has described
these elements as chartjunk, meaning things that
don't represent data. All the software
that we use has so many fancy things
we can add to charts. Why do we use them? Sometimes they are
simply not useful. On a chart like that, I would definitely
remove the background. I would definitely decrease the intensity of
those gridlines. Do you see the shadows
on the bottom? I would remove the shadows. Why do we need them?
I would remove the 3D rotation altogether to clearly
show the data to you. Now, I would remove every non-essential thing to show you only the
data that I want. Here is how you could approach this on, for example, a line chart. We have here a complete
spaghetti chart. At first, I would
increase the size and decrease the colors of everything that I
don't want to show. Let's say that you want
to show only two lines. This would be my starting point. Now I would make those
lines thinner. Perfect. Now I can clearly see the
purple line and the blue line. Now, look at the bottom. I would move the legend from there to the right side
so I have more space. But if I look at the bottom, how horrible are those
years to look at? Okay, let's start
working on them. What I could do to declutter this entire chart would be maybe positioning
them like that, but you still have
to tilt your head, and there's so much
redundant information. The year is always
displayed four times. What if we didn't show the year and only
showed the quarters? Then I would have free space to beautifully add
the years below. Do you notice the difference? Now we have a clear chart divided into three
different years. Did you see this in
the first slide? I bet you didn't because it was so difficult to
see anything here. Those are simple techniques. Now
I would tone down the colors. Maybe I would strengthen the color of the part that I
want to talk about. Maybe I would add some data. Now I have room to work with. Those are techniques
that you can use within your charts
to declutter them. Another technique is
using whitespace. Let's take a look at this chart, and this is a scatter plot, and here we barely see the dots. Let's first remove
the grid lines. Let's now shorten the names. We have European something,
Latin American office. Let's go with EU, NA, LatAm, and APAC. Now the dots: do we
need that many colors? Let's reduce the colors to gray, and the data point that I
want to show you to blue. If I want to talk about
Europe and some kind of data, I would just highlight this one. Now I would remove
the big black border and we are left with a clean, understandable and beautiful
chart that was decluttered.
17. 02.08 - Remember this: Let's make a quick recap. What to remember if you want to create charts that
communicate properly. Take a look at the orientation. Orientation will naturally guide the viewer. That kind of orientation
will tell the viewer, Hey, watch this chart top
to bottom or bottom to top. A landscape orientation will tell me watch the
chart left to right. And a square orientation
gives freedom, but depending on the data, you will give hints
to the viewer how he should read this. Sizing your chart properly. As a starting point,
the ratio of one to 1.6 is a perfect
size to start, but it all depends
on your design. For a 60 by nine screen, as you can see, two
charts fit perfectly. If you want to fit three charts, you will most likely
have to go for squares. And if you want to have more
charts, then be careful. It might get a little crowded. When using colors and presentations, use
them intentionally. Don't make them random. Also remember that
colors like green naturally are interpreted
by viewers as positive, and colors like red, especially if they are drawn together, symbolize
negative values. You are free to use
your own color scheme as long as it isn't a
rainbow and it makes sense, and it doesn't give additional cognitive load to the viewer. Here, I would most likely
talk about one data point, so I would give it
a separate color. I would maybe enhance my
message with an annotation. You don't have to
always use annotations, but in that situation,
this would be okay. When it comes to axes, on the bottom and
on the left side: 1K, 2K, 3K, but of what? Potatoes, percentages,
revenue in millions? You always need to
label properly. If you've made or read any
business presentation, you clearly know that
there is always proper labeling to what
you are displaying. If you have any
kind of estimates, communicate, for
example, like that: put an E on the bottom, use a different color, make it transparent, give
it a dashed line. Now the user would clearly
see that quarter two, three, and four are
only estimates. There is a general rule to use transparency and some
dashes because this would also be perfectly
understandable when printed out and when
presented in gray scale. Thank you very much.
Let us continue now.
18. 03.01 - Framing: Here, let us talk about data
storytelling and how to frame your insight and
why it is so important. Let me give you a very
brief definition. Data storytelling is the process
of translating data into a narrative that guides an audience to a specific
conclusion or action. What you want to do with
your data storytelling, you want it to have a narrative. You want it to
have a conclusion. You want it to have a
hero story that someone is following and with that
understanding your chart, not the other way around. Here is a simple example. It's a histogram representing how long it usually takes
to deliver a package. Let's say your boss wants
to know how we are doing. So most packages,
as you can see, are delivered within
one to three days, but there are some packages that take five,
six or seven days. There are even packages that
take nine days or ten days. There are a couple of data
points in those buckets. If you told your boss, we should only focus on the
things that we do well, we shouldn't be looking
at the outliers, well, you would basically be
lying with your data. Let's be honest here. If I
just drew the conclusion, our average delivery
time is three days, well, that wouldn't
be very fair. It would be better if
I said, one in ten orders takes double
our promised time. Let's work on that. Let's see what happened with
those packages. With the first framing, we would just say,
everything is okay. We are fast. But the second
message would tell you, boss, we are inconsistent.
Let's work on that. So the way you frame
your titles and you showcase your chart influences
how they are perceived. Again, another way
of framing this. Most orders arrive on time. What is the conclusion here? Well, it is true, but it doesn't really tell
the whole honest story. If I said, our outliers are driving 80% of negative reviews, we would have something
actionable to work on. You need to be very nimble and aware of what
you want to frame, what you want to tell, and
what you want to showcase. Here's a different
chart, a waterfall chart telling where the cash went,
spending against earnings. We had 500,000 in January
and 520,000 in December. We invested heavily in
new hires and R&D, and thanks to that, we could achieve 250,000 of new sales. But the story would be
lost if we just showed January and December. And now, how we should frame
some insights. For example, you shouldn't use a title like cash flow variance
by quarter. No insight, no story. Like, no conclusion
comes from such a title. If I said, our $200,000 investment in R&D and new
hires unlocked those sales, I would be giving an
actionable title that you can follow and
use to understand the chart. So the key takeaways
when framing your insight for
data storytelling are these. Headline versus data: never name a chart by its data, name it by its conclusion. In every single business
presentation you'll see, you'll find action titles, titles that explain the chart, not titles that just state
what is on the chart. The so-what test:
if the viewer can't find the point in 5 seconds,
the frame is too weak. Please consider that when
you create your charts. Context is king: never
show a number alone, compare it to a goal,
to a peer, or to the past. This makes it much
easier to comprehend. And identify the hero,
or on the contrary, when you want to show negative data, identify the villain. Every chart has a main
character: a bar, a line, a bin, or an annotation
that you inform with. Highlight that
specific conclusion. Thank you for listening.
Let's move forward.
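The two framings of the delivery histogram can be reproduced with a few lines of Python. The delivery times below are made up so that they match the numbers quoted in the lecture (a three-day average, one in ten orders taking double the promised time). The promised time of three days is an assumption as well.

```python
PROMISED_DAYS = 3  # hypothetical promise to the customer

# Hypothetical delivery times in days for 20 orders.
delivery_days = [1, 1, 1, 2, 2, 2, 2, 2, 2, 2,
                 3, 3, 3, 3, 3, 3, 3, 3, 9, 10]

average = sum(delivery_days) / len(delivery_days)
late = [d for d in delivery_days if d >= 2 * PROMISED_DAYS]
late_share = len(late) / len(delivery_days)

# Framing 1: true, but it hides the outliers.
comfortable_title = f"Average delivery time is {average:.0f} days"

# Framing 2: the same data, framed around the problem.
honest_title = f"1 in {round(1 / late_share)} orders takes at least double our promised time"

print(comfortable_title)
print(honest_title)
```

Both titles are computed from the same list. The only thing that changes is which statistic is put in the headline.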
19. 03.02 - Narrative: Here we will be talking about structuring a narrative
with visuals. Let us go over this. We should lead with the key
takeaway right away. This is proper communication: giving the information
and the insight right away so everyone can understand a chart. Let's
look at this chart. This is warehouse efficiency. Look at this title, the title alone. It is
way too passive. It tells me what the data is, but not why it matters. It's a folder name, not a story. Would you draw a
conclusion by just having this, operating cost
per unit by location? It gives you some information.
But what is the story here? What do I want to show
with this chart? Do I want to show just the data? Then I would be just as well off showing you a table.
Let's go forward. Regional warehouse
performance summary. Still not good. What do
I mean by performance? Do I mean speed,
safety, cost, size? What do I mean here?
There's no actual insight in the title. Now,
let's go forward. What if I wrote, Cost of
shipping a box of oranges: San Francisco costs
$82 per unit, while Dallas costs $38? What's going on
here? Why is sending a box in Phoenix
that much cheaper? This is the cost of processing. That means the labor, electricity, and rent required to take one
box of oranges off a truck, scan it and put it on a shelf. If you want to show a solution, you first have to
show a problem, and I'm showing you the
problem right here. The villains here are the
expensive cities that are way, way more expensive than, for example, Phoenix,
Austin, or Dallas. When you look at this
chart, your brain shouldn't be looking
at 22 cities. It should be looking at the big gap in the
middle in price. And the most important
part here is the title that gives the
insight because if I would hide the chart from you, just by reading the title, you would kind of
understand where I'm going with the data
that I'll be showing below. I will be comparing costs and I will be thinking
together with you, how can we reduce the higher one to
get it to the lower level? It would be even better if we, for example, added
annotations here. Why is the gap so big? I, as the presenter, need to know what I want to
say and make simple graphics. Big exclamation mark,
big red arrows. Now you can clearly see that I want to compare those values. Let's move it one level higher. I'd be guiding the
viewer's eye because, okay, the title is a little
bit more interesting: 13 regional hubs have officially exceeded the $50 max cost target. We are a company. I have
a max target of $50. I will mark that right
here on the chart, and now you can
clearly see plenty of cities go over our price target. Now we could be
starting to discuss it. I'm calling this
the cost target, and now someone even
not knowing what this is about clearly sees, Hey, something is to the right of this line and something is
to the left of this line. Okay. We should be using maybe color to enhance the
message, or, for example, framing this depending on
how we want to explain it, but always remember
you have full control over simple annotations that you give while presenting live. Of course, we should
be doing a bit less of framing if this
would be printed out, but you get the idea. Okay, let's try to, for example, divide this. You could use two different
colors like this color would show where we have to fix the cost and the gray
are the okay ones. We could add a line. We
could add a green zone. The green zone is our
target, what we want to do. The red zone would be our anti target where
we want to change. Of course, this would be
going a little too far. I'm just showing you different possibilities for what you can do on a chart to enhance the
message you are trying to send. The title is extremely important to give the
insight right away. Remember, if your article or publication has
20 different charts, no one will look
precisely at them. You want people to
understand your point by just reading the title and taking a quick
glance at the chart. You can also use additional
charts to build logic, like we have in
this example here. Here I have a longer
chart, and at the end, there is a specific part which the designer
wanted to highlight, and he made a second
chart showcasing the minor differences and the details from
that little part. I think this is one of the
most beautiful designs I've ever seen in
data visualization. This is why I love to share this example and I have
it stored on my PC. Thank you very much
for listening here. Let us continue.
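An action title like the one in the warehouse example can be derived directly from the data. The sketch below is one possible way to do it, assuming a small made-up set of hubs. Only the San Francisco and Dallas costs and the $50 target come from the lecture. The other cities' values and the color names are hypothetical.

```python
MAX_COST_TARGET = 50  # dollars per unit, the target line drawn on the chart

# Cost per unit by hub. Seattle, Austin, and Phoenix values are invented for illustration.
cost_per_unit = {
    "San Francisco": 82,
    "Seattle": 64,
    "Austin": 41,
    "Dallas": 38,
    "Phoenix": 35,
}

over_target = [city for city, cost in cost_per_unit.items() if cost > MAX_COST_TARGET]

# The title states the conclusion, not the name of the dataset.
title = (f"{len(over_target)} of {len(cost_per_unit)} regional hubs "
         f"exceed the ${MAX_COST_TARGET} max cost target")

# Two colors only: highlight what needs fixing, gray out the rest.
bar_colors = {city: ("red" if cost > MAX_COST_TARGET else "gray")
              for city, cost in cost_per_unit.items()}

print(title)
print(bar_colors)
```

Generating the title from the same numbers that drive the bars keeps the headline and the chart from drifting apart when the data is updated.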
20. 03.03 - Dashboards: Here, let us talk about
dashboards against single charts or multiple
charts against single charts. Here, we have one big chart. If you need to prove a
specific point and brute-force the data on the viewer, this
would be perfect. One big chart showing it. 24 weeks of data. It proves there is
no luck involved. There is a trend, a pattern. I think we should even
use a line chart here. It would be more
appropriate because we are using a lot of ink
to show all the bars, while the data that matters
is only the percentage. Okay, let's excuse this.
You get the idea. Now a different type of
constellation of charts. Here we have some
kind of dashboard. Let's say this is a daily
performance dashboard. We have calorie division,
we have progress, our steps, our calories, and we have expenses
that we take. Three different categories, seemingly connected
to each other, but depending on how we place
them, how big we make them, we switch the focus of the user, and we are giving a bird's
eye overview over the data, not focusing on one
particular thing. The advantage here is that multiple charts
next to each other can showcase some kind of
correlation between them. For example, you can see
how takeout expenses or restaurant expenses
increase your carb intake and make you take fewer steps. But this is a thought process someone needs to very
carefully go over. The problem with multiple charts next to each other or with dashboards is that
they dilute the focus. There is a lot of noise here. You can use this kind of setup
for general information, general awareness,
but you want to use single charts for specific
insights, specific actions. Here, you possibly
might be hiding a great step count with
a bad bank balance. So you are spending too much, but it's okay because you
are making a lot of steps. This may be an exaggeration, but that's the idea
when our brain sees multiple charts on one
screen or on one slide. Of course, there are situations
where this makes sense. If you would have
several countries and several charts
displaying some kind of data point in
those countries, it would be easier to compare. But here we don't
have a common scale. Here we have the pie chart. We have a radial chart
that is only decoration. It would be much simpler to
see this in a bar chart. But okay, let's say
a circle looks nice, so someone used it,
and on the right, we have another type of chart. There is a lot of
cognitive load for us. So be careful. When you are
storytelling with data, you want to know what type
of insight you would like to form by showcasing several
charts next to each other.
21. 04.01 - Distortion: Misleading data and
visual distortion. A very important topic here
will be graphical distortion. Take a look at this chart. If you set your Y axis to
start at 90 instead of zero, the difference
between, for example, this small bar and a
bigger bar like here appears as if it were three times as big or
even four times as big. For a bar chart, the baseline has to be zero. Cropping the axis is the most common way to
lie to the audience. If we start this axis at zero, look what happens to the data. Now it is a real
representation of the data, and now it's a real comparison. We've seen this countless times. This is a very popular example, and I'm sure that you've
seen, on TV or in politics, someone trying to skew
the data in his favor. Here, we clearly have
only a 4% difference, but look how much
bigger this bar is. It appears as if it were
much, much bigger. Here is a different example
during the pandemic. This example is even worse because the
axes stay the same, but look at the dates below. Here, we have a difference
of two weeks, and here we have a difference
of just a couple of days. If you show this
on a real chart, this is the part that is
shown on the left side. Yes, it's pretty, but it skews the data toward the narrative
it wanted to show. Moreover, the cases of coronavirus
per week in England were, of course, dependent
on the number of tests that were made. So this is very tricky data to showcase and to draw
plausible conclusions from. Another source of distortion
is 3D charts. They introduce perspective bias. When we view objects in 3D, the objects that are further
away appear smaller, like you can see
in this picture. But our brain tries to
calculate the difference between what it sees in the front and what it
sees in the back, and this creates not
only a lot of problems, but a lot of distortion. The prime example of that
is the pie chart. Look at the pie chart: if
it is tilted in 3D, this part of the pie looks much, much bigger than that part of
the pie, while, in reality, the sizes are the same, technically,
because both are 20. I think this requires
no further information. Here, baseline distortion. Here we have a
line chart, and it is the perfect way to
showcase this data, depending on what
we want to show. But what if you
use an area chart? Now you are hiding certain data. Can you tell me
the purple values precisely? The red covers them,
the blue covers them. So the purple data
can no longer be read precisely. If we really
have to use an area chart, which isn't an advantage here, we should definitely
use transparency or maybe move the purple to the front in that
particular situation. But if we move
purple to the front, then red would be obscured
and blue would be obscured. So it's very tricky to
show an area chart properly. We could display them as a
stacked chart, but then again, a stacked chart doesn't
have a common baseline. The layers don't all start at zero, because the red layer and the blue layer start
on top of each other. So this becomes very tricky. When I compare them
next to each other, please remember what
you want to say. If the individual trends
matter, use a line chart. If only the total sum at
the end matters, like here, then a stacked chart
would be okay, and it could be an area
chart. No problem.
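To make the stacking problem concrete, here is a minimal Python sketch. The three series and their values are hypothetical (the lecture does not give numbers), and the stacking order, with purple at the bottom, is also an assumption. It computes where each layer starts at every x position.

```python
def stacked_baselines(layers):
    """Return, for each layer, the baseline it is drawn from at every x position.

    `layers` is a list of equally long lists, bottom layer first.
    """
    running = [0] * len(layers[0])
    baselines = []
    for layer in layers:
        baselines.append(list(running))  # this layer starts where the previous ones ended
        running = [r + v for r, v in zip(running, layer)]
    return baselines


# Hypothetical values for three series, bottom layer first.
purple = [10, 12, 15, 11]
red = [5, 9, 4, 8]
blue = [7, 3, 6, 2]

baselines = stacked_baselines([purple, red, blue])
print(baselines[0])  # [0, 0, 0, 0]
print(baselines[1])  # [10, 12, 15, 11]
print(baselines[2])  # [15, 21, 19, 19]
```

Only the bottom layer sits on a flat zero baseline. The other two start from a baseline that moves at every point, which is why their individual trends are hard to read in a stacked chart.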
22. 04.02 - Lie Factor: Here, I would like to talk about the lie factor by Edward Tufte. The lie factor is there to
find out if a chart is lying. You measure the
physical change on the paper, on the screen, or in the
pixels in our case. Say the data, for
example, grew by 20%, the data point that we have, but the bar or icon grew by 60%,
while it should be
representing 20% growth. Then the lie factor will
be equal to three. You have tripled the
truth, so to say. Let me show this on
different examples. Here is a core example. We have fuel capacity in liters, and we have
four different phases. Let's say that we have
a fuel capacity of 100, 120, 150, and then 200. It grew two times. Let me remove the middle data. We have point 1, 100, and point 2, 200. We used an icon that
also showcases volume. In reality, this icon is four times as big
as the first one, while the data only
grew two times. This is clearly an
inappropriate usage of icons because icons grow not only in height
but also in width. So you are showing volume
where you shouldn't. It looks pretty, yes, but it's completely
inappropriate. So if you calculate
the lie factor, the effect shown in the
graphic is four times as big. The effect shown in the
data is two times as big. So the lie factor would be
the size of the effect in the graphic, divided by the size of the effect in the data. 4/2 is two, and this
is way too much. When we are talking
about the lie factor, specifically, we should be
within an acceptable range, close to one. The scale of
the graphic should always correspond to changes in
the data being represented. The graphic that I've shown
in that example breaks that rule by using area to show
one-dimensional data. Okay, a lie factor equal to one means the representation is true to the real quantitative data. When the lie factor
is smaller than one, the data is underrepresented, and when the lie factor
is bigger than one, the data is overrepresented
compared to the real data. This would
be a normal, fair chart. Now, this chart, because we
made the axis much larger, has a lower lie factor, much too low, because it doesn't represent the differences between the points fairly. And here we have an
overrepresentation because we have truncated the axis itself, and now the bar appears
two times as big, while the data only shows
five points of difference. Okay, let us, for
example, calculate this one. First, you need to calculate
the relative change. So we need to make that
simple calculation, and the relative change
between the data points, only the data points, is 0.05. Now let's take the
pixels that are used to represent
the data, the bars. One bar is a little higher, one bar is a little shorter, so we have that number of pixels. If we calculate the
relative change between them, we get a nearly identical value, because there
might be one or two pixels that my monitor wasn't able to show
perfectly precisely. If I calculate the lie factor, it's 0.98, so there's
absolutely no problem. I'm not lying with the data
when showcasing it like that. Here is a fun piece of
information I found, and I think it's
valuable to showcase. Here we have a chart where the swing between the
temperatures is very large. We have very high highs
and very low lows. But if we just showed
the average temperature,
well, it would look like the
perfect spot to live. We always have around 60
degrees. Well, this obviously
isn't the truth. And here is another fun example: how to make a lot in a year. Acquire at least one
wife and 13 children, calculate the per capita income, multiply it by the
number of children, and voilà, you are getting rich just by the
number of people. This is an exaggeration
and just a fun example, but it shows how it is possible to lie with
different data. So keep that
in mind, and remember the lie factor and how easy it is to mislead people
with data in general.
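The calculation from this lecture can be written down as a minimal Python sketch. The function names are my own, and the pixel heights (200 and 320) are hypothetical values chosen so that the bar grows by 60% while the data grows by 20%, as in the first example.

```python
def relative_change(first, second):
    """Relative change from the first value to the second, e.g. 0.2 for +20%."""
    return (second - first) / first


def lie_factor(data_first, data_second, graphic_first, graphic_second):
    """Size of the effect shown in the graphic divided by the size of the effect in the data."""
    effect_in_data = relative_change(data_first, data_second)
    if effect_in_data == 0:
        raise ValueError("the data did not change, so the lie factor is undefined")
    effect_in_graphic = relative_change(graphic_first, graphic_second)
    return effect_in_graphic / effect_in_data


# Data grows from 100 to 120 (+20%), the bar grows from 200 px to 320 px (+60%).
print(round(lie_factor(100, 120, 200, 320), 2))  # 3.0
```

This sketch compares relative changes, as in the 20% versus 60% example and the 0.05 bar example. The fuel icon example divides the growth multiples directly (4/2), so the same chart gives a different number under that convention. Whichever you pick, use it consistently.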
23. 04.03 - Correlation: Let me show you an
extremely important topic, namely correlation
and causation in data visualization and the
problems you might face when you mistake one for the other. To explain a little,
correlation measures the degree to which two variables
move together. This indicates
statistical dependence without implying a cause
and effect relationship, so only a correlation. Causation, on the other hand, describes a cause and
effect link where changes in one variable directly
produce changes in another through an
identifiable mechanism. Establishing causation
requires additional evidence beyond just observed
correlation. Let me give you an
example to explain all of this and why
it is so difficult. A fire broke out. You
need a certain number of firemen to extinguish the
fire. This is obvious. The bigger the fire
becomes, most likely, the more firemen are
required to put it out. There is a correlation between the size of the fire and
the number of firemen. But if you translate exactly the same
thing into a cause, it doesn't work like that,
because following that logic, you would say that firemen
cause fire. Even worse: the more firemen,
the bigger the fire. Well, of course,
this isn't the case, so this causation
would be simply wrong. Statistically speaking,
yes, there is a positive correlation
between those two variables, but the firemen aren't the
cause of the fire. So more firemen do
not create more fire. A related trap is the spurious correlation. It occurs when two variables appear to be directly related, but a hidden third variable
actually influences both. Let's go to another example. This is a very popular example,
namely ice cream sales
against shark attacks. The concept here: a
spurious correlation happens when two variables
move together closely, but they do not really have
a direct relationship. Looking at the
chart, the lines for ice cream and shark attacks
are nearly identical. So you could say eating ice
cream causes shark attacks. Or sharks love the taste of
people who eat ice cream. But of course, this data is driven by a hidden third factor: summer and high temperatures. When it's hot, people buy ice cream. When it's hot, people go
swimming in the ocean, and there is a chance
of a shark attack, a slim one, but there is one. So those two variables
are correlated, but one does not
cause the other. There is no causation
between them. Another problem here is how
I show the data. If I changed one
axis or the other, I could very easily
show that the graphs do not overlap each other as beautifully as we
have seen previously. This is everything I wanted to explain about correlation
and causation. Be careful of spurious correlations that you
will find between two variables when there is a hidden third
variable behind them.
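The ice cream and shark attack example can be reproduced with a small Python sketch. All numbers are made up for illustration: both series are built only from a hypothetical temperature series plus a little noise, and neither is computed from the other. The `pearson` helper is my own implementation of the standard Pearson correlation coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical hidden third variable: temperature over ten periods.
temperature = [10, 14, 18, 22, 26, 30, 28, 24, 18, 12]

# Both series depend only on temperature plus small, made-up noise.
ice_noise = [5, -3, 4, -6, 2, -1, 3, -4, 6, -2]
shark_noise = [0.3, -0.2, 0.1, 0.4, -0.3, 0.2, -0.1, 0.3, -0.4, 0.1]
ice_cream_sales = [20 * t + e for t, e in zip(temperature, ice_noise)]
shark_attacks = [0.2 * t + e for t, e in zip(temperature, shark_noise)]

print(pearson(ice_cream_sales, shark_attacks))
```

The coefficient comes out close to one, even though ice cream never appears in the formula for shark attacks. The strong correlation is produced entirely by the shared dependence on temperature.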
24. 04.04 - Data Bias: Let us talk about data bias
and data selection problems. Here is a simple example. I will conduct a study on remote work
preference in New York. I need to pick a sample of people on whom I will
conduct my research. So I'm selecting a sample from people who work in New York,
who are actively working. Then I will get some
results and I'll extrapolate them onto
the entire population. How good my sample
selection is determines how good my research
will actually turn out. Now, let's do the same study, but I will only
pick people who are already working remotely
and maybe already like it. If I selected those people
for my study, obviously,
this would be selection
bias, because those results would probably
be much different from those of
a random sample of people. This study would not
be informative at all. Okay. Let's conduct
another study. I want to ask, how do
people get to work? And I will ask people: do they prefer to drive a car, do they prefer to walk, or do they prefer to
ride a bicycle? On the surface, each answer has a
perfectly equal chance. What if I asked the
question at a gas station, where everyone is with
their car at 8:00 a.m.,
when they
are driving to work? Obviously, most people
would select the car here. So this would be a clear
case of selection bias
on my part. Selection bias occurs when the data used for a chart does not represent the real-world population
it claims to describe. Obviously, you cannot ask all the people in the world
when you conduct a study. You have to extrapolate
from a sample to the entire population,
but your selection is
extremely important. There is one other thing
related to selection: attrition. Let's say I have a 30-day
fitness app challenge. I'm starting the challenge,
and 1,000 people
join my program to lose 10 kilograms
in 30 days.
Okay, on day number one, everyone is motivated,
everyone starts. On day number 15, probably fewer people
are as engaged, and the people who
remain are more motivated because they
are starting to get results. We have some people
who are happy and will definitely
hold out to the end. On day 30, we have
barely anyone left. The ones that survived until the end are
relatively happy. So this would be my final group. If I displayed the data only for the
people that are left here, I could claim that our users lose an
average of 12 kilograms. Well, for the 20 users who are left at the end out of the
1,000 people who started, yes, the average
might be this good. And this is attrition bias. Attrition bias
happens when people or data points drop out
of a dataset over time, and the ones who remain are systematically different
from the ones who left. So be very careful when
selecting your data points, and check whether any
attrition or selection bias occurs.
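Here is a minimal Python sketch of the fitness challenge. The 1,000 starters, the 20 finishers, and the 12 kilogram average come from the lecture. The weight loss of the 980 people who dropped out is not given, so the 1 kilogram used below is purely an assumption for illustration.

```python
from statistics import mean

# 20 finishers with an average loss of 12 kg (from the lecture).
finishers = [12.0] * 20

# 980 dropouts; their 1 kg loss is a made-up value, the lecture gives none.
dropouts = [1.0] * 980

everyone = finishers + dropouts

print(mean(finishers))           # 12.0
print(round(mean(everyone), 2))  # 1.22
```

Reporting only the first number describes the 2% of people who stayed. Under this assumption, the average over everyone who started is roughly a tenth of the advertised value.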
25. 04.05 - Normalization: Here I would like to talk
about incorrect normalization. This is a simple
topic, yet it is important so that you never
lie with your data. On the left side, we
have healthcare spending in billions for two
different countries. What we don't see here is that country A is five times
bigger than country B. So yes, its spending is a little bit higher,
but in reality, if we want to compare
those two countries, they have to have a
common denominator. To account for population, let's make it
spending per person. This would be a common
denominator between those two, and here the data looks
completely different. We have 5,000 for country A
and 10,000 for country B. I think this is pretty
obvious and understandable. If you have a big square and
a small square, 10% of the big
square isn't really comparable to 10% of
the small square. If you compare
them directly, it's like comparing apples
to oranges because they are
on a different scale. Here is another example of
a wrong denominator. I think we rarely
do something like that, but here we have
sales per store for company A and company B. But let's take a look
at the data behind it. Company A divided the total
sales by total employees,
while they should be dividing
total sales by total stores, which is two, and
it would amount to 5 million, not 1 million. Company B, however, is calculated
properly on this chart. We have total sales divided
by the number of stores. Alternatively, sales per store could be changed to
sales per employee. Whichever change is made, both need to use the
same denominator, not two different denominators for
two different companies. If I were to summarize
this entire lecture, incorrect normalization
happens when data is scaled or adjusted using
the wrong reference: using totals instead
of per-unit values, dividing by the
wrong population, or, for example, mixing different time periods or units together. You cannot precisely
compare a year to a month. To recap this entire lecture, the key takeaways are: never compare raw totals
for different-sized groups. Always check the sample
size, n equals how many, and always label
your axes clearly: is it in percent, in
dollars, in units, per person, per sale, per store? I think this is perfectly
understandable, and I'm sure you will
never make such mistakes.
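Both normalization examples can be checked with a few lines of Python. For the countries, the per-person results (5,000 and 10,000) and the five-to-one population ratio come from the lecture. The absolute populations and spending totals are hypothetical values picked to match them. For company A, the total of 10 million and the 10 employees are implied by the lecture's figures (5 million per store with two stores, 1 million per employee).

```python
# Hypothetical totals, chosen to match the per-person values from the lecture.
countries = {
    "Country A": {"spending": 500_000_000_000, "population": 100_000_000},
    "Country B": {"spending": 200_000_000_000, "population": 20_000_000},
}

spending_per_person = {
    name: values["spending"] / values["population"]
    for name, values in countries.items()
}
print(spending_per_person)  # {'Country A': 5000.0, 'Country B': 10000.0}

# Company A: the same total divided by two different denominators.
total_sales = 10_000_000
stores = 2
employees = 10

sales_per_store = total_sales / stores        # the metric the chart claims to show
sales_per_employee = total_sales / employees  # what company A actually plotted
print(sales_per_store, sales_per_employee)    # 5000000.0 1000000.0
```

The raw totals say country A spends more. The per-person values say the opposite. The label on the axis has to name the denominator that was actually used.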
26. 04.06 - Aspect Ratio: Let us talk about
another important topic,
namely aspect ratio
in line charts. The line chart is
very often used, and there is something
to understand about it. This is very simple to see in an example where
I take the same chart. If the chart is
very wide and flat, it suggests that there is a
very slow trend happening, while if I take the
same chart and simply narrow it down,
squash it together, all of a sudden, the
trend looks quicker. So a narrow and tall design will exaggerate the trend.
What do we do here? How do we find the middle
ground and how do we decide what chart
slope is appropriate? So what is a good, correct, or honest aspect ratio
for line charts? There is an often-cited rule
called banking to 45 degrees. It says that the average
slope of the lines on a chart should be 45 degrees. This comes from the research paper by Cleveland, McGill, and McGill, and it's only a starting point. But let me show you
this with an example. I take a
chart and measure the angles, the average
slope, in my chart. Here, for example, I
have 73, 79, around that. If I make this chart wider, it's 58 and 63 now. If I make it even wider, okay, we are getting close
to the 45 degrees, and this would be approximately how this chart
should be displayed. In my opinion, it's currently
a bit too wide. It covers almost
the entire screen, but the slope is very easy to
see and very easy to read. We don't always have a
simple chart like that. What you also have to
keep in mind is that even if you make your chart
wider or narrower, you also have to
consider the Y axis. Look at the left chart
and at the right chart. The left chart has a taller
Y axis. The way you read the
data would be more extreme on the left
side and less extreme, less volatile, on the right side. So it's very important
to know what you want to show and
show it appropriately, not trying to
exaggerate some trends. Now, here's a very
interesting example that shows this
magnified by 1,000. A study from 2006
illustrated that different aspect ratios can reveal different
signals in time series. I'll now show you the CO2
concentration at a measuring
station in Hawaii. Both of those charts display
the exact same data, but we want to find something
different in this trend. This is why the right
one is narrower. If I take a look here, what
conclusion do I see? I see that the
downward movement is quicker than the
upward movement. The upward movement
happens slowly. Here we have 45 degrees, so it's displayed nicely. Here we have around 60 degrees, and I can now draw a conclusion by looking
at this chart that I wasn't seeing on the
left chart. Because the left chart has so many
lines and they are so steep, I can't really tell which slope
is steeper,
while on the right side, it becomes a little
bit more apparent. The same chart can show
you different conclusions, different trends, and
different situations. Remember
that,
especially when working
with line charts. This shows that the aspect
ratio of a chart can influence what you
can see in the data. Let banking to 45 degrees
be only a starting point, but, of course, deviate
from the rule when needed. As you saw in the
previous example, different slopes can
tell different stories, especially when
using line charts.
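As a rough illustration of banking to 45 degrees, here is a Python sketch. It uses a simplified heuristic, not the exact procedure from the Cleveland, McGill, and McGill paper: it picks the height-to-width ratio at which the median absolute slope of the line segments is 1 on screen, which corresponds to 45 degrees. The function names and the sample data are my own, and the x values are assumed to be strictly increasing.

```python
import math
from statistics import median


def segment_angles(xs, ys, width, height):
    """On-screen angle in degrees of each segment for a plot area of width x height."""
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    angles = []
    for i in range(len(xs) - 1):
        dx = (xs[i + 1] - xs[i]) / x_range * width
        dy = (ys[i + 1] - ys[i]) / y_range * height
        angles.append(math.degrees(math.atan2(abs(dy), dx)))
    return angles


def banked_aspect_ratio(xs, ys):
    """Height / width ratio at which the median absolute on-screen slope equals 1."""
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    slopes = [
        abs((ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])) for i in range(len(xs) - 1)
    ]
    typical_slope = median(slopes)
    if typical_slope == 0:
        raise ValueError("the line is flat, so there is no slope to bank")
    return y_range / (x_range * typical_slope)


# Hypothetical data: a short zigzag line.
xs = [0, 1, 2, 3, 4]
ys = [0, 2, 1, 3, 2]

print(banked_aspect_ratio(xs, ys))               # 0.5, so twice as wide as tall
print(median(segment_angles(xs, ys, 1.0, 1.0)))  # square plot: steeper segments
print(median(segment_angles(xs, ys, 2.0, 1.0)))  # banked plot: closer to 45 degrees
```

Treat the result as the starting point the lecture describes. If the point of the chart is a slow rise followed by a fast fall, a different ratio may show it better.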
27. 04.07 - Y Axis in Line Charts: Here, let us talk about when it is okay to cut the Y
axis of a chart, especially a line chart. So, cutting the Y axis. Bar charts are about size,
about absolute magnitudes. Here, the bars start at zero, so the length correctly
represents how big each value is. That's okay. Now,
watch what happens when I cut the Y axis. Let me cut it. As
we've already discussed, this is unacceptable
in bar charts. The bars suddenly
look very different, even though the data
did not change. This is because in bar
charts, we decode length. Cutting the axis distorts
the length completely, so it breaks the meaning
of the chart itself. We need to decode length, so we need to start at zero. This is the rule of thumb.
However, on the right side, I would like to show
you the line chart. The line chart is about change, about a trend, so the
zero isn't as important. It's even sometimes
necessary to adjust the axis to actually see the data, because previously
we didn't see anything. We saw a very flat line, but if we change the
range of the axis, we can finally find
some information. Here we decode the position on a common scale, not the length. So the rule of thumb will
be: adjust the Y axis in line charts to show
the trend clearly. That trend needs to
be shown clearly. So if you need to,
just adjust it. Here is an exaggerated
example from the book How to Lie with Statistics
by Huff and Geis. It's from 1954. I've read this entire book and it's phenomenal,
especially the drawings. I love the comparisons
and the drawings. You can see on the left,
we have the entire chart, and on the right side, we have only the little part with
the head of the chart. What happens if you
change the Y axis? Yes, you make the face longer. Sometimes you need to
do this to show the trend, but don't try to exaggerate. Just try to be informative. There is a rule of thumb
that Andrew Gelman stated: if zero is in the
neighborhood, invite it in. Here, we start the axis on the left side from
five, then ten, 15, 20. So why don't we
include the zero? Here, it's very close to our actual dataset
and our actual trend. So there's no problem.
Here is a different example. Here is the number of medals for different countries. Let's say those are some
kind of Olympic Games. What would happen
if I were to change the Y axis to start
at zero with France? So France would have, in
this showcase, zero medals, the USA would have 150 medals more, and Spain would have
negative 150 medals. Well, Spain here has about
100 fewer medals than France. I wouldn't like to see
this chart because not only do I have to calculate
everything by hand, but what does it actually show
me? It doesn't show much. I much prefer just
seeing the raw numbers, the raw totals, especially
when using a bar chart. Of course, if someone
really needs and wants it, you could show that, but I wouldn't really
endorse such a chart. To recap,
how do we choose? How do we choose whether we can cut the
Y axis in a line chart? In general, in a time series, use a baseline that
shows the data, not necessarily the zero point. So it's nuanced. You can include the zero point, but in line charts and
trends over time, you don't have to include
zero, while in bar charts, our perception decodes length, so we need to use
the zero point. The choice is context dependent. Thank you so much for listening, and let us continue.
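The rules of thumb from this lecture can be sketched as a small Python helper. The function name, the 10% padding, and especially the definition of "neighborhood" (zero lies no further below the smallest value than the data's own range) are my assumptions. The lecture gives no numeric threshold.

```python
def y_limits(values, chart_type, padding=0.1):
    """Suggest (lower, upper) Y axis limits for a bar chart or a line chart."""
    lowest, highest = min(values), max(values)
    span = (highest - lowest) or 1.0  # avoid a zero-height axis for constant data
    pad = span * padding

    if chart_type == "bar":
        # Bars encode length, so the axis must contain zero.
        lower = lowest - pad if lowest < 0 else 0.0
        upper = highest + pad if highest > 0 else 0.0
        return lower, upper

    if chart_type == "line":
        # Lines encode position, so the axis may focus on the data range.
        lower = lowest - pad
        if 0 <= lowest <= span:
            # Zero is "in the neighborhood" (assumed threshold): invite it in.
            lower = 0.0
        return lower, highest + pad

    raise ValueError(f"unknown chart type: {chart_type!r}")


print(y_limits([5, 10, 15, 20], "line"))      # zero is close, so the axis starts at 0.0
print(y_limits([36.5, 36.8, 37.1], "line"))   # zero is far away, so the axis is cut
print(y_limits([36.5, 36.8, 37.1], "bar"))    # bars always start at 0.0
```

The helper only suggests limits. As the lecture says, the choice is context dependent, so treat the threshold as a default to override, not as a rule.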
28. 04.08 - Error Bars: Let me talk about error
bars in charts and why they are very difficult to
read, understand, and use. You need to be very mindful
when you use error bars. So what are they?
They are basically a graphical way to show you how much your data might vary. Let's say that we
have some data about public satisfaction with local public
transport in Denmark. We survey people, and we find out that 65% of people are satisfied with
local public transport. If we conducted
this study again, it could be a little bit less. It could be a little
bit more. So we state
at the bottom that the error bars show a
95% confidence interval with a margin of error of plus or minus 3%. This is crucially important: I've displayed at the bottom
how we calculated
those error bars. They can be drawn
a little
bit differently. We can use caps on
the left and right side. We can draw them without caps or, for example, as
a plain line like that. This would be the range in which the values can
fall. This would be the lower bound, and this would be
the upper bound. Let's see the anatomy
of an error bar. Usually, they look like that.
We have the data point. We have the
cap, or we don't, depending on how you
want to design them. And we have the lower bound
and the upper bound. This is the entire error bar. Error bars can be used in
different charts, for example, in scatter plots, in dot
plots, in bar charts, and in line charts, but
each one poses its own problems. For example, in a bar chart, if the bar is white, as here,
the error bar is easily visible. But what if I made
the bar purple or black? You wouldn't see the
bottom half of the error bar. It would look like
a stick of dynamite, without showing the bottom
part of the error bar. Another problem is
the mathematical methodology,
which already poses a problem in itself: we can show error bars that show
the standard deviation, the standard error, or the confidence intervals
that we used previously, or we can have a completely
custom range with minimum and maximum values that we applied to our dataset. Previously, on the
data that I displayed, we stated at the
bottom that we are using 95% confidence intervals.
What does that mean? It means that if I
ran this poll 100 times, I would expect
the results to be accurate to within plus or minus 3% about 95 times out of 100.
So here I would expect
the true value to fall in the range of 62-68%.
Let me give you another example. You know that we have to
start bar charts from zero, but for the sake of learning, because we can barely see
the differences here, let me make the
axis start at 36, only this one time, okay? But in a second you'll see why I shouldn't be using
this chart at all. We are showcasing the
average body temperature. What I would like to show
you is only this yellow dot. This yellow dot represents
the data that I want to show, the average body
temperature of each group. Let's delete the data, and let's now put the
bars above each other
to show you
this more clearly. If I were
calculating some kind of mean value, those are the values
that I'm looking at. Let's delete
the part that we don't need, because if you
look at the yellow dots, the average between them would
be in the middle of them. This is the average
that I want to show. So this is the
actual data point, and the yellow dots are the individual data points
or maybe my error bars. So we have the average
in the middle, and all the yellow points are the data points that
we have available. We could represent
this with a dot plot. We could represent
this with a box plot where the middle
line is the center value (in a standard box plot, the median). And we could also display
it with a violin plot. All of those plots would
display the same data. You can decide for yourself
which is graphically the most pleasing and the most
appropriate for this situation. Error bars are
named error bars, but they don't always
display error. They can also display the range of values where your data lives. I hope this is understandable. This is a difficult part
of data visualization. This is why you
rarely see it: it's not easy to use
error bars properly.
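To show where a margin of error like plus or minus 3% can come from, here is a minimal Python sketch for the survey example. The 65% and the 95% level come from the lecture. The sample size of 971 respondents is an assumption (the lecture does not state it), and the normal approximation used below is one common method, not necessarily the one behind the chart.

```python
import math


def proportion_interval(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    p: observed share (0.65 for 65%), n: sample size,
    z: 1.96 is the usual multiplier for a 95% interval.
    Returns (lower bound, upper bound, margin of error).
    """
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin, margin


# 971 respondents is a hypothetical sample size that yields roughly +/-3%.
lower, upper, margin = proportion_interval(0.65, 971)
print(round(lower, 2), round(upper, 2), round(margin, 3))  # 0.62 0.68 0.03
```

The lower and upper values are the two ends of the error bar around the 65% data point. Swapping in a standard deviation or a standard error would give a different bar for the same data, which is why the caption has to say which one is shown.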
29. 04.09 - Remember This: What should you remember
from this section, where we talked about visual
misrepresentation of data? Be very careful with your axes. Try to avoid truncation. Don't try to hide data
with the axes themselves. The same data might
look vastly different. Just starting the axis at
zero isn't always enough. Scale it appropriately. Watch the lie factor. You can underrepresent
the real data or you can overrepresent the real
data. Be very careful about that. Consider correlation and
the causation it seems to imply. Just because two things are correlated, it doesn't necessarily
mean that one causes the other, like we
had in the firemen example. We cannot really say
that firemen cause fire and more firemen
cause bigger fires. The causation runs the other
way: bigger fires cause more
firemen to show up. It's really important that
you select a fair dataset. Don't try to skew the data. Don't use
selection bias and attrition bias to make your data look favorable
when you conduct research. The aspect ratio you choose has a big impact on how a
chart is perceived. Know what you want to
say, know your charts, and know your data to
represent them correctly. Also, don't let your
axis skew the story line. I know that we want to tell
stories with our data. This is why there is something
like data storytelling, but don't try to
overuse or misuse it. This is an example which
I wouldn't endorse, as I said, showcasing data
like that, which is skewed. It seems like France
has no medals, whereas in reality, it is
the reference point. It seems like the USA did
far better than everyone,
while in reality, yes,
they did a little better, but this is the data that we
actually have plotted here. Okay, thank you so much. I think this is understandable. And as we practice
data visualization, we will get better at it. These things will surely become
ingrained in our brains, and we will remember
those important things.
30. 05.01 - Plot Types: Welcome to this section, where we talk about types of charts. Here, I would like to give
you a brief overview so you know that we have
different categories for different types of charts, just so you can start to
organize your knowledge. Later on, we will go
into specific charts. But here, I would like
to present the categories. We have data over time charts that track trends,
like the line chart. Then we have
comparison charts for seeing how values stack
up against each other. Then we have relationship
charts to find connections. Then we have
distribution charts. You've probably seen charts like the density plot or the histogram. They show you the spread of data. And finally, part-to-whole charts: the famous or infamous
pie chart, and so on. Let me now go briefly
over every category, so you can start to organize
the knowledge in your head. First, data over time. This category is used to
visualize the evolution of a trend across a
chronological period. This would be most
prominently the line graph, the area chart, or the
candlestick chart, like we have in stocks. The most important
information here: is it moving up, is it moving down, or is it standing still? Comparison: this category
is designed to show the relative difference in size or magnitude between
distinct categories. And here, of course,
we have the bar chart, the lollipop chart, and the
clustered column chart as well. You could track how
many pizza slices you ate versus your wife
and versus your son. Most importantly, it shows you who is winning and by how much. Then relationship charts. They
are used to identify
the correlation or dependency between two or
more different variables. Here we can use the scatter plot, where we could
track, for example, months and ice cream
sales, and obviously, the warmer it is, the
more ice cream is sold. Or the bubble chart: let's say we have the price of coffee and the amount
of profit that we make, and the size of the
bubble shows us how many customers
come into our shop. So a third variable is added here. Or a heat map showing
you hot spots. For example, we would have students who have
different subjects at different times of day. Early on, they are fresh, and the later it gets, the colder the spot becomes,
because they are less focused. Those charts are
meant to show you how one thing relates to the other. Then distribution: it shows
you the frequency and spread of values
within a single dataset. This would be a histogram, for example, let's
say, of shoe sizes. Shoe size nine is much more common
than shoe size 13. The values are grouped
into bins and shown on a histogram. Then we have the density
plot, which
is very
similar to a histogram but smooths out the
data, and, for example, the box and whisker
plot, which shows you the typical values
and the outliers. Revealing the shape
and outliers of your data is the
main purpose here. And part-to-whole
charts show
the composition of a total divided into its
individual parts. What do I show here? Of
course, the pie chart. Let's say it's your
phone storage divided into photos, videos, and so on. A treemap would also be a
part-to-whole chart. It can represent hierarchy through nested rectangles. And a stacked bar chart is
also a type of part-to-whole chart, because each bar shows you a total divided into
separate little groups. Of course, we have
more types of charts. We have maps and so on, but I don't want to show you
everything in the world. I want to show you
the main categories, and from those categories, we can move forward and group our information
accordingly.
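If you like to keep such an overview next to your code, the five categories from this lecture fit into a small Python lookup. The structure and the `category_of` helper are only an illustration of how to organize the list. The chart names are the ones mentioned in the lecture.

```python
CHART_CATEGORIES = {
    "data over time": ["line chart", "area chart", "candlestick chart"],
    "comparison": ["bar chart", "lollipop chart", "clustered column chart"],
    "relationship": ["scatter plot", "bubble chart", "heat map"],
    "distribution": ["histogram", "density plot", "box and whisker plot"],
    "part to whole": ["pie chart", "treemap", "stacked bar chart"],
}


def category_of(chart_name):
    """Return the category a chart belongs to, or None if it is not listed."""
    wanted = chart_name.lower()
    for category, charts in CHART_CATEGORIES.items():
        if wanted in charts:
            return category
    return None


print(category_of("Treemap"))       # part to whole
print(category_of("bubble chart"))  # relationship
print(category_of("map"))           # None, maps are outside these five categories
```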
31. 05.02 - Line Chart: Let me talk about
the first category, data over time, and
what we have here. Of course, the line chart. This is our go-to tool for showing the big
picture over time. A line graph is most
frequently used to show trends and analyze how the data
has changed over time. Let's take a look
at the anatomy. Line graphs are created
by plotting values as points and then connecting
them with a line. Most
often, the X axis is a timescale, like days, months, periods, or years, and the Y axis is simply a
quantitative value or percentage. Let me go over the key points
that make up a line chart. Trends: it tracks values over
time to show evolution. Slopes: they are a metaphor. An upward slope means growth
and a downward slope means decline. Put simply, we plot points on a grid and connect them to show
the path of the data. Let me go over a
couple of examples. In the first example, we use a single line to track
gym memberships. It clearly shows the steady climb from January
through December. A single line is
never a problem. Let me go now to
the second example. For a more complicated view, we can compare four different
social media platforms on one chart. Here, I can see which one
is growing the fastest and where they
cross each other, but we need to be very careful
with the number of lines. I think four is approximately the upper limit
for a line chart. If you want to talk specifically
about one series,
one line, you could, for
example, highlight it like that. Let me now show you another
example where there are more lines. Take a look at that. This is what you can
call a spaghetti chart. We try to plot eight
different product categories on this one grid. Because the lines for winter
gear like coats are dropping while summer gear
like shoes is rising, everything collides
in the middle, everything crosses each other, and this is horrible to
look at and to talk about. I cannot even talk about it here while presenting this to you. What I would do: I
would definitely highlight the lines
that I want to talk about, and I would decrease the
visibility of the rest. You could, for example, also
use a different chart for that, because a line chart
won't be perfect here, but this is a lecture
about the line chart. I want to just show
you the possibilities, and this is everything you
need to know about this. A simple but very useful chart.
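The fix for the spaghetti chart, highlighting one line and muting the rest, can be sketched in Python without committing to a plotting tool. The function name, the category names other than coats and shoes, and the concrete colors and widths are all assumptions for illustration. The dictionary keys are chosen to mirror common matplotlib line keyword arguments.

```python
def line_styles(series_names, highlight,
                highlight_color="tab:purple", muted_color="lightgray"):
    """Return one style dictionary per series: one emphasized, the rest muted."""
    if highlight not in series_names:
        raise ValueError(f"{highlight!r} is not one of the series")
    styles = {}
    for name in series_names:
        if name == highlight:
            # The line we want to talk about: strong color, thick, drawn on top.
            styles[name] = {"color": highlight_color, "linewidth": 2.5,
                            "alpha": 1.0, "zorder": 3}
        else:
            # Context lines: gray, thin, partly transparent, drawn below.
            styles[name] = {"color": muted_color, "linewidth": 1.0,
                            "alpha": 0.6, "zorder": 1}
    return styles


# Eight hypothetical product categories, as in the spaghetti chart example.
categories = ["coats", "shoes", "hats", "scarves",
              "gloves", "shorts", "sandals", "jackets"]
styles = line_styles(categories, highlight="coats")
print(styles["coats"])
print(styles["shoes"])
```

Assuming matplotlib is the plotting library, each dictionary could be passed along when drawing a line, for example `ax.plot(x, y, **styles[name])`.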
32. 05.03 - Area Chart: In this lecture, I'll explain
the area chart to you. An area chart can show
you the big picture. Let's go over the anatomy. It's, of course, very
similar to a line chart. At first, we have data points that we have
connected with a line. Just like a line graph,
we have the X axis. Most often some time
intervals and for the Y axis, we have a given
value or percentage. The difference here is that the area beneath it is shaded. This creates opportunities
and problems. I'll explain in a second. The key points I would like
to go over: Volume. It represents the magnitude
of change over time. Area. The shaded area
emphasizes the total, not just the trend, and because
it's graphically very pleasing that the bottom area is shaded and has a solid color, it's sometimes a bit easier
to comprehend. Continuity. It is best used for continuous
data over a period. Let me go now over
different examples. On the simple side, we can use a single area chart to track, for example, monthly
active users. And because everything
beneath is shaded, we kind of feel the
size and the momentum, but this could be just as well represented
with a line chart. So why should and why shouldn't
you use an area chart? The problem here is that
the area chart uses a lot of unnecessary ink
to display everything, which isn't actually data. The only thing we need to read this chart
correctly is the line. Those are the points where
all the data sits. So be very careful when using area charts because
most often they are just graphic design tricks with no merit behind them. Even more problems arise when I start to use stacked
area charts. In this example, we are tracking website traffic by device. And it looks good, and it's a proper usage
of an area chart. It will tell two
stories at once. The overall height shows us the total traffic is increasing, but the individual colors
show a massive shift where mobile traffic starts to cannibalize desktop traffic
over the course of the week. What's the problem? We are
no longer on a common scale. Look at mobile. Mobile starts
at zero. That's perfect. But if you look at
desktop, on Tuesday, it starts here, on
Thursday, it starts here, and on Saturday it starts
here and ends above it. How do you compare
those different days? You cannot properly do this. So only if the individual values are
not important can you go for a stacked
area chart like that. Otherwise, I wouldn't recommend it. Here, I just wanted to
see the total views, so it's completely fine. Now, let's go over a
different problem: occlusion, or overlap
of actual data. It would be a horrible choice
by the designer to use a shaded area that is drawn
over another area. Here I have two products,
product A and product B, but you cannot see product A. You have no idea where it is. If I do the design correctly,
now you can see it. But still, I caused
some problems. Now, the last slide, what I want to say to
you is on the left side, you have a regular area
chart, and on the right side, you have a stacked area chart, and the designer or you, the data visualization
specialist, needs to inform the
audience what is what. Because on the left
side, product A is clearly
bigger than product B. And if I communicate this
correctly, it's okay. But on the second
chart, product A together with product B make up a certain total revenue, for example, but it's
very difficult to judge. By looking at the second chart, I feel like product A is four
times as big as product B. And maybe it is, but I have no way of judging
because they are not on a common scale
for each separate month, and it would be very
difficult to judge. So as you can see, area charts are very often used.
They look beautiful. I have to say, but
they use a lot of ink, and they may mislead you with the information they
are trying to showcase. Always be mindful of that.
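To make the common-scale problem of stacked area charts concrete, here is a minimal Python sketch. The traffic numbers and the function name `stack_layers` are assumptions for illustration and are not taken from the chart in the lecture. It shows that the upper layer is drawn starting wherever the lower layer ends, so its baseline moves from day to day.

```python
# Hypothetical daily visits per device (not the numbers from the lecture's chart).
days = ["Tue", "Thu", "Sat"]
mobile = [200, 350, 500]
desktop = [400, 300, 250]


def stack_layers(bottom, top):
    """Return (start, end) heights of the upper layer when it is stacked on `bottom`."""
    return [(b, b + t) for b, t in zip(bottom, top)]


desktop_band = stack_layers(mobile, desktop)
for day, (start, end) in zip(days, desktop_band):
    # The desktop band starts where mobile ends, so its baseline shifts every day.
    print(f"{day}: desktop drawn from {start} to {end} (actual value {end - start})")
```

In this made-up data, desktop falls from 400 to 250, yet the top edge of its band keeps rising because mobile grows underneath it. The reader has to subtract the moving baseline to see the real desktop values.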
33. 05.04 - Bar Chart: In this lecture, we are going
to talk about a bar chart. It is the ideal chart to compare different
categories to each other. A bar chart uses either
horizontal or vertical bars (the vertical version is also called a
column chart) to show discrete numerical comparisons
across categories. If we go over its anatomy, a bar chart is drawn by placing the categories on one
axis, for example, years or products, and a value scale on the other,
like number of sales, a value, or degrees. I think this
is pretty understandable. The key points I would
like to go over: Categorical. Each bar represents
a separate category, so we get discrete numerical comparisons
across different groups. Ranking. The length or height
instantly answers "how many?" This is why it's so
easy to judge and compare. The zero rule. It relies on a shared baseline
to compare magnitudes, and this is a key point,
a very important point. When you use bar charts, you should start
at the zero value. Here we have a very
simple example. I'm sure you've already
seen plenty of bar charts. On this example, we have website traffic sources
in January 2050, and we can clearly see that
the first bar is the biggest, meaning that this
is our powerhouse. Our website gets
visited directly. Okay, but could we use a line chart for the same data?
No, because direct, search, social, and referral are
different categories. They are not connected
to each other. They aren't a trend over time. They are separate entities, so there is nothing between them
for a line to connect. Okay, let's now go over a
bit more advanced example. Here we have plenty
of categories. We have ten different
categories, and we are using a
horizontal chart because we need more space to write out our titles, our
category names. Let's see how it would
look on a column chart. We can barely see the
names of the categories. So going back to it, the proper way would be to use a bar chart like that,
a horizontal one. Then I could clearly give my
statements or for example, add an annotation here or
even give data and put a green annotations that those are the top performers
against the other. Beautiful. However,
just as I said, be very mindful about
the axis on the bottom. Here we have the axis
running from 96 to 106, and a five-point difference makes one bar
look almost two times the size. But this axis is truncated and
creates a huge lie factor. If we are honest with our
data and we start at zero, we can clearly see the bars are very close to each
other in terms of size. The five-point difference isn't as big as the
previous chart suggested. So be very careful
when using bar charts, always start them at zero.
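The lie factor mentioned above can be put into numbers. This is a minimal sketch using Edward Tufte's definition (size of the effect shown in the graphic divided by the size of the effect in the data). The two values, 100 and 105, and the function name `lie_factor` are assumptions for illustration; the lecture only states the axis range of 96 to 106 and a five-point difference.

```python
def lie_factor(value_a, value_b, axis_start):
    """Lie factor = (effect shown in the graphic) / (effect in the data)."""
    # Relative change in the underlying numbers.
    data_effect = (value_b - value_a) / value_a
    # Drawn bar lengths are measured from where the axis starts.
    bar_a = value_a - axis_start
    bar_b = value_b - axis_start
    visual_effect = (bar_b - bar_a) / bar_a
    return visual_effect / data_effect


# Hypothetical scores of 100 and 105 (a five-point difference).
truncated = lie_factor(100, 105, axis_start=96)
honest = lie_factor(100, 105, axis_start=0)
print(f"axis from 96: lie factor {truncated:.1f}")
print(f"axis from 0:  lie factor {honest:.1f}")
```

With these assumed numbers, the truncated axis turns a 5% difference into a bar that is 125% longer, a lie factor of about 25. Starting the axis at zero brings it back to 1, where the graphic and the data agree.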
34. 05.05 - Scatterplot: Relationship. In this lecture, I would like to talk
about the scatterplot. A scatterplot is perfect for showing the relationship between
two different variables, meaning it showcases
the correlation in your data points so you can see if one variable may influence the other. For example, if you
spend more on advertising, then the app downloads
should be going up, right? We can plot this on a
chart and see if there is a correlation between
more spending and more app downloads. Secondly, you can find
outliers that way. Those are points that
would be plotted far away from the overall trend. Okay, let's go over the anatomy before
we go into examples. We plot our data on a grid where variable A sits on the X axis. Let's say this is
spending on a good desk. And the second variable
will be on the Y axis, and let's say this
is productivity. In theory, the more expensive
the desk we bought, the more productive we should be. I know it is a stretch, but this is the
data that we got. Those are individual points we plotted on the Scatterplot, and we could add
a trend line that approximately fits in the
middle of the entire dataset. However, we should also
keep an eye on outliers. How did it happen that
some people got a cheap desk but were still
very productive? Always watch out for these. Okay, I don't want to go
too long over the theory, but if we are talking
about scatterplots, the correlation can be positive, negative, or
neutral. Also, the correlation
can be linear, when one variable rises and the second variable
rises at a steady rate as well; it can be exponential, when raising one variable
causes a lot more of the other, for example, app downloads; or it can be U-shaped, depending on
what's happening here. You could also talk
about the strength. I know that's a lot of theory, but we have to go over it. The correlation can be strong when the points are
close together, it can be weak when the
points are further apart, or there can be no correlation. In that case we know that there is no relationship between the
two different variables. To recap everything, you could
have a correlation that is positive exponential or a positive linear
strong correlation or a positive linear
weak correlation. It all depends on what
you want to show. There are also problems
with scatterplots, and they depend on the data
that you are working with. Let's see problem number one: a correlation with no real connection behind it. This
is a very popular example: if I plot sunscreen units sold against total reported
shark incidents, it seems like when you
sell more sunscreen, you get more shark attacks. But in reality, of course, most shark attacks
probably occur during the summer, in the same
period when the most sunscreen is sold. This is why the two look
related when we plot both
data series on this chart, but we are, of course, data
visualization specialists, and we wouldn't come up
with a conclusion like that. The second problem
is overplotting. You can see here
a couple of dots. But what if I told
you that in reality, there are 100 results
here, 100 dots. This is because right at the
bottom, at 2 hours of app usage, we have plenty of data points sitting on top of each other. To overcome this, we can
jitter the data, meaning we scatter it around slightly, but we still want
to be precise here. So this is a problem
when using Scatterplot. This is everything I wanted you to know about Scatterplot.
35. 05.06 - Bubble Chart: In this lecture, we will
talk about the bubble chart. It is a relationship
type of chart. A scatterplot shows us a simple connection between two different numbers,
two different variables. A bubble chart, however, also shows us the weight of the numbers because we can use the size of the bubble to add
a third variable. So the most important reason why we use bubble charts is that the size of the bubble itself
will tell us a third story. It also gives you an
important overview at a glance. You are drawn to the most
important biggest bubble. Okay, let us explain
this on an example. Before we plot any data, let's look at the
anatomy as always. We will use this
chart like that. We will have the time to fix an IT issue on the
bottom and the difficulty of fixing it on the Y axis. Let's draw the chart, and you can immediately see where
the biggest bubbles are. The size of the circle
represents a third variable, how many users are affected. This is the weight
of the problem. Because we now know how
many users are affected, we shouldn't just
choose what is quickest and
easiest to fix. The big bubble, the system outage, is
pretty difficult to fix. It is at a six on the scale. Even so, we should probably handle
it first, before going to the database error, because the system outage is
affecting 8,000 people. And this is exactly why a bubble chart might
become useful. Let's go to a different example. Here, the bubble
size will show you the estimated cost
of a given event. On the bottom, we have
the number of guests, and on the left side, on
the Y axis, we have prep days. So looking at that, even
though a wedding has approximately the same
number of guests as a seminar and takes approximately
the same time to prepare, it is much more expensive. And the bubble chart is
capable of showing you that by showing you the
size of the bubble. But sadly, the bubble chart, of course, has its problems. If a bubble is big
enough to cover up some data, what
do we have to do? We have to use
transparency or make an outline because we
don't want to skew, we don't want to mislead
anyone by hiding certain data. I think this is obvious. This is everything you need to
know about a bubble chart. An interesting chart, but very
difficult to use properly.
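One practical detail behind "difficult to use properly" is how the third variable is turned into a bubble size. The lecture does not say how its bubbles were sized; a common convention, assumed here, is to make the bubble's area (not its radius) proportional to the value, which means the radius grows with the square root. The 8,000 users for the system outage come from the lecture, while the 2,000 for the database error, the `max_radius` of 40, and the function name are made up.

```python
import math


def bubble_radius(value, max_value, max_radius=40.0):
    """Radius chosen so that the bubble's AREA is proportional to `value`."""
    return max_radius * math.sqrt(value / max_value)


# System outage (8,000 users) is from the lecture; the database error count is hypothetical.
users_affected = {"System outage": 8000, "Database error": 2000}
largest = max(users_affected.values())
radii = {name: bubble_radius(v, largest) for name, v in users_affected.items()}

for name, radius in radii.items():
    area = math.pi * radius ** 2
    print(f"{name}: radius {radius:.1f}, area {area:.0f}")
```

If the radius itself were scaled linearly, a value four times as large would be drawn with sixteen times the area, which would exaggerate the difference in the same way a truncated axis does on a bar chart.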
36. 05.07 - Histogram: Welcome. In this
lecture, I would like to talk about the histogram, a beautiful type of chart, but with a very specific usage. It looks like a bar chart, but a bar chart compares different categories like
apples, oranges, and others, while a histogram shows us the shape of where your data
lives for a single category. So you have one category divided into multiple bins. I'll explain this in a second
and show you examples. The primary reason
to use a histogram is, of course, that it groups
our data into bins. For example, you could have
age groups 1 to 5, 6 to 10, and 11 to 15, and you could group them into three separate bins to give you a rough idea of the data. Of course, more bins
would be necessary here, but this is the idea behind it. Secondly, it
identifies the peak, and this is very
important information: from the drawn
chart you can identify the most common value
among those groups. When it comes to the anatomy, it's very simple to understand. Of course, on the bottom, we need some kind of
variable, like wait time. For example, if you search
for a place on Google, you can see how many
customers are currently there and how that compares
to the average. And this is exactly a histogram. Here, I'll plot frequency,
the number of customers. On the bottom, we have
wait time. Let's see. Looking at this chart,
you would see that the two highest bars would show the most common wait
times for our customers. Let's say this is
ten and 15 minutes. While it is less often that
people wait 30 or 40 minutes, it seems that it still occurs. Take note that all the
bars touch each other. From a design perspective, we could make very tiny
gaps between them, but the X axis is continuous. You don't want real gaps because
there is probably some data in between. So you need to plot it correctly and according
to the data you have. This would be the
peak, and let's go over an example
of a histogram. Here we have a boutique
loyalty program, and we have 76 data points, or 76 people who come
into our boutique. We used a histogram to group the ages into bins of three
years each: seven to ten, ten to 13, 13 to 16, and so on. You can notice that the bars
climb steadily from age one. Obviously, a baby
cannot be a client, but we reach a massive peak
between ages nine and 17, and then a drop-off
from approximately age 20. This is
clear information that our business doesn't
just have young customers. We have a very specific hotspot here of preteens
and young teenagers. If we were to order inventory
for our boutique, we would now focus
80% of our budget on styles that will appeal
to those age groups. However, a histogram, of
course, has its problems. It can simply hide the truth if your bins
are the wrong size. Let's say, for example,
here, the same boutique, but we have grouped the members into ages one to 20 and 20 to 40. Obviously, this
is now a big wall that gives
me basically no information. I can roughly say that we have more mature people buying
than teens and young people. Well, this isn't a
conclusion at all. I could also make too many bins. This also doesn't give me a clear picture
because between ages 19 and 20 we have only one
person, between 20 and 21 we have only one person, and between 22 and 23 we
have zero people, which could wrongly suggest that we shouldn't cater
to those age groups and should cater to
a different age group. Yes, we can see a clear
spike in the middle, but you shouldn't use
too many bins because the chart essentially becomes a
very large column graph, which is what we wanted to avoid by
grouping the data into bins so that the histogram shows us
where our data lives.
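The effect of bin size can be tried out directly. This is a small sketch with made-up member ages (the lecture's 76 data points are not available, so the list below is an assumption), and `bin_counts` is a hypothetical helper. Bins are half-open, so a bin labelled 10 with width 3 holds ages 10, 11, and 12.

```python
from collections import Counter


def bin_counts(values, start, width):
    """Count values per half-open bin [lo, lo + width), keyed by each bin's lower edge."""
    counts = Counter(start + ((v - start) // width) * width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical member ages, not the lecture's dataset.
ages = [8, 9, 10, 11, 11, 12, 12, 13, 13, 13, 14, 14, 15, 16, 17, 19, 22, 25, 31, 38]

print("3-year bins: ", bin_counts(ages, start=1, width=3))
print("20-year bins:", bin_counts(ages, start=1, width=20))
print("1-year bins: ", bin_counts(ages, start=1, width=1))
```

With three-year bins the peak around ages 10 to 15 stands out. With twenty-year bins everything collapses into two bars, and with one-year bins the counts become small and jumpy, which is the "very large column graph" problem described above.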
37. 05.08 - Density Plot: In this beautiful lecture, I would like to talk
about the density plot. The density plot is a very
specific type of chart. It uses a smooth line to show the shape of where
your data lives. We use it for two
primary reasons. First, it smooths out the noise. Let me tell you what
I mean by noise. Let's say that you
are tracking customers
by age. You have ten
customers aged 26. Then you have zero customers
aged 27, then you have eight
customers aged 28, then you have no
customers aged 29. Because this
is data from just today, let's say, it would
be very difficult to read a plot that goes here ten, here zero, if you
just used a column graph. The density plot will smooth out those values, and it's easier to see the data. The second reason to use this chart is, of course, that, as with the histogram, it can identify the peak. Let me show you the anatomy. There's nothing difficult.
It looks like an area chart, but there is an
important distinction we'll talk about in a second. On the bottom, we have a variable or a
group or a category, for example, an age group, and here we have density,
likelihood of occurrence. And this is important.
We don't have volume. We don't have total values. We have likelihood
of occurrence. Okay, this would be the peak, and this would be the
tail where the data is tapering off or where
some outliers lie. If we go to a practical example, we have our boutique
loyalty program member ages. On the bottom, we
have the member age, then we have the frequency, how many of those
people subscribe to, say, our boutique promotion. And we can clearly see that 25- and 26-year-olds are our core customers. So this is where we should
focus our marketing efforts. This would be the peak,
and those would be the outliers, or the less
common customers in our store. Maybe our inventory is targeted at this
younger age group. However, the problem
with the density plot is that it looks almost identical to this area chart on
the right side, right? But you read it
completely differently, and you use it for
completely different cases. On the left, the density plot shows us
where our customers are. The peak at 25 tells us that
25 is our most common age. We simply have no inventory for people from a higher age group, and we are not surprised that they aren't
coming to our shop. However, if you take
this area chart without seeing the density plot and without knowing, for example, about our boutique,
you could say, "Oh, no, within the higher age
groups, we have no customers. What is happening?"
Or, if this were the amount of profit made
from those age groups: "What are we doing wrong?
Why aren't we making any profit from those age
groups?" It's not like that. You shouldn't be
afraid that sales are crashing with higher age groups. The sales simply aren't
there because there are no customers of that
age in our boutique. This is an overall view
of the density plot. To recap, on the density plot we
look at the shape, and on the area chart we look at the
raw number, the volume.
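How a density plot "smooths out the noise" can be sketched with a Gaussian kernel density estimate, which is one common way such curves are computed (the lecture does not say which method its chart uses). The ages below are made up, loosely following the noisy example from the lecture, and `gaussian_kde` and the bandwidth of 1.5 are assumptions.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates probability density from the samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Every sample contributes a small bell curve centred on itself.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical member ages with gaps: nobody aged 27 or 29.
ages = [24] * 3 + [26] * 10 + [28] * 8 + [31] * 2
density = gaussian_kde(ages, bandwidth=1.5)

grid = [a / 2 for a in range(40, 71)]  # ages 20.0 to 35.0 in steps of 0.5
peak_age = max(grid, key=density)
area = sum(density(x) * 0.5 for x in grid)  # rough area under the curve

print(f"density at 27 (no customers of exactly that age): {density(27):.3f}")
print(f"peak near age {peak_age}, area under curve about {area:.2f}")
```

The area under the curve comes out close to 1 no matter how many customers there are, which is why the vertical axis reads as likelihood of occurrence rather than volume. That is the distinction from the area chart made in this lecture.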
38. 05.09 - Pie Chart: What is it about the pie chart? It's such a beautiful chart, but comes with its
own set of problems. We have to judge
the entire area, so it's always difficult
to read a pie chart. But let me show
you the advantages and disadvantages of it. A Pie chart is optimal
if you have percentages. For example, showing
three different numbers like here, I see the percentage. I can roughly judge, Okay,
which is the biggest, which is the smaller one,
and we have a total of 100. That's perfect. It
becomes a bit more difficult when we
have total numbers. If we didn't have the 50 here, you would only see 30, 8, and 12, and you wouldn't be exactly
sure what the total value is. You would have to
add it up. And you always need to make sure that
if you show a pie chart, you show
the whole pie, because you want to see
differences between categories. All right. The pie
chart is okay for showcasing fewer than
six categories, because if you
go over that, to six, seven, or eight, the pie chart becomes
increasingly difficult to read. Just look at the pie
chart on the right side. It becomes difficult
to judge, for example, the blue on the bottom
left side and the purple on the bottom right side. They are kind of similar,
as is the blue above, and you can't really tell the percentage unless it is written out. A pie chart is okay for
showcasing, for example, market share for different
segments or age groups, or device usage, to compare
categories like that, but the optimal measurement
will still remain percentage. Percentage is ideal
because the whole equals 100%, and a circle displays
that nicely, but "nicely" is the
key word here. Nicely doesn't make it optimal. So to recap, the pie
chart is great to show that one segment is big or small in comparison to
the other segments. It's okay for showing how one segment relates
to the whole, like here, where we have app downloads and most of our applications
are downloaded on a phone. And it's okay for showing
simple percentages, like a big slice
next to a small slice. The pie chart is very good for that. However, in most situations in
data visualization, a bar chart will simply be more understandable
and better. Let me show you some examples
where the pie chart fails. If the slices are similar, you can barely tell
the difference. If we have too many values, you also cannot judge them clearly. And if you have that
many similar slices, the chart gives you no real information, and you have to
judge the area of each slice. If you use a pie chart
for separate categories, you might as well not do it because it's very difficult
to read the areas. Why not use a column chart here? Why not use a horizontal bar chart here, showcasing
all the data? It's much more understandable. It ranks higher among the
perceptual tasks, and it's easier to compare
values on a common scale. Here as well, we should use different bars for
different categories, and we would be set. This is everything you need to know about the pie chart.
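Two of the rules of thumb from this lecture, showing percentages of a whole and staying under six slices, can be written down as a small check. This is only a sketch: the download counts are made up, and the helper names `pie_shares` and `pie_is_readable` are hypothetical.

```python
def pie_shares(counts):
    """Convert raw totals into percentages of the whole."""
    total = sum(counts.values())
    return {name: 100 * value / total for name, value in counts.items()}


def pie_is_readable(counts, max_slices=5):
    """Rule of thumb from the lecture: fewer than six categories."""
    return len(counts) <= max_slices


# Hypothetical app downloads by device.
downloads = {"Phone": 1500, "Tablet": 300, "Desktop": 200}
shares = pie_shares(downloads)

for name, share in shares.items():
    print(f"{name}: {share:.0f}%")
print("OK for a pie chart?", pie_is_readable(downloads))
```

Labelling the slices with the computed percentages spares the reader from adding up raw totals, which is the difficulty described above for a pie that shows only 30, 8, and 12.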
39. 05.10 - Waterfall Chart: Here, I would like to explain
the waterfall chart to you. A waterfall chart is all about the journey between two points, the starting point
and the ending point. Instead of just seeing
that you started at one number and ended at another, it breaks down and explains
change step by step. It is mostly used
for financial data, so it highlights
gains and losses. Let's go over the anatomy. To read this chart, look for the starting and ending pillar. They are called totals. You have the starting
total and ending total. You can also have
a running total in between depending
on the data we use. Then we have bars that show
an increase in those values, what happened there,
and bars that show a decrease. I think this is very
understandable. Also, we used green for increases and red for decreases to make
it a bit simpler. And we also have
connectors because this is one continuous journey. Okay, let's show
it on an example. Here, we have a chart
analyzing our net income. We have our revenue at the beginning and net
income at the end. Here we have cost of goods sold and expenses that drew down our
gross revenue. Then we earned some
interest on our money, but we also had to pay taxes. The end result is shown
in the last total, but because we have the
information in the middle, we learned how the journey went and what actually
happened during the year. Just because I said that
the waterfall chart is mostly used for financials, it doesn't mean that
it always has to be. You can, of course, plot
any data that you prefer. Here we have total Spark
Fitness memberships. And we see we had 32
memberships on Monday and 45 memberships on Sunday.
What happened in between? Well, we had some new sign-ups because of an evening class. Some people canceled.
That happens, and we gained a few customers
because we had a discount. If I just saw
Monday and Sunday, I wouldn't really know
what happened in between. Thanks to this waterfall chart, I know where the inflows
and outflows came from. This is what you need to know
about a waterfall chart.
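The floating bars of a waterfall chart are just a running total. Here is a minimal sketch using the membership example: the starting total of 32 and the ending total of 45 come from the lecture, while the individual changes (+12, -5, +6) and the function name `waterfall_steps` are assumptions chosen so that the numbers add up.

```python
def waterfall_steps(start, changes):
    """Return (label, bottom, top) for each floating bar, plus the ending total."""
    steps = []
    running = start
    for label, delta in changes:
        new_total = running + delta
        # Each bar spans from the previous running total to the new one.
        steps.append((label, min(running, new_total), max(running, new_total)))
        running = new_total
    return steps, running


# Monday total (32) and Sunday total (45) are from the lecture; the changes are hypothetical.
changes = [
    ("Evening class sign-ups", +12),
    ("Cancellations", -5),
    ("Discount sign-ups", +6),
]
steps, end_total = waterfall_steps(32, changes)

for label, bottom, top in steps:
    print(f"{label}: bar from {bottom} to {top}")
print("Ending total:", end_total)
```

Each bar starts where the previous one ended, which is what the connectors in the chart express: one continuous journey from the starting total to the ending total.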
40. 05.11 - Combo Chart: In this lecture, I would like to talk about a specific chart, namely a combo chart. A combo chart will combine
several charts into one. So the key point here is combining two
different chart types. Most often, we see a line chart and a bar chart
together where the bar chart shows absolute values while a line will show a rate
or trend regarding that. The purpose, of course,
is not decoration. It is to show related
metrics together. You want context, not
just a beautiful chart. Let's go over an example. Here's the same story
split across two charts. We have quarterly sales volume and quarterly sales growth rate. On the left, we
have total values in millions of euros,
and on the right side, we have percentage growth
or decline. This works, but you need
to look at the left and then you need to look at the right.
Why not combine them? Well, okay, but as you can see, this is still not perfect. You
cannot see anything, and you are not sure
what represents what. Let's put the axis for the percentage growth
on the right side. It is better, but still
something is off. Okay, let's scratch that. Let's do it again. What we should do here is
plot the data on the same chart. Now, let's
change the growth rate to a line chart. Okay, now we see the two charts on top of each
other with different colors, but in order to understand
it at first glance, I would need to increase
the difference in color. I would make, for
example, the line chart blue and its right axis blue, and the left axis purple and
the chart purple as well. Now I can see two distinctive
things within this chart. But as always, you
have to be very careful when using two axes. Let us focus
now on the data. Here we have negative values. But did you notice
that those are negative values, when they sit above zero on the left axis? Did you look at the right axis, where we have negative two and
negative four? Well, that's one of the
problems when using two axes. They are separate entities, and you need to read
each of them separately. Additionally, you can
lie with the right axis. Let's change the axis maximum
from 10% to 20%. Look what happens to the line. It gets pushed down a lot. If I increase this to 30%, I can push it even lower. Now let's change the bottom. Instead of negative five,
let's make it negative 20. Now I've pushed the line higher. The line chart got
squished in between. Of course, I want this
chart to look good. I'm not lying with data, but I can skew
its perspective a little to make the
change feel better or worse. If I had a big decline and I
adjusted the axis, the decline wouldn't
look as scary. But if I go back and the
decline is drawn a little bigger, now the dip in quarter three
looks a bit more scary. Okay, to summarize
everything I've said, the ideal situation would be if the data is clearly
correlated. Our data was, so
that's no problem. Another example would be ad spend and
new customers acquired. The ideal situation would also be if both series can
use the same axis. This would be perfect,
as there would be absolutely no distortion between the data
we are looking at. But this is an ideal world, like, for example, total revenue against what a company can keep. Those could use the same axis. We don't always
have that luxury. So when you combine
different charts together, please be mindful of how you use the axes and
the chart in general.
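The way the secondary axis range moves and flattens the line can be checked with simple arithmetic. This sketch is independent of any charting tool; the growth values (+6% falling to -2%) and the helper names are assumptions, while the axis limits mirror the ones tried in the lecture (top raised from 10% to 30%, bottom lowered from -5% to -20%).

```python
def axis_position(value, axis_min, axis_max):
    """Vertical position of a value as a fraction of plot height (0 = bottom, 1 = top)."""
    return (value - axis_min) / (axis_max - axis_min)


def visual_dip(high, low, axis_min, axis_max):
    """How far the line falls between two points, as a fraction of plot height."""
    return axis_position(high, axis_min, axis_max) - axis_position(low, axis_min, axis_max)


# Hypothetical growth rates: +6% in Q2 falling to -2% in Q3.
for axis_min, axis_max in [(-5, 10), (-5, 30), (-20, 10)]:
    top = axis_position(6, axis_min, axis_max)
    dip = visual_dip(6, -2, axis_min, axis_max)
    print(f"axis {axis_min}% to {axis_max}%: Q2 drawn at {top:.2f} of plot height, dip spans {dip:.2f}")
```

The data never changes, yet raising the axis maximum pushes the line down and shrinks the dip, and lowering the axis minimum lifts the line and shrinks the dip as well. Any dual-axis chart should therefore state its axis ranges clearly.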
41. 05.12 - Maps: In this lecture, I would
like to talk about maps, a specific type of chart. Of course, they are used in data visualization because they can highlight geographic
clusters and reveal regional outliers, depending on what we want
to show. In general, we can also show different
charts on top of a map to enhance the message about the variables in
different regions. I think you have seen data like that. And what's interesting about
maps is that you can plot different kinds of
charts inside of maps. Here, for example, a
geographical bubble map combines the features
of a bubble chart with geographic locations. In such a map, a bubble is
placed in a specific location and the size of the
bubble represents a particular value or variable. Let's go over a different map. This is a heat map on top
of a map, and intuitively, you see that the orange color
marks the more intense areas, and the purple and lighter colors mark the less intense areas. Here we use a system of color
coding to represent value. It is very useful to choose intuitive color palettes
that effectively convey the magnitude
of the values. Here, I'm not sure what orange and purple
represent, and without knowing that, it could possibly
pose a problem, but this is not up for
debate at this point. Now, the most prominent kind of map chart that you can
use is a region map where some regions with more
intensive data are darker and regions with less
intensive data are lighter. That way you can show
data by country, by state, by neighborhood. It gives you a clear snapshot of the distribution of an area. What are some problems
that I personally know from my data
visualization journey? You need to be very mindful of borders when you use a map. For example, here, I have Astra, and this was the most precise
map I was able to find. For the majority of people,
this won't be a problem. But, for example, if you
are from this country, then you are looking
closely at the borders, and you might point out that
something here is not right, that something here should be drawn differently if you want to
be politically correct. With smaller maps, it's
not such a problem. But say, for example, I have downloaded a world map
here to use in my design, and I want to show
some data from Germany. I'll extract Germany and
show some data on it. Then someone who knows the
map very well would say, "But hey, the map isn't
politically correct. It isn't very precise." I would say, "Apologies, it was just for illustrative
purposes." But if it's possible, and if you need a
specific country, you can find vector maps
of different countries, so you should try
to be as precise as possible here to avoid any misinformation in the data, especially if you want
to dive deep into regions. This is everything about maps. Maps are a beautiful
way to show data, but be mindful of where
you get your maps from.
42. 05.13 - Remember This: What to remember after
this entire section? We learned different categories of charts: data over time, comparison, relationship,
distribution, and part-to-whole. Of course, this is
not everything. We have maps, and
some charts are in multiple categories
at the same time. Let's just take a look
at comparison. It's not possible for me to explain all the
charts to you here. Here is a graph
showing what type of charts get into the
comparison category. Of course, you get
the general idea. You want to compare one category
to a different category, and those are the charts
you could possibly use. Each chart that we've learned about and
that we didn't learn about has its strong
sides and its weak sides. As a data visualization
specialist, you need to know what
to choose when and what would be inappropriate for the dataset that you are
currently working with. Sometimes it is okay to combine charts into
a combo chart, but be very mindful
of the axes that you use and don't try
to lie with your data. Always be honest and try to minimize the lie factor
as much as possible.
43. Exercise 1 – Analyzing Data: Welcome to the practical exercises
section of this course, where we will apply everything
we've learned so far to a real chart design to see if we actually
practice what we preach. This is the case study:
Switzerland tourism data 2050-2060. The brief: Welcome to
Neo Travel Horizons. We are the leading
strategic consultancy for the 2050 travel market. We have finalized our decade
forecast for Switzerland. Our team has combined
historical data that was available with our proprietary hyper loop impact estimates. Our mission, your mission
is to design a chart. Our board of directors needs
to see the relationship between the total volume of visitors and the
speed of growth. They won't look at a table. They need a high
integrity chart that clearly separates what we
know from what we predict. Just reading the brief. Between the total
volume of visitors, total volume, most
likely a bar chart, a column chart, and
the speed of growth, speed of growth, changes in percentage, changes over time. This could imply a line chart. Those are my plot suggestions
below this design, column and could take your
own spin on this butt. This would be the
most appropriate. Here is the table with the
data. I have two tables:
one with commas and one with dots, because the US version of
Excel or PowerPoint requires dots. PowerPoint will be my tool of choice here,
but you can of course
use any tool you want. I'll share the file
with all the data here. I'm in Europe, so I'll select the commas.
We have the year, the total visitors, the percentage of change, and a data status that is
either actual or estimate. So we need to
figure out a way to show the estimated data as well. I've selected our
logo, our font, and a beautiful color scheme. You can use this color scheme
for your designs as well, or you can go with your own. That's not a problem. Here is the finished end result of the chart that I
want you to create. This is my spin on the design that we just had
and the table we just had, and I will go over
that with you. I will use PowerPoint,
the slide where it says, Work here, and here are the
tasks you need to perform. Analyze the table, decide
on an initial chart type. I think we've already done that because when I
look at the table, I'll, of course, use the years. And then I have total visitors. This is a total volume, so I'll use some kind of
bar chart or column chart, and for the
percentage of change, we will use a line chart. So we need to
combine two or make two separate charts
to showcase the data. We need to remember that some
of the data is estimated. Okay, let us start working. Select appropriate data. Just so I see what
I've already done, I'll select everything for now because I want all the data
that is possible here. I'll select Insert chart, and I need to decide upon
the chart initially. I'll select this column chart. And I'll plug in my data. As you can see, the text is a little bit,
but that's no problem. I'll decrease the amount of data to select just the
total visitors for now, and this would be my
result. I'll close this. I have now inserted
a column chart, so I've finished
task number three, and I'll start to redesign
and adjust it in a second. Let's adjust this in
the next lecture. This is our initial
design, and from here, we will try to improve
it step by step.
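If you prefer to prepare the table in code before charting it, the analysis step can be sketched like this in Python. The rows are invented placeholders, not the course data, and the names `Row`, `percent_changes` and `first_estimate` are hypothetical. The sketch only shows the two things we decided on: a percentage change per year for the line, and a clear split between actual and estimated years.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    year: int
    visitors: float  # total visitors (hypothetical unit: millions)
    status: str      # "actual" or "estimate"


def percent_changes(rows: List[Row]) -> List[Optional[float]]:
    """Year-over-year change in percent; None for the first year."""
    changes: List[Optional[float]] = [None]
    for prev, cur in zip(rows, rows[1:]):
        changes.append((cur.visitors - prev.visitors) / prev.visitors * 100)
    return changes


# Placeholder rows for illustration only (not the real table).
rows = [
    Row(2055, 40.0, "actual"),
    Row(2056, 42.0, "actual"),
    Row(2057, 44.1, "estimate"),
    Row(2058, 46.305, "estimate"),
]

changes = percent_changes(rows)
# Index of the first estimated year: everything from here on needs a different style.
first_estimate = next(i for i, row in enumerate(rows) if row.status == "estimate")

for row, change in zip(rows, changes):
    shown = "n/a" if change is None else f"{change:+.1f}%"
    print(row.year, row.visitors, shown, row.status)
```

The totals would go on the columns, the percentage changes on the line, and `first_estimate` tells you where the estimate styling has to begin.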
44. Exercise 1 – Building Chart: In this lecture, we'll adjust
the design so it looks more like this and is more
understandable for the viewer. Okay, let us follow
the brief and adjust the design to our
liking: the bar width, the data labels, the grid
lines, and the font size. Let's go over them; the exact steps depend, of course, on the tool you use. In PowerPoint, I'll
right-click, choose Format Data Series, and
decrease the gap width. Of course, not so much that the bars
touch each other, because that would look like a histogram
or an area graph, but I want them to be
considerably thicker. Okay, I think this is
beautiful. Now the data labels. I do like having
the data labels here, because this is data that
is actually important. So I want to enable
the data labels. Okay? Because I already have the data
labels here, I don't need to be redundant. Remember, remove unnecessary
ink from the slide. Okay, I removed the left
axis because of that. Now we have grid
lines and font size. I don't think we need the grid lines, because grid lines are useful if we have the axis
labels on the left side. Right now, we actually want
to see the data clearly, so I'll delete the grid lines. For now, I will delete the title because we
will add a title here, and I'll increase the font size. I'll press Control B to make
the labels bold, and I'll increase the size. I do like it when we have
the same color. You can decide upon
that yourself. I'll change the color into this first color of
this presentation. Now the color is
consistent with the bars. Now on the bottom,
we have the years. The years are
beautifully displayed. I think we can have
the years here, and I need to decide
whether I want them bold as well and a little
bigger. I think I do. I think this is okay. Let's take a look at
what we had previously, okay, very similar design. Beautiful. I think I've
adjusted this design. Now I want to signal
estimates appropriately. In my case, when I'm
using PowerPoint, there is no built-in
way to mark estimates. I can place a letter E here. Let's go to Insert, Text Box. I'll insert the letter E and give it one of the
colors that I have. I have a red color
here, so I'll put the E next to
the last four years. I'll reduce the size so
everything looks consistent, and I'll press Control
D. I'll take the year. And I'll position
this accordingly. Okay. Control D again,
position it accordingly. Maybe a little to the left and Control D, a little
bit to the right. I think we can group
them together. Now I want to take
the last 4 bars. I need to select the bars first. Then I need to
click another time. I need to go to its
filling options, and I will start to work with a solid fill and
partial transparency. The solid fill should
be with the same color. Let's go for 60% to be
consistent: solid fill, 60%; solid fill, 60%; and solid fill, 60%. I would like to
put a dotted line here as well to
make it very clear. I'll select a gradient line. I'll deselect the gradient. Let's make it maybe this
dark blue that we have. Going over to the width, I'll increase the width
so I see it clearly, and I'll change the
dash type to dashes. Okay, maybe shorter dashes. Beautiful. Now I would like
the bottom to be transparent, just so the bottom ones
aren't visible here. I'll increase the transparency. Now I can move this
color a little further
down the design. Okay, I think this is a beautiful showcase
of estimates, but we don't need the
five-point width. I think a width of
1.75 or 2 is beautiful. Now this is a beautiful design. I'll select the gradient line, and I will repeat the
steps on the other bars. It is a bit tedious, but it's the most professional
way I know to do this. This way, we clearly show that these parts of the chart
are only estimates. You can of course make
your own version of that, but I feel like maybe we could increase the transparency
to make it more apparent. But other than that, I feel everything is very
well designed. This part is complete. Let's go to the next lecture
where we will try to plot the change over time
on top of this chart.
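If your tool is scriptable, the rule we just applied by hand (estimated bars get the same color, 60% transparency, a dashed outline and an E marker) can be written down once and reused. This is only a sketch: the function name `bar_style`, the dictionary keys and the hex color are assumptions for illustration, not settings from PowerPoint or from the course file.

```python
def bar_style(status: str, base_color: str = "#1F3A5F") -> dict:
    """Return the styling for one bar, depending on its data status.

    base_color is a placeholder hex value, not the course color scheme.
    """
    if status == "estimate":
        return {
            "fill": base_color,
            "transparency": 0.60,   # 60% transparent, as in the lecture
            "outline": "dashed",
            "outline_width_pt": 1.75,
            "marker": "E",
        }
    return {
        "fill": base_color,
        "transparency": 0.0,
        "outline": "none",
        "outline_width_pt": 0.0,
        "marker": "",
    }


def opacity(style: dict) -> float:
    """Many plotting libraries expect opacity (alpha), which is 1 - transparency."""
    return 1.0 - style["transparency"]


statuses = ["actual", "actual", "estimate", "estimate"]
styles = [bar_style(status) for status in statuses]
print([round(opacity(style), 2) for style in styles])
```

Keeping the rule in one place means every estimated bar gets exactly the same treatment, which is the consistency we were going for when repeating the steps manually.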
45. Exercise 1 – Building Title: In this lecture, let us add
the line chart on top of it or use a Combo Chart
and work on the title. Let us go over task number six. I'll click on the
chart. I'll go to Chart Design, edit data, and we need to include
this part as well or just create a separate
chart with this data, depending on the
software you use. Okay, in PowerPoint, I
can just extend the selection, and now I have both series on the same chart. I'll go to Change Chart Type.
I'll use a Combo Chart: clustered column, and here
I want a line chart, or a line chart with markers,
which will be even prettier. I want this to be on the
secondary axis. I'll press OK. I'll enable the axis
again, so I see both. Okay. And now I can adjust
the line chart to my liking. Here on the right side,
I'll right-click on this axis and format it. In the sizing, I need to make this
a little lower, and set the
minimum bound to maybe 0.1,
so the line chart is
actually visible here. Now, here I would just
actually visible here. Now, here I would just
require a couple of design changes to make
everything look appropriate, depending on the
software you use. Of course, you can
do this yourself. Somewhere, I lost
those data labels, so I'll just bring
them back again. I'll make this bigger and blue. And for this chart, I want to have the same color
that the chart has, so I'll select the
yellow or you can use another color depending
on what you use. Data labels, in my case, I want them to be
above the chart. And if something is not visible, I'll just take this
and put higher. And here, if I have zero,
I'll just delete it, and I cannot properly see
those on the right side. For that, you can either add a shadow behind
it, text options, shadow, open the shadow and give a shadow or take the bars. Let me go to the design options. Click on the bar and reduce the transparency maybe to 60%. So the contrast, 55%. So the contrast between the text and the bar is a little lower. I think this would
still look very good, and now it's easier to read. Okay, B PowerPoint has problems
when I remove the axis, I'll just take a
shape and I'll put a white shape above
it. That's no problem. It's just a design
gimmicky trick I need to do here in this
particular software, Alpras Control D, and I
want to hide this as well. For the estimates, we need to move the
estimates back again, and we are approximately
at where we want it to be. Of course, I need
to reposition them, but but that's just something that I have to do depending on the program I'm working in. Okay, for this line itself, I would prefer if the line itself would be much thicker and the markers themselves also
would be a lot thicker. So it stands out a
little bit more to me. Okay, now I think the
design is beautiful, very similar of what we did. What I want to say
with my title is that
what you see here is the total number of visitors to Switzerland
and the change over time, and 2050 to 2060 are the years.
We could also give an
annotation, maybe an asterisk and "estimates by
Neo Travel Horizons". This is my company name. I'll put it in red because we
have the logo here as well. I could put this information
somewhere else. If I prefer, I could put it
on the bottom, as an estimate note, and I think this will look a little cleaner and not clutter
the actual title. This is a very simple title. If you would like to
use an Action Title, you would have to draw a conclusion depending
on what you want to say. If you want to focus
on the estimates, you would say something like
"Estimates predict an average of 6% growth
over the next four years", or "Growth has been
steadily increasing", or "Growth is holding the same level in recent years and the
foreseeable future". The title needs to reflect what the narrative
of the chart is. For the legend, I
will not go over it. I think this is perfectly understandable and depending
on the tool you use. In PowerPoint, I like to design custom legends because
if I take the legend, the original legend
that we have here, I cannot adjust a
lot of things here. So I prefer to design the legend myself,
depending on what I need. This is the finished first
exercise and what I want you to be capable of
achieving after this course.
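An Action Title like the one above should come from the numbers, not from a guess. As a small sketch in Python, with invented growth values and a hypothetical function name `action_title`, you could compute the average of the estimated growth rates and put it straight into the title:

```python
from statistics import mean
from typing import List


def action_title(estimated_growth_pct: List[float]) -> str:
    """Build an action title from the estimated yearly growth rates (in percent)."""
    average = mean(estimated_growth_pct)
    years = len(estimated_growth_pct)
    return (f"Estimates predict an average of {average:.0f}% growth "
            f"over the next {years} years")


# Placeholder growth rates for the four estimated years (illustration only).
title = action_title([5.5, 6.0, 6.5, 6.0])
print(title)
```

If the data changes, the title changes with it, so the conclusion in the title always matches what the chart actually shows.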
46. Exercise 2 – Analyzing Data: In this exercise, we are Cargo. We have a case study, the
warehouse optimization. The brief: Welcome to Cargo. We manage supply chains for
the agricultural sector. Currently, our main grain
silo in Kentucky holds 206 tons of inventory. To meet the demand, we need to reach a final capacity of 209 tons. Our mission: we will
have a dataset, and our mission is
to build a chart, the bridge that explains
how we reach this target. The regional manager
needs to see the gains from the
autumn harvest, the supplier buyback, and the
winter distribution losses. We need to see the movement
of physical goods, not just the final number. My plot suggestion here is the waterfall chart, because
the waterfall will beautifully show the
change over time that happens within this table. Take a look at this table. We have, on the left
side, what happens in the inventory, then the change, the running total in case we don't need the
individual change numbers, and the category type,
which says what is happening. The baseline is 206, 209 is what we need to reach, and we see where the
increases and decreases are. This is ideal for
creating a Waterfall Chart. I've designed some elements: a color scheme, our
fonts, our logo, and the axis break icons
because in this lecture, you will learn the axis
break icons that we need to use to showcase this Waterfall
Chart professionally. Here is the end result of
what we want to achieve, and you can see the axis
breaks. In order to
magnify the changes and still showcase
both the big bars, the 206 bars, and the 5
bars next to each other, we need to shorten the axis, but
do it professionally: not just cut
it to our liking, but use axis breaks, which can be symbolized like this or,
for example, like that. Different presentations
do it differently. Most likely, it will look
something like that. From the data, in my case, I will only need this data
and the changes that occur. I have finished task number one, I have decided on an
initial chart type, and I have also selected the appropriate data and
will be inserting a chart. Within PowerPoint, I'll
select Insert Chart, and this time, I want to search
for the Waterfall Chart. The Waterfall Chart will allow
me to show this precisely. Okay, there is a lot going on. Let me first plug the
data into the software. Let me make that a little
bigger, and that a little bigger,
so we see it, and I'll remove all data that
is unnecessary. Okay. But now we have
empty space here. Depending on your
software (this is the case in PowerPoint), if you have this empty space, you need to select the data once again, because currently
everything is still selected.
You can see even the data
that I no longer have. So I need to select the data again to display only
the things that I have. I'll press OK, and you can
see we have fixed the table. Okay, I've completed
task number three. Before we use an axis
break and make any changes, I need to take this last one, click on it, right-click,
and select Set as Total, because
on a waterfall chart, we have the totals and
the changes over time. The first one should
be a total as well. I think it is. Oh, it isn't. I'll set this as total. Now I'm sure that
this is a total. This is change, change, change, and this is another total. I'll additionally
enable data labels, and I'll go from there. In the next lecture, let me adjust everything
so it looks normal; this is exactly why we
need to change the axis and make an axis break, so it looks professional.
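To see what the waterfall chart does with the data, here is a minimal sketch in Python. The opening value of 206 and the closing value of 209 come from the brief. The three individual changes are placeholders I made up so that they add up to the right total, and the function name `waterfall` is hypothetical.

```python
from typing import List, Tuple

Bar = Tuple[str, float, float, str]  # (label, bottom, top, kind)


def waterfall(opening: float, changes: List[Tuple[str, float]]) -> List[Bar]:
    """Return one (label, bottom, top, kind) entry per bar of a waterfall chart."""
    bars: List[Bar] = [("Opening", 0, opening, "total")]
    running = opening
    for label, delta in changes:
        start, running = running, running + delta
        kind = "increase" if delta >= 0 else "decrease"
        # A change bar floats between the previous and the new running total.
        bars.append((label, min(start, running), max(start, running), kind))
    bars.append(("Closing", 0, running, "total"))
    return bars


# 206 is from the brief; the individual changes are invented placeholders.
changes = [
    ("Autumn harvest", 5),
    ("Supplier buyback", 2),
    ("Winter distribution", -4),
]
bars = waterfall(206, changes)
for label, bottom, top, kind in bars:
    print(f"{label:<20} {bottom:>5} to {top:<5} {kind}")
```

The first and the last bar start at zero, which is what Set as Total does in PowerPoint. Every bar in between floats on the running total, so the viewer sees the movement and not just the final number.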
47. Exercise 2 – Action Title: In this lecture,
we will continue the design of our
waterfall chart, and to make everything
more understandable, let's reduce the number of
words that are used here. I'll click on the chart.
I'll go to Chart Design, Edit Data, and look at the data. What do I want to do? Do I need "2048
opening inventory"? Let's make it just inventory. Okay? 2049. If we have inflow, outflow, and buyback,
I think we will know
what's going on here. Okay, this is now much cleaner. For the chart itself,
we have the title separately above
it, so I'll get rid of the
chart title. Do we need
this legend? It depends. You can keep it, but I think this is perfectly
understandable without it. Now for the axis:
I need to adjust the axis to start
at maybe 190. I'll right-click on the axis, choose Format Axis in my case, and set the
minimum to 190. Now everything is
more understandable, but we need to do the break. PowerPoint doesn't
have this feature. So what I can do is
select a shape. I can hide this with any kind of shape that has
the color of the background. In my case, the color of the background is
the first color, and I'll select no outline. What I want to do here is insert a zero, so everyone sees
that the axis starts at zero. Okay? Beautiful. This zero
can be a little bigger. The zero can be here. Now I need icons
for the axis break. I've prepared icons, so I'll select the icons
here, Control C, and I'll bring them here,
Control V. You want to put the axis break on everything that
is being broken. In my case, that is those two bars, the total bars, and also, of course, the axis itself, because it needs to
be communicated that the axis is being broken
in that very place and that it resumes at 195. I'll maybe make the labeling
a little more prominent, even a bit bigger. We now have a 14-point font, and I'll do the same for the zero: 14-point font, blue, and I'll position it appropriately. In my opinion, the axis is a
little bit invisible. So I would like to
make the line actually visible.
I'll choose a solid line, go for the
blue color that we have, and increase its width.
Now the line is
very apparent, I can put the break here, and everything is
beautifully displayed. I think the grid lines
aren't necessary, so I'll remove the grid lines. What else do we have to do? We have completed
task number four. We have adjusted the
design a little bit. Of course, it's debatable
whether this is enough. I could make those a
little bigger as well. In my opinion, this is beautifully displayed that way. Now we need to build a title. What we want to say
is that we have a net three-ton gain
over the previous year. So: the 2049 closing
inventory shows a net three-ton gain over the previous fiscal year.
We should also say
that we reached the goal, that we achieved the result
we wanted. It depends on what you want
to say with your chart. Always remember,
when you write titles,
make them so they can be
understood without
seeing the chart. So the title is: "We
reached the goal of 209 tons closing
inventory in 2049, which is a net three-ton gain over the
previous fiscal year." I know this is extensive
and long, but right now, if someone didn't see
the chart at all,
they would still understand
what has happened. The other way around, if
someone sees this chart, they will then understand
this action title, because here I have
the entire story and narrative that
this chart drives. I think we have now completed
this exercise. Make your own adjustments and your own design choices. Try to make a waterfall chart. If you want, you can,
for example, use green for increases and red for decreases. This would be just as good, depending on the
design you want. And here I have a clear, understandable
chart that reaches the goal of what we
wanted to say with it. Thank you so much for working with me through this exercise.
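Why do we insist on marking the axis break? A quick calculation in Python shows it. The values 206, 209 and the axis minimum of 190 are the ones used in this exercise. The helper name `visible_height` is made up for this sketch.

```python
def visible_height(value: float, axis_min: float) -> float:
    """Height of a bar as drawn when the axis starts at axis_min instead of 0."""
    return value - axis_min


opening, closing, axis_min = 206, 209, 190

# Growth in the data: 3 tons on top of 206 tons.
actual_growth = (closing - opening) / opening

# Growth as drawn: the bars are only 16 and 19 units tall on the cut axis.
shown_opening = visible_height(opening, axis_min)
shown_closing = visible_height(closing, axis_min)
shown_growth = (shown_closing - shown_opening) / shown_opening

exaggeration = shown_growth / actual_growth

print(f"Actual growth: {actual_growth:.1%}")
print(f"Shown growth:  {shown_growth:.1%}")
print(f"Exaggeration:  {exaggeration:.1f}x")
```

The closing bar looks far taller relative to the opening bar than the data justifies. That is acceptable here only because the break icons on the bars and on the axis tell the viewer that the scale has been cut.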