Data Visualization & Dashboard Design for Business Applications | Mark Chen | Skillshare

Mark Chen, Data Analytics Professional

Watch this class and thousands more

Get unlimited access to every class
Taught by industry leaders & working professionals
Topics include illustration, design, photography, and more

Watch this class and thousands more

Get unlimited access to every class
Taught by industry leaders & working professionals
Topics include illustration, design, photography, and more

Lessons in This Class

  1. Introduction (10:00)
  2. Section 1: Comparisons (10:27)
  3. Section 2: Distributions (9:46)
  4. Section 3: Relationships (6:52)
  5. Section 4: Compositions (10:54)
  6. Section 5: Context, Clutter, Color (9:18)
  7. Conclusion (2:00)

512 Students, 3 Projects

About This Class

This course will cover the fundamental concepts of data visualization, with a specific focus on applications in the business world. We will focus on how to select the best way to visualize a dataset in order to communicate business insights to our audience effectively and objectively. By the end of the course, you will understand how to build business-focused dashboards in any software.

Unlike most of the other courses on Skillshare, this one is not meant to be a "how-to" for using tools like Excel, Tableau, Power BI, etc. It is meant to cover what you want to achieve when using these tools, before you watch other videos on Skillshare that teach you how these tools function.

Meet Your Teacher

Mark Chen, Data Analytics Professional

Mark currently leads the Supply Chain Data Analytics team at Mountain Equipment Coop, one of Canada's most iconic brands and largest retail suppliers of outdoor recreation gear and clothing. His team supports the analytics behind inventory management by designing interactive dashboards, automating repetitive day-to-day tasks, and encouraging a more precise, data-driven framework for decision making.

Prior to joining MEC, Mark was a management consultant at the Boston Consulting Group, where he pursued his passion for structured problem solving (breaking a big problem into workable business questions), robust data analytics (turning the "data" into actionable "insights" to support executive decision making), and effective communications (delivering impactful recommendations cat...

Level: Intermediate

Transcripts

1. Introduction

Fundamentals of data visualization and dashboard design for applications in the business setting. My name is Mark, and I've worked in data analytics for the last 20 years: initially as a biochemist, then as a management consultant, and currently as a manager of analytics at a retail company. These may seem like very different roles, but what they have in common is the need for storytelling, where my role is to find the most effective, objective, and impactful way to communicate business insights through data visualization. And what do I mean by that? Well, effective just means that you want to help your audience quickly understand the key insights from your analysis, regardless of whether or not you're there to explain it to them. If you're a strong presenter, you can always find ways to explain a complicated analysis by walking your audience through it step by step. But at the end of the day, your dashboard or your slide deck should always be designed with the expectation that it also needs to be a standalone product that the user can understand without your commentary. Objective means that you want to give your audience enough context to allow them to draw the right conclusions with minimal bias. Sometimes you might purposely present your data in a way that emphasizes the point that you're trying to convey. But ideally, the data and the degree of confidence associated with that data speak for themselves. Your objectivity will be appreciated, especially by savvy audiences, and it should help you build long-term trust with that audience. I initially created this course specifically for my own team, to make sure that everybody has the same baseline knowledge necessary to be an effective storyteller. Then I also added some topics that I found were very common blind spots for candidates that I've interviewed but didn't hire.
If you manage to make it through this course, my goal is not only to help you pass the interview, but also to give you the right tools to be an effective and credible data analyst or business analyst. Basically, if you have the job title of an analyst, this is what I would expect you to know. A little bit about how this course is designed: this is an intermediate-level course, and it assumes some basic knowledge of common ways to visualize data. We're going to jump right into the benefits and the common pitfalls of, for example, bar plots and pie charts, and how they compare with more sophisticated options. So the expectation is that you are familiar with these more basic plots, which I'm not going to introduce in detail. This course is also not software specific. Consider the software to be tools, and there is already plenty of material out there on how to use these tools effectively. What we will cover in this course instead are the fundamental concepts for what you want to achieve using these tools. The goal is for these concepts to be relevant no matter what software you or your company choose to use. And lastly, all the examples in this course will be heavily focused on business applications: sales, marketing, finance, supply chain, et cetera. Data visualization is a very broad and well-covered topic, so naturally I don't want to create yet another general course on it. Nor do I want to create one so comprehensive that it's 100 hours long. What I'm going to focus on, which I haven't come across in any other course, are the use cases that are most relevant to the business world. For a more general discussion of data visualization, I would recommend the lectures by Tamara Munzner that are available on YouTube. She did a very thorough deep dive on this topic, especially in terms of visualizations that are more relevant for academia and scientific research. If you're looking to get inspired by some great examples of data visualization,
I am a huge fan of the 538 website by Nate Silver, a statistician famous for analyzing election predictions, and of the Visual Capitalist, a website with a ton of beautiful infographics, each of which is like a work of art. And while I do take pride in my work, and it is fun to make everyday tasks a little bit more flashy, in a fast-paced business world my main priority, and my expectation for my team, will always be to help our business leaders make better, more data-driven decisions. That is going to be my focus for this course, and I'll leave it to you to check out these two examples on your own for how to beautify your work.

What this course will cover. So, it's day one at your new job as a data analyst, and you've been tasked with creating a dashboard to support a particular business function. Where do you start? Well, the starting point for any kind of analytics is making sure that you understand the business questions that you're trying to address. This is not a step that you can skip or gloss over. It's also not enough just to ask the user what they want, because there is a bit of exploration and entrepreneurship involved in this step. As the famous Henry Ford quote goes, "If I had asked people what they wanted, they would have told me faster horses." So you want to go pretty deep on this one and make sure you understand at least two things. One, what sort of actions will the user take based on the insights that your dashboard will provide? This will, for example, tell you the level of granularity at which you need the data. Two, how is this information currently gathered? Knowing this lets you assess, for example, whether you want to use the same data source or whether there's a better one. This will be the foundation for how you design this dashboard and how you assess whether or not it satisfies the requirements. It will also help you later on with change management.
Change management is something that I had to think a lot about when I was an external consultant, but I find it is often ignored when you're internal to the company. Just because you created a really awesome dashboard doesn't mean it's automatically going to get widespread adoption. You still need to do the necessary internal marketing and demonstrate back to the user how it supports them in their decision-making and why it is better than what they currently use. Understanding number one allows you to do this. Once you're confident that you have a firm understanding of the business question, you can proceed with analyzing the data and prototyping the dashboard, focusing first on the utility (number two) and then on the aesthetics (number three). The bulk of this course will be focused on number two, but I will also include some bonus material on number three. Beyond what I've just said, I am not going to talk more about number one, even though obviously you can't do number two or number three without successfully doing number one. The only reason for this is that being good at number one is really more about experience and practice; it's not something I can concisely summarize on a few slides. But if there is interest in number one, please let me know and I will make a separate course on this topic. And lastly, a few caveats that I want to acknowledge when it comes to dashboard design. One, I will do my best to share with you what I consider to be the best practices, but there is some level of subjectivity. So my goal is always to give you my reasoning behind these rules, so that you can make up your own mind about whether to follow or ignore them. I do buy into the idea that most rules can be broken. But before you attempt to break a rule, you have to be able to operate within the confines of that rule, and you have to fully understand the reason for it.
Two, most of your users are probably not going to care about your attention to detail. They'll probably never know that when I design a dashboard, I try out 20 to 30 different ways to plot a dataset and decide which one does the best job of conveying the business insights. But as data analytics professionals, we should take pride in our work, and as a manager, this is something I pay a lot of attention to whenever I look at somebody else's work. And finally, not everything needs to be plotted. Even though this is a data visualization course, in a lot of situations a simple table might actually be the best option in terms of clarity and simplicity, and I will point out these situations later in the presentation. Okay, so you've completed number one and you feel confident that you understand the business questions. Now you're ready to start thinking about number two, in terms of how you want to plot this data. The first thing you want to do is narrow down the type of analysis that you want to perform. For example, is this a comparison of a particular metric across a certain dimension, like comparing revenue across different geographies, or comparing actual inventory levels today versus what was budgeted at the beginning of the year? Is it showing the shape of a distribution, for something like the size of each product category in terms of the number of products within it, or the age of your customer base? Is it showing the relationship, or the lack of a relationship, between two or more metrics, and whether or not there's an implied cause and effect, such as: does higher marketing spend drive higher revenue? And lastly, is it showing the individual components of a bigger total, like how much profit each month contributes to our total annual profit? What you might notice is that there's a bit of overlap between some of these business questions.
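
The overlap between comparison and composition questions can be made concrete in a few lines of Python. This is a minimal sketch using made-up monthly profit figures. The `monthly_profit` numbers and the helper names `comparison_view` and `composition_view` are hypothetical, not from the course. The same series yields either a comparison statement (best month versus worst month) or a composition statement (the share of the annual total from a chosen window, approximated here as November plus December).

```python
# Hypothetical monthly profit ($ thousands); made-up numbers for illustration only.
monthly_profit = {
    "Jan": 40, "Feb": 42, "Mar": 45, "Apr": 48, "May": 50, "Jun": 52,
    "Jul": 48, "Aug": 45, "Sep": 42, "Oct": 40, "Nov": 180, "Dec": 200,
}


def comparison_view(profit):
    """Comparison framing: how the best month stacks up against the worst."""
    best = max(profit, key=profit.get)
    worst = min(profit, key=profit.get)  # ties resolve to the first month listed
    return best, worst, profit[best] / profit[worst]


def composition_view(profit, months):
    """Composition framing: share of the annual total that comes from `months`."""
    return sum(profit[m] for m in months) / sum(profit.values())


best, worst, ratio = comparison_view(monthly_profit)
share = composition_view(monthly_profit, ["Nov", "Dec"])
print(f"{best} profit is {ratio:.1f}x that of {worst}")  # Dec profit is 5.0x that of Jan
print(f"Nov and Dec contribute {share:.0%} of annual profit")  # 46%
```

The data is identical in both cases. Only the statement you want to make changes, and that statement is what should drive the choice of plot.
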
If you're looking at time series data, for example the monthly profitability of your company, you might consider that to be a comparison question or a composition question. How you ultimately decide to plot this data really depends on the business insight that you want to convey. So if you want to make a statement like "our most profitable month is 5x that of our least profitable month," then you might want to consider plots that highlight this comparison. However, if you instead want to make a statement like "50% of our annual profit comes from the period between Black Friday and Christmas," then you might want to show this data as a composition. In the next section, we're going to talk about comparisons.

2. Section 1: Comparisons

Section one, comparisons. When it comes to making comparisons, your bread and butter is always going to be line charts and bar charts; probably about 70% of what you'll use is one of these two. Line charts are always going to be your default option for any time series data, and by time series I mean that the x-axis going across is some sort of time dimension. One of the reasons line charts are so ideal is that there's very little chart clutter, meaning only a minimal amount of ink is necessary to convey the full amount of information. Therefore, you can fit a lot more on a single page. They're also very easy to understand, even for a non-technical audience, who should be able to read and interpret them without your help. Bar charts will be your graphic of choice when it comes to comparing discrete variables, and we'll talk about what that means. Like the line chart, the bar chart is very easy to understand, but there's a bit more chart clutter in terms of the colored fill of the bars. This means you can't fit as many variables on a single page as you can with a line graph. Let's take a look at a few examples.
In this case, we are plotting revenue by month, with revenue on the y-axis and the months going across on the x-axis. Here we can use either a line graph or a bar graph, but my preference is always the line graph, because your eyes naturally focus on the part of the graph that actually matters: the markers. In the bar graph, your eyes are more focused on the fill of the bars, which doesn't convey any information beyond what's already conveyed by their height. In other words, more chart clutter. In the next example, we're comparing the same revenue metric, but across different geographies. In this case we can't use the line graph, because in a line graph the lines between the markers imply continuity, so it's only appropriate for continuous variables, not discrete ones like states. If you plotted it like this, your audience would still understand what you're trying to convey, but technically it's not correct to use a line graph when the variable is discrete. Another consideration is how many variables you want to compare on a single graph. If we increase this to two variables, both graphs are okay; you can still see the seasonal pattern of revenue in both. But when you increase this to five, it gets a bit more crowded, and here the line graph has an advantage over the bar graph. In the line graph, the first thing that catches my eye is that the green line is above the red line for every single month except April. This is just synthetic data, so I don't know whether this is meaningful and something you'd want to call out, but it's very clear to the audience when they're looking at the top graph, and details like this are more obscured in the bottom graph. When you increase this to 20 variables, both charts actually look terrible.
You should probably think about whether you actually need to compare all 20 like this, or whether you can find a better way to present it. In the line graph you can still kind of tease out what's happening, but it's almost impossible in the bar graph underneath. So this is something to consider when you're choosing between a line graph and a bar graph. One last example: bar graphs can be used vertically or horizontally. The main consideration here is helping your audience read the axis labels more easily. If the category names on the x-axis are long, it might make sense to rotate the chart so that your audience can read the labels without tilting their necks. Okay, now a little bit of a case study. Let's say you're supporting a retailer in the electronics space. You compile some sales data on their top four brands across four product categories, and you want to create a graphic that helps them compare these segments. How would you do that? By the way, this is an example where the table itself is actually pretty good in terms of clarity and simplicity, so it's definitely an option to go with here. The first thing you need to decide is whether to summarize the data by product category and layer the brands on top as different colors, or to summarize it by brand and make the product categories the colors. The other decision you need to make is whether to keep it absolute, to highlight the actual dollar differences, or to normalize it to 100%, to highlight the differences in composition. In reality, you'll talk to the user and find out which of these provides the more useful and actionable insights for the decisions they're trying to make, or whether it's necessary to include all four of these plots because each provides a necessary insight.
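
The four candidate views in this case study (grouped by category or by brand, absolute or normalized to 100%) all come from the same table. Here is a minimal sketch, assuming a nested dictionary of hypothetical sales figures with placeholder brand names (`Brand A` to `Brand D`) and a made-up category list. The helpers `normalize_rows` and `transpose` are illustrative, not part of any charting tool.

```python
# Hypothetical sales ($ millions) by product category and brand; illustrative only.
sales = {
    "Phones":    {"Brand A": 60, "Brand B": 20, "Brand C": 12, "Brand D": 8},
    "Laptops":   {"Brand A": 15, "Brand B": 10, "Brand C": 15, "Brand D": 10},
    "Wearables": {"Brand A": 10, "Brand B": 6, "Brand C": 2, "Brand D": 2},
    "Tablets":   {"Brand A": 3, "Brand B": 3, "Brand C": 3, "Brand D": 3},
}


def normalize_rows(table):
    """Rescale each row so its segments sum to 100 (percent of that row's total)."""
    return {
        row: {col: 100 * value / sum(cols.values()) for col, value in cols.items()}
        for row, cols in table.items()
    }


def transpose(table):
    """Swap the grouping: categories-by-brand becomes brands-by-category."""
    out = {}
    for row, cols in table.items():
        for col, value in cols.items():
            out.setdefault(col, {})[row] = value
    return out


# Four candidate views of the same data (absolute vs. 100%, grouped two ways).
views = {
    "absolute, by category": sales,
    "100%, by category": normalize_rows(sales),
    "absolute, by brand": transpose(sales),
    "100%, by brand": normalize_rows(transpose(sales)),
}
print(round(views["100%, by category"]["Phones"]["Brand A"]))  # 60
```

The absolute views preserve the dollar differences between segments. The 100% views discard them in exchange for making the composition within each group directly comparable.
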
But there's also another option, which is to use a variable-width chart, also known as a Marimekko chart. This is basically a 2D chart that allows you to put both the brand and the product category dimensions on the same graph. Another advantage is that it focuses the attention of the audience on the largest segment. In this case, the immediate insight I get is that phones represent the largest product category, and within phones Apple dominates with a 60% market share, while tablets are the smallest product category, and within tablets it's an even split among the four brands. So this one chart not only includes all the insights from the previous four charts, it also gives you the full relationship between brands and product categories, which you don't get when you plot them separately. The only thing missing is the total dollar values, the absolute comparisons. What folks usually do is include the grand total dollar amount in the title or a footnote, so that the audience still has that context if they need it. Here's another example of a variable-width chart, this time without any normalization on either axis. Here we're plotting CO2 emissions per capita on the y-axis, where per capita just means per person, and the total population of each country on the x-axis. As before, our eyes are always going to be drawn to the largest segment on the page, the one that takes up the most space. In this case, that's the emissions from the United States and from China. The main insight I get is that China is the largest producer of CO2 emissions, based on its area on the graph. But on a per capita basis, it's middle of the pack and not as high as developed nations like the United States. Now, if we were to plot this data using a regular bar plot, you'd have a choice.
You can either plot total emissions, in which case China will be shown as the largest emissions contributor, or you can plot per capita emissions, in which case China doesn't look as bad as the other countries. You might actually want to have that choice if you have an agenda to push one of these points or the other. But this goes back to my earlier comment about objectivity. When you plot the data using the variable-width chart, you give your audience the full context and allow them to make their own judgment based on it, rather than steering them towards one view or the other. I really like this example, and it's my only non-business example in this entire course. But let's take a look at an actual business example. Here we're taking the same variable-width chart and plotting our unit margins on the y-axis and unit sales on the x-axis. So this is basically how much gross margin we generate per unit of product that we sell, versus how many units we sold. If you have data like this, you have three different ways to plot it and three different ways to rank these products. If you plot total unit sales, you would have Product I first, followed by Product O. If you plot total gross margin, you still have Product I first, but then it's followed by Product B. And the third way, which I'm not showing, is to plot gross margin per unit, which gives you the products in the same order as this variable-width chart. Each one of these plots gives you a slightly different insight and a slightly different ranking of the products. Or you can simply show the whole context with the variable-width chart, where the business insights are that Product I has generated the most total gross margin, and it achieved this by selling a very large number of units, while Product B has also generated a very large amount of gross margin.
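
In a variable-width chart, each bar's width is one metric, its height is another, and its area is their product. Below is a minimal sketch of that geometry. It assumes hypothetical products `A` to `D` (not the course's data), with unit sales as width and gross margin per unit as height. The sketch lays the bars side by side with no gaps, sorted by height as in the chart described above, and shows how the three single-metric rankings disagree. The same shape applies to the CO2 example, with population as width, emissions per capita as height, and total emissions as area.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    units: float        # width: unit sales
    unit_margin: float  # height: gross margin per unit ($)

    @property
    def gross_margin(self) -> float:
        # Area of the bar = width x height = total gross margin.
        return self.units * self.unit_margin


def layout(products):
    """Place bars side by side with no gaps, tallest (highest unit margin) first.
    Returns (name, x_start, width, height) tuples."""
    rects, x = [], 0.0
    for p in sorted(products, key=lambda p: p.unit_margin, reverse=True):
        rects.append((p.name, x, p.units, p.unit_margin))
        x += p.units
    return rects


def ranking(products, metric):
    """Product names from largest to smallest value of `metric`."""
    return [p.name for p in sorted(products, key=metric, reverse=True)]


# Hypothetical data, for illustration only.
products = [
    Product("A", units=50_000, unit_margin=4.0),   # high volume, thin margin
    Product("B", units=8_000, unit_margin=20.0),   # low volume, rich margin
    Product("C", units=30_000, unit_margin=3.0),
    Product("D", units=5_000, unit_margin=10.0),
]

by_units = ranking(products, lambda p: p.units)          # ['A', 'C', 'B', 'D']
by_total = ranking(products, lambda p: p.gross_margin)   # ['A', 'B', 'C', 'D']
by_margin = ranking(products, lambda p: p.unit_margin)   # ['B', 'D', 'A', 'C']
print(by_units, by_total, by_margin)
print(layout(products))
```

Each ranking alone tells a partial story. The rectangles from `layout` carry all three at once: width, height, and area.
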
Product B, by contrast, generated its gross margin by earning a very high margin per unit. Okay, so just to summarize: variable-width charts are essentially bar charts with no spaces in between. As with bar charts, you can normalize the axes or not. You have two sets of axes, which means you can simultaneously show two dimensions. That's a great advantage, because it allows you to provide more context to the audience and reduces the bias that comes from showing just one of these two dimensions. The only disadvantage is that it's not a very common plot, so folks who are not familiar with this chart might not be able to interpret it easily or quickly without some help and explanation from you to orient themselves. One more comparison chart that I've included for the sake of completeness, but which I'm not a huge fan of, is the radar chart, also known as a spider web chart. What we're plotting here are customer satisfaction results across dimensions like price, quality, and selection, and we're comparing these results for two stores located in Boston. One thing I don't like about this is that your eyes have to track back and forth quite a bit to read the graph, especially if you are comparing across eight different metrics. It also looks kind of weird if you're comparing across only four metrics, because then it just looks like a square. It might look fine when the two stores are very different from each other, so there's a high amount of contrast, but it looks kind of ugly when there's a lot of overlap between the stores. If you have a lot of different stores that you're trying to compare, you do have the option of plotting each store separately.
And again, to me, it's okay if the stores you're comparing are very different from each other. In this case, I can see that Boston South, for example, has very decent scores across everything, especially price, and that Dorchester, in contrast, has very poor scores in everything except services. But when it comes to the other stores, the radar chart doesn't provide very clear comparisons without the audience having to do a lot of work eyeballing back and forth. That's it for comparisons. In the next section, we're going to talk about distributions.

3. Section 2: Distributions

Section two, distributions. When it comes to showing distributions, your two basic options are going to be histograms and scatter plots, and how you pick between them basically comes down to whether your distribution is across a single dimension or two dimensions. Histograms are a great option. One, they're generally easy for your audience to read. Two, because you're basically doing the analysis on the data before you plot it, a histogram's complexity is unaffected by the size of your dataset. Think back to the line graph we looked at, where comparing two lines was okay, but it got complicated very quickly when we had 20. But because you're doing this bucketing, you also have some choices to make, mainly in terms of how you define your buckets. Scatter plots are also great in terms of being easy to understand and having minimal chart clutter. You can use them if you're trying to show a distribution across two dimensions. But the insights are not as clear as those you can display with a histogram, because you're not explicitly defining the buckets. Another thing to watch out for with both of these types of graph is the influence of outliers, which can be a pro or a con in the case of scatter plots. I'll show you what I mean in a few examples.
Here's an example of a histogram where we're counting how many of our customers fall into each five-year age bracket. You can see that we have about 1,000 customers in the youngest age range, 15 to 19, and that ramps up to 2,500 customers in the 40 to 44 age range. What I mean by the second bullet here is that even if we collect more data and have ten times more customers, the complexity of the graph stays the same. You're still going to have the same buckets, except maybe an extra bucket for those under 15 or those over 79. Generally, the complexity of the graph doesn't scale with more data points, because we're doing this summary analysis on the data to create the buckets. You do have a decision to make about whether to normalize it to 100%. This is very commonly done, since most of the time when you have a distribution business question, you're thinking about it in terms of percentages. In this graph, we can see that about 50% of our customers are between the ages of 30 and 54, and roughly 8.5% of our customers are over the age of 70. Here's a different graph, where we're trying to characterize how many of our SKUs (stock keeping units, which you can think of as unique products) have a given quantity on hand. One of the things you need to think about is how to define the ranges for the individual segments, which I've been calling buckets. By default, you take the highest data point in the dataset and divide that range evenly among the segments you can fit on the graph. But what you can see in this example is that a very tiny number of SKUs with really, really high unit counts squishes everything else into a single bucket. This isn't very useful, because you're basically saying that for 99% of your SKUs, you have somewhere between 0 and 500 units on hand.
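
The default equal-width bucketing, and the hand-picked unequal buckets that fix it, can be sketched with the standard library's `bisect` module. This is an illustration under assumptions. The `on_hand` inventory counts are made up, deliberately skewed so that two SKUs dominate the range. The custom edges simply mirror the kind of cutoffs discussed in this section: 0, 1 to 20, and so on up to more than 5,000. The helper names are hypothetical.

```python
from bisect import bisect_right


def equal_width_edges(values, n_buckets):
    """Default-style bucketing: split 0..max(values) into n equal-width ranges."""
    top = max(values)
    return [top * i / n_buckets for i in range(n_buckets + 1)]


def bucket_counts(values, edges):
    """Count values per bucket, where bucket i covers [edges[i], edges[i+1]).
    The top edge is folded into the last bucket so max(values) is counted."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        if i < 0:
            raise ValueError(f"{v} is below the first edge {edges[0]}")
        counts[min(i, len(counts) - 1)] += 1
    return counts


def to_percent(counts):
    """Normalize bucket counts to percentages of the total."""
    total = sum(counts)
    return [100 * c / total for c in counts]


# Hypothetical units on hand for 100 SKUs, deliberately skewed.
on_hand = [0] * 30 + [5] * 45 + [200] * 15 + [2_000] * 8 + [9_000, 60_000]

default = bucket_counts(on_hand, equal_width_edges(on_hand, 5))
print(default)  # [99, 0, 0, 0, 1]: almost everything lands in the first bucket

# Unequal, hand-picked buckets: 0 | 1-20 | 21-500 | 501-5,000 | more than 5,000
custom_edges = [0, 1, 21, 501, 5_001, float("inf")]
custom = bucket_counts(on_hand, custom_edges)
print(custom, to_percent(custom))  # [30, 45, 15, 8, 2] [30.0, 45.0, 15.0, 8.0, 2.0]
```

The bucket count stays at five no matter how many SKUs are added. That fixed size is why a histogram's complexity does not grow with the dataset.
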
So what you can do, after you take a look at the actual data, is customize the ranges of the buckets to let the audience see a clearer picture of the shape of the distribution. And you don't need to make every bucket the same size. Here we can see that for about a third of our SKUs, we have 0 units on hand. Now, is that intentional, or are we just doing a poor job of keeping things in stock? I don't know. For about 45% of our SKUs, we have somewhere between 1 and 20 units on hand. This is still a pretty big segment, so you might want to consider breaking it up into smaller segments to show more detail. And lastly, for all that stuff on the long tail that we saw in the first graph, there are about 600 SKUs where we hold more than 5,000 units on hand. The graph on the right is simply more useful than the one on the left, because it provides more granularity around the part of the dataset that actually matters. Adjusting the cutoffs for the segments is a decision you have to make when you're trying to make your histogram more effective and less affected by a small number of outliers. Moving on to scatter plots. Scatter plots are used if your distribution is across two different dimensions instead of just one. In this case, we're plotting dollar sales on the y-axis and unit sales on the x-axis. Basically, this is how much money we made versus how many units we sold, where each circle represents one product, or one SKU. Even though the scatter plot shows everything, one of its disadvantages compared to histograms is that it doesn't explicitly define the segments, so you usually have to call them out in your description or your talking points. In this case, I would say that most of the products produced somewhere between $0 and $4 million in sales, with two major outliers that produced around $18 million. The products sold between 0 and 100 thousand units, with one major outlier at 350 thousand units.
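
Deciding which points to call out as outliers is a judgment call, and the course does not prescribe a rule. As one common option, the sketch below swaps in Tukey's fences, which flag values more than 1.5 × IQR beyond the quartiles. It uses them to split hypothetical (thousand units, $ million sales) points into points to plot and points to mention in a footnote or talking point. The data and the helper names `tukey_fences` and `split_outliers` are assumptions for illustration only.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Cutoffs k x IQR beyond the first and third quartiles (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(points, k=1.5):
    """Split (x, y) points into ones to plot and ones to footnote. A point is
    flagged if either coordinate falls outside that axis's fences."""
    xs, ys = zip(*points)
    x_lo, x_hi = tukey_fences(xs, k)
    y_lo, y_hi = tukey_fences(ys, k)
    keep, outliers = [], []
    for x, y in points:
        inside = x_lo <= x <= x_hi and y_lo <= y <= y_hi
        (keep if inside else outliers).append((x, y))
    return keep, outliers


# Hypothetical (thousand units sold, $ million in sales) per product.
points = [(10, 1.0), (12, 1.2), (14, 1.4), (15, 1.5), (16, 1.6),
          (18, 1.8), (20, 2.0), (22, 2.2), (25, 2.5), (1_800, 11.0)]

keep, outliers = split_outliers(points)
footnotes = [f"Not shown: {x:,}k units, ${y:.1f}M in sales" for x, y in outliers]
print(len(keep), footnotes)
```

A rule like this only proposes candidates. Whether an outlier is the story or a distraction is still the business decision described in this section.
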
One thing I'll point out in terms of aesthetics is that you can also make your markers unfilled or partially transparent, so that you can display density better on a crowded chart. Now, just like histograms, scatter plots are heavily influenced by outliers, and I'd count that as both a pro and a con. It really depends on whether or not these outliers are meaningful things that you want to call out. Here I took the previous example and added a few more data points. Namely, this one on the right is a product that brought in about $11 million in sales and sold about 1.8 million units. This is a synthetic dataset, so I don't know whether this product and this behavior are important. But what you can see is that it basically squished everything into this corner and created a lot of white space on the graph. This is great if this data point is what you want to talk about, because it's exactly where the eyes of your audience are naturally going to focus. But if it's not the most important thing you want to call out, you might want to leave out this outlier and just mention it in a footnote, so that you can instead zoom in on the area where most of the data actually is. One last note about scatter plots: the other decision you need to make is whether it actually makes more sense to show two distinct histograms, if the relationship between the two dimensions is not that important for your business insight. You could show them separately like this, as two separate histograms, and it might actually be clearer for your audience to understand. But what you do lose is the relationship between the two dimensions. Okay, how about a more complicated example? Let's say you want to compare 52 weeks of weekly sales for a set of different product categories.
In this case, for tools, what you see is that in 32 of the 52 weeks in the year, sales are somewhere between $0 and $200 thousand per week, and in the other 20 weeks, total sales are between $200 and $400 thousand. Likewise, you can make the same histogram for the other product categories. And this is one way to go: if you just have five categories, you can make five individual histograms and put them side by side. But you can imagine this might get a lot more complicated if you have many more product categories that you want to compare to each other. So another option is to summarize each product category as a box plot. This is typically what a box plot looks like, and based on its shape it's also sometimes called a box-and-whisker plot. It's similar to a histogram in that you're defining segments, but instead of defining them based on your own judgement, for a box plot you define them in terms of the quartiles. So this is the bottom 25% of your data points, this is the next 25%, and the next 25%, and this is the top 25% of your overall data points. The middle box is called the interquartile range, which holds the middle 50% of your data points, and the line inside it marks the median. Now, a lot of people also define the ends of the whiskers not as the maximum and the minimum, but as the 95th and the 5th percentiles. This keeps the whiskers from getting too long if you have really extreme outliers. So what would this look like in our previous example? Well, this is a much more compact display that allows us to make a more direct side-by-side comparison of multiple distributions. In terms of the immediate insight, once I reorient myself in terms of how to read the graph, I can see that backpacks have the highest median weekly sales throughout the year. Tools are the least seasonal product category: their weekly sales fall within a very narrow, consistent range.
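The quartile-based segments described above can be computed with Python's standard `statistics` module. This is a sketch, not the only convention: it uses linear interpolation (`method="inclusive"`) and, by default, the 5th/95th-percentile whiskers mentioned in the lesson; other tools may interpolate percentiles slightly differently.

```python
import statistics


def box_summary(values, whisker_pct=(5, 95)):
    """Summarize one distribution the way a box plot draws it.

    Quartiles define the box; whiskers default to the 5th and 95th
    percentiles instead of the min and max.
    """
    # n=100 returns 99 cut points; index p - 1 holds the p-th percentile
    pcts = statistics.quantiles(values, n=100, method="inclusive")
    q1, median, q3 = pcts[24], pcts[49], pcts[74]
    lo, hi = whisker_pct
    return {
        "whisker_low": pcts[lo - 1],
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": pcts[hi - 1],
        "iqr": q3 - q1,  # the box itself: the middle 50% of data points
    }
```

Calling `box_summary` once per product category (for example, on each category's 52 weekly sales values) yields the handful of numbers each box needs, which is why the plot stays compact no matter how large the underlying dataset gets.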
Skis are the most seasonal category, with almost no sales during half of the year. And lastly, during the peak season, outerwear has the highest weekly sales out of all the categories. Now, this type of plot is loved by the statistics community, and they use it a lot. It's great for showing multiple sets of 1D distributions. And like the histogram, you're doing an analysis on the dataset first before you plot it, which means that the complexity of the graph doesn't scale as you increase the size of your dataset. The only downside is that it's not a very common plot outside of the statistics community, which means you might need to provide some accompanying explanation to make sure your non-technical audience understands the insights. That's it for distributions. In the next section, we're going to talk about relationships.
4. Section 3: Relationships
Section three, relationships. For showing relationships, we're going to revisit the scatterplot and its informationally denser counterpart, the bubble plot. When we discussed scatterplots in terms of showing distributions across two dimensions, we said that we can replace them with two separate histograms if the relationship between the two dimensions is not that important. Here, the focus will be on that relationship, so the business insight will be whether or not a relationship exists. And again, two of the great advantages of scatterplots are, one, that they're easy to understand, and two, that they have a very high ratio of information to ink. Bubble plots are the same, except with an additional dimension added in the size of the markers (plus more, if you really want to cram them in, but that's not recommended). The trade-off is that the added dimensions also add extra complexity, which you'll have to manage, and you'll have to decide whether the business question really necessitates that extra complexity.
More often than not, you can find better and clearer alternatives for communicating it. In this example, we're plotting the number of units sold versus the average price, where each marker again represents a particular SKU or product. Human beings are hardwired to recognize patterns. So when you show a plot like this, and especially if you include a regression line through it, you're not only saying that there is a relationship between these two variables, you're also implying a causal relationship. Now, in analytics, causality is not easy to prove. So usually, when there's a potential causal relationship, what folks do is use their intuition to decide whether or not that causality is plausible. In this case, you're either saying that when you set a lower price for your product, you're more likely to sell more of it, or that if you're able to sell a product in bulk, you're able to offer your end customers a more competitive price. Both of these scenarios seem pretty plausible in real life, so you can show this relationship as your business insight. You may also use the regression line as a way to model this relationship, but always do that gut check to make sure the relationship is plausible. One last point: if you're using a scatterplot, you generally want the independent variable on the x-axis and the dependent variable on the y-axis. That means if there is an implied cause and effect, you want the cause on the x-axis and the effect on the y-axis. Here's another example, where we're looking at the promotional responsiveness of different customers through the relationship between the percent increase in sales that we achieve with each customer and the amount of promotional discount that we've offered them. Among these customers, we have a blue cohort and a red cohort that, for whatever reason, seem to behave very differently from each other.
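The regression line mentioned above is usually an ordinary least-squares fit. Here is a minimal, dependency-free sketch of that fit plus Pearson's correlation coefficient; the function names are our own. (Python 3.10+ also ships `statistics.linear_regression` and `statistics.correlation`.) Note that neither number says anything about causality: that still needs the gut check described above.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept.

    Put the independent (presumed cause) variable in xs, plotted on the
    x-axis, and the dependent (effect) variable in ys, on the y-axis.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx  # fails if every x is identical (no spread to fit)
    return slope, y_bar - slope * x_bar


def pearson_r(xs, ys):
    """Strength and direction of the linear relationship, from -1 to 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: average price (x) versus units sold (y).
price = [10, 20, 30, 40]
units = [100, 80, 60, 40]
print(fit_line(price, units))   # slope -2, intercept 120
print(pearson_r(price, units))  # -1.0 (a perfectly linear, made-up example)
```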
For the blue cohort, you can see that when you increase the discount from 0 to 40%, you're able to achieve, on average, a lift of about 160%, whereas among the customers in the red cohort, the overall lift is much lower and more varied. So you might say the blue cohort seems to be more promotionally sensitive than the red cohort. And by the way, all the data that I'm using in the examples in this course is synthetic, meaning I either created it from scratch or took some data from my company and modified it to anonymize it. In real life, I've never seen promotional responsiveness data that looks as clean as what's shown here. Moving on to bubble charts, here's an example where we're assessing brands in terms of their price point and their level of innovation, which is something you might think about if you work in merchandising. Each circle is a brand, and the size of the circle now represents the size of the brand in terms of its overall sales. Compared to the previous example, this incorporates a third continuous variable as the size of the markers. What that does is allow us to make observations like this one: the majority of the sales are generated by lower-priced products. If you divide the brands along the middle line, there are more circles, and bigger circles, in the bottom half compared to the top half. In terms of relationships, you'll see that there are also more bubbles in these two diagonal quadrants than in the other two, which means that perhaps customers are more willing to pay higher prices for more innovative products. And lastly, there's a fourth variable here: North American versus international brands. We can point out that the international brands tend to be the more expensive and more innovative brands, and they all sit in the top right-hand corner.
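When the size of a bubble represents brand sales, it is usually the circle's area, not its radius, that should be proportional to the value; scaling the radius linearly makes big brands look far bigger than they are. A minimal sketch, assuming your plotting tool takes a radius and that `max_radius` (a hypothetical parameter) is the radius you want for the largest brand:

```python
import math


def bubble_radii(values, max_radius=40.0):
    """Map non-negative values to radii so that circle AREA is proportional to value."""
    if any(v < 0 for v in values):
        raise ValueError("bubble sizes must be non-negative")
    biggest = max(values)
    if biggest <= 0:
        raise ValueError("need at least one positive value")
    # area ~ radius**2, so the radius scales with the square root of the value
    return [max_radius * math.sqrt(v / biggest) for v in values]


print(bubble_radii([100, 25]))  # [40.0, 20.0]: 4x the sales, 4x the area
```

Some libraries already take area directly (matplotlib's `scatter`, for example, interprets its `s` argument in points squared), in which case you would pass values proportional to sales rather than these radii.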
Here's another example of a bubble plot, just to show that you don't necessarily need the x and y axes to be continuous variables. Sometimes it works pretty well with discrete variables too. In this one, we're looking at the reasons for, and the quantities of, returns across different product categories. Returns are becoming a bigger deal for a lot of retailers as they grow their business, because e-comm segments tend to have a higher rate of returns compared to in-store purchases. So it's important to understand the reasons for these returns and whether those reasons can be addressed. In this graph, your eyes will naturally focus on the two biggest circles, which is great, because they represent the biggest problem areas. What we can interpret is that our biggest opportunity to reduce returns might be in improving the fit decisions of our customers when it comes to footwear and women's apparel. In addition, some of our customers are finding better prices elsewhere when it comes to skis and instruments, so this might drive some pricing decisions for our marketing department to run some sales or promotions. And maybe there's also an issue with product identification for apparel in our warehouse, where they're picking and packing the products incorrectly. And lastly, I went online and tried to find the most ridiculous example of how many dimensions you can fit on a single bubble chart, and this is what I found. There are four continuous variables and two discrete ones on this single plot. Obviously this is not ideal, and it only kind of works because none of the circles overlap with each other. The real question here is: does the audience really need to view this many variables simultaneously? Or would it be clearer if you simply split the information across multiple plots?
Even in the age of Tableau and Power BI, where you have the option of presenting a lot of information up front and letting the user click and drill down, this sort of graphic is still not recommended. That's it for relationships. In the next section, we're going to talk about compositions.
5. Section 4: Compositions
Section four, compositions. For compositions, we have our good old pie chart and the stacked bar chart. Pie charts get a lot of hate, but they're actually pretty easy to understand, and they're definitely effective for a quick summary when your data is very simple. I think the main criticism of pie charts is that when you have a lot of segments of different sizes, you can't tell their relative differences as easily as you could with a simple bar chart. Nevertheless, you probably do want to avoid 3D exploded pie charts, as having something like this in your report will probably diminish your credibility among the analytics community. Stacked bar charts are another common staple, again mainly because they're easy to understand, but there are definitely more sophisticated options available for this section as well, which we'll cover. First example: we're plotting the split between sales at regular price and sales at discounted promo prices. We made this a stacked bar chart and normalized it to 100 percent, so that we can focus on the differences in composition and how they change over the years. But in doing so, we're leaving out a lot of context about how the actual sales dollars are changing across these years, which means there are two potential scenarios that warrant very different responses. One scenario might be that our total sales have been relatively flat over the last ten years, and a bigger and bigger portion of our total sales is shifting from regular sales to promo sales.
In a situation like this, it would be a little bit worrisome, because it means our promotions are not generating any incremental sales; they are simply cannibalizing our regular sales. However, there could also be another scenario where our regular sales are actually holding steady, and our total sales are growing incrementally due to the growth in promotions. So there are two very different scenarios underneath, and we can't really tell which one is which when we normalize the stacked bar chart to 100%. Here's another example. Let's say you are in charge of customer experience, and periodically you send customers surveys asking them to rate their satisfaction on a one-to-ten scale. A very common metric that retail stores like to track is the Net Promoter Score (NPS), where you take the difference between your percentage of promoters and your percentage of detractors, and what you're left with is usually interpreted as the likelihood of your current customers to recommend your products or services to others. Now, like the previous example, where you calculate a summary metric and plot that, what you're really doing is optimizing for simplicity while trading off some of the context. In this case, you take the NPS for these eight stores, labeled by their city names, and you get this ranking. Whenever you have a single-metric ranking like this, the graphic is always very clean and easy to understand: the best store is Fort Worth and the worst store is Los Angeles. But there might be a few details hidden behind this simplicity. If you look at the breakdown of the individual components, what you might see is that Philly and New York are a bit of an outlier. The Philly store, for whatever reason, is very polarizing: it actually has the highest number of detractors, but it also has a very high number of promoters.
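The NPS arithmetic described above can be sketched in a few lines. This assumes the commonly used cutoffs (promoters score 9 or 10, passives 7 or 8, detractors 6 or below); the lesson doesn't specify cutoffs, so adjust them to your own survey. The two hypothetical stores below show how an identical NPS can hide very different detractor shares.

```python
def nps_breakdown(scores):
    """Split survey scores into NPS components (all values in percent).

    Assumes the common NPS cutoffs: promoters score 9-10, passives 7-8,
    and detractors 6 or below.
    """
    if not scores:
        raise ValueError("no survey responses")
    n = len(scores)
    promoters = 100 * sum(1 for s in scores if s >= 9) / n
    detractors = 100 * sum(1 for s in scores if s <= 6) / n
    return {
        "promoters": promoters,
        "passives": 100 - promoters - detractors,
        "detractors": detractors,
        "nps": promoters - detractors,
    }


# Hypothetical responses: same NPS, very different detractor shares.
stores = {
    "polarized": nps_breakdown([10, 10, 10, 0, 0]),
    "neutral": nps_breakdown([9, 8, 8, 8, 8]),
}
# Ranking on detractors alone (fewest first) instead of on the NPS.
for name, b in sorted(stores.items(), key=lambda kv: kv[1]["detractors"]):
    print(name, b["nps"], b["detractors"])
```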
So it would be interesting to dig into this and understand why this store is so polarizing. Conversely, New York doesn't really have that many detractors, but most people are pretty neutral, and it has a very, very low number of promoters to balance out the detractors. So again, it would be interesting to understand why. If your goal is to understand the root cause of the detractors, then the overall NPS might not be the best metric to focus on, because you don't want your number of promoters to hide your number of detractors; it's the number of detractors that is most actionable for your audience for this particular business decision. So what you might do instead is plot the individual components in a stacked bar chart, but put more emphasis on the detractors by, one, showing them as a negative value separate from the other two components, and two, ranking the stores based on this metric. In this case, you might end up with the New York store in the middle of the pack and the Philly store at the bottom, if you're ranking them based on the detractor score alone. You can also layer the NPS on top of this by overlaying it as circle markers. Next example: waterfall diagrams. Waterfall diagrams are another option for showing the individual components of a total. In this case, we're showing the monthly revenue and how it adds up to the total annual revenue. One advantage of this plot is that for each month, you also get to see the cumulative total of all the previous months. The trade-off is that it makes it harder to compare the revenue between months, and if that is the more important insight, it might make sense to plot it as a regular bar chart or line chart to better show that comparison.
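Under the hood, each bar of a waterfall diagram floats from the previous running total to the new one. The sketch below computes those bar positions and finds the first period from which the running total stays positive (the "positive cash flow for the rest of the year" reading that the lesson makes next). The function names and the monthly cash-flow figures are hypothetical.

```python
def waterfall_segments(changes):
    """Return a (start, end) pair for each bar of a waterfall chart.

    Each bar floats from the previous running total to the new one, so a
    negative change is drawn as a bar stepping down.
    """
    segments = []
    running = 0
    for change in changes:
        start, running = running, running + change
        segments.append((start, running))
    return segments


def first_period_positive_thereafter(changes):
    """Index of the first period from which the running total stays above zero, else None."""
    totals = [end for _, end in waterfall_segments(changes)]
    first = None
    # Walk backwards and stop at the last period where the total was not positive.
    for i in range(len(totals) - 1, -1, -1):
        if totals[i] <= 0:
            break
        first = i
    return first


MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Hypothetical monthly cash flow, in thousands of dollars.
cash_flow = [-50, -30, 20, -10, 10, 20, 15, 40, 30, -5, 25, 35]
idx = first_period_positive_thereafter(cash_flow)
print(MONTHS[idx] if idx is not None else "never")  # Aug
```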
However, waterfalls are very useful when your metric has both positive and negative numbers, because the bars don't get compressed as much relative to a bigger overall total. This is why they're more commonly used for financial metrics like cash flow or profit. In this case, each bar is a little bit bigger, and the cumulative total is also more useful, because you can see, for example, that it is at this point, in the month of August, that the company achieved positive cash flow for the rest of that year. That's something you can't tell as easily on a regular bar chart. One final example before we wrap up this section. Let's say you're given a dataset that has this year's sales compared to last year's sales, the difference between this year and last year, and the percent difference, segmented by product category. And let's say this is a pretty long list of 100 different product categories. The goal here is to highlight what's been driving our overall business and where we should focus next year. I do want to point out that this is another case where a table is actually pretty good for showing these numbers. Aside from the fact that it's a pretty long table with 100 different categories, you can still sort on these four metrics and look at the top and bottom of the list. But if you're inclined to plot this out, you have a few options to consider. This is actually one of my interview questions, and I'd say about 50% of the candidates I've interviewed tend to pick option number one, which is to ignore the derived difference metrics and focus on just the two dollar values that represent this year versus last year, plotting them side by side so that the audience can assess the differences.
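The two derived columns in this table, and how ranking on each one can produce quite different winner and loser lists, can be illustrated with a small sketch. The class, the helper, and all sales figures are hypothetical; the only point is that small categories dominate percent-change rankings while large categories dominate dollar-change rankings.

```python
from dataclasses import dataclass


@dataclass
class CategoryRow:
    name: str
    last_year: float
    this_year: float

    @property
    def diff(self):
        """Dollar difference versus last year."""
        return self.this_year - self.last_year

    @property
    def pct(self):
        """Percent difference versus last year (assumes last_year is nonzero)."""
        return self.diff / self.last_year * 100


def winners_and_losers(rows, metric, k=2):
    """Top-k and bottom-k category names when ranked by the named metric."""
    ranked = sorted(rows, key=lambda row: getattr(row, metric), reverse=True)
    return [r.name for r in ranked[:k]], [r.name for r in ranked[-k:]]


# Hypothetical sales in thousands: large categories A-D, small categories E-F.
rows = [
    CategoryRow("A", 5000, 5600),
    CategoryRow("B", 4000, 4400),
    CategoryRow("C", 6000, 5300),
    CategoryRow("D", 3000, 2700),
    CategoryRow("E", 50, 100),
    CategoryRow("F", 80, 40),
]
print(winners_and_losers(rows, "diff"))  # (['A', 'B'], ['D', 'C'])
print(winners_and_losers(rows, "pct"))   # (['E', 'A'], ['C', 'F'])
```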
The other half of the candidates tend to pick either option number two or option number three, where you ignore the underlying sales data and instead highlight the categories with the biggest or smallest differences between this year and last year (in dollars for option two, in percent for option three). One thing you might notice is that here I'm only showing the top and bottom categories, not the entire list of 100 categories. So for all three of these graphs, we haven't really addressed the issue that we have a very, very long list. The second thing you might notice is that these two lists are actually very different from each other: even though they're both showing the top winners and bottom losers, they paint very different pictures of who the winners and losers are. So which one of these is more useful for assessing the overall health of the business? That's the question we need to answer. One way to address both of these points is to use something called a treemap, where the common setup is to make the size of each square this year's sales for that category, and to color it by the percent change from last year. The treemap algorithm automatically sorts your squares from biggest to smallest, from the top left to the bottom right. This really helps address the problem of very long tails, because the smallest segments automatically take up less space and therefore less of your audience's attention. The second thing it addresses is that it's usually the smaller segments that are the biggest outliers when it comes to percent changes. So the treemap also drives the audience's prioritization by pushing their attention toward the bigger, more strongly colored squares, as those are the segments that actually have the biggest impact on the overall business, not the really small ones in the bottom right corner, no matter how big their percent changes are.
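To make the treemap setup concrete, here is a deliberately simple layout sketch: categories are sorted largest first, each one is sliced off the remaining rectangle along its longer side, so areas stay proportional to sales and the smallest items end up toward the bottom right. This is a basic recursive slicing layout, not the "squarified" algorithm that tools like Tableau or Power BI typically use, and the red-to-white-to-green color mapping with a ±20% cap is a hypothetical choice.

```python
def treemap_layout(items, x=0.0, y=0.0, width=1.0, height=1.0):
    """Lay out (label, size, pct_change) items inside a rectangle.

    Returns (label, x, y, w, h, pct_change) tuples with y growing downward,
    so the largest item lands at the top left. Each rectangle's area is
    proportional to its size.
    """
    ordered = sorted(items, key=lambda item: item[1], reverse=True)
    remaining = sum(size for _, size, _ in ordered)
    rects = []
    for label, size, pct in ordered:
        share = size / remaining if remaining else 0.0
        if width >= height:
            w = width * share  # cut a vertical slice off the left side
            rects.append((label, x, y, w, height, pct))
            x, width = x + w, width - w
        else:
            h = height * share  # cut a horizontal slice off the top
            rects.append((label, x, y, width, h, pct))
            y, height = y + h, height - h
        remaining -= size
    return rects


def pct_to_rgb(pct, cap=20.0):
    """Diverging color: red for declines, white for flat, green for growth."""
    t = max(-1.0, min(1.0, pct / cap))  # clamp to [-1, 1]
    fade = int(round(255 * (1 - abs(t))))
    return (255, fade, fade) if t < 0 else (fade, 255, fade)


# Hypothetical categories: (name, this year's sales, % change vs last year).
for rect in treemap_layout([("a", 50, 5.0), ("b", 30, -3.0), ("c", 20, 40.0)]):
    print(rect[0], [round(v, 2) for v in rect[1:5]], pct_to_rgb(rect[5]))
```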
In this case, my biggest takeaway is that the most significant areas where we are outperforming this year are travel backpacks, kitchen, and backcountry footwear, and where we've really underperformed is in areas like bicycles, lighting, and women's lifestyle outerwear. These small categories here, even though they're strongly colored and have massive percent changes compared to last year, might warrant more attention in other parts of the dashboard in terms of growing a small segment, but they're arguably less relevant for providing an overall view of what's really driving the business over the past year. Going back to our previous plots, you can see that option two aligns much better with this treemap, because it's showing the product categories with the biggest dollar change compared to last year, whereas most of the ones in option number three aren't labeled on the treemap, because they tend to be the really small ones that end up in the bottom right-hand corner. So treemaps are great for comparing two variables, especially if one of them is unidirectional, like sales, and the other is bidirectional, like changes in sales. This example reflects the most common way I've seen treemaps used in dashboards, especially at retail companies. In terms of advantages, a treemap really highlights what's important and hides the long tail of small items that don't drive the overall business. In terms of negatives, I think by now you're starting to see the pattern that a lot of these less conventional plots require a bit more explanation, and more effort on the part of the audience to understand and read them. But once the audience is familiar with them, they offer a lot of advantages over the more conventional line charts and bar charts. So that's the end of compositions, and of our sections on selecting the best way to plot a dataset.
In the next section, we're going to cover miscellaneous topics when it comes to creating an effective dashboard.
6. Section 5: Context, Clutter, Color
Section five: context, clutter, and color. A lot of the time, the context that you provide your audience is as important as the actual information you're presenting. Here's an example that I encountered many, many years ago. You can see that it's a little bit dated, but I've used it many times to make this point. I was in my car listening to the news on the radio, and the newscaster said, "Today in the stock market, the Dow Jones Industrial Average went down by 102 points, while the NASDAQ was down 35 points." So I thought, okay, it sounds like it was a pretty bad day for the Dow, but the companies listed on the NASDAQ didn't do so badly. What I didn't know was the context: the Dow went down from 8,281 points to 8,179 points, which is a drop of 1.2%, while the NASDAQ dropped 2.2%. To an investor, the percentage change matters a lot more, so once you have this context, it was actually a much worse day for the NASDAQ than for the Dow. But there's also another piece of context, which is that the NASDAQ, being full of tech stocks, is inherently more volatile than the Dow. So a plus or minus 2% swing isn't really out of the ordinary for the NASDAQ, while the Dow dropping 1.2% in a single day was actually a pretty big deal at that time. The point here is that context may not just have a small effect on the interpretation of the insight; it can completely reverse the interpretation. Here's another example that's based on our previous look at treemaps. We have two product categories at a retail company, men's and women's apparel.
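The Dow and NASDAQ story above comes down to two pieces of arithmetic: converting a point move into a percent move, and comparing that percent move to how much the index typically moves. A minimal sketch using the Dow figures from the story; the helper names are our own, and the "typical daily move" is an input you would supply (for example, from historical daily returns), since the lesson doesn't give one.

```python
def pct_change(before, after):
    """Percent change from a starting level; negative means a drop."""
    return (after - before) / before * 100


def moves_vs_typical(pct_moved, typical_daily_pct):
    """Express a day's move as a multiple of that index's typical daily move."""
    return abs(pct_moved) / typical_daily_pct


dow_points = 8179 - 8281              # -102 points
dow_pct = pct_change(8281, 8179)      # about -1.23%
print(dow_points, round(dow_pct, 2))  # -102 -1.23
```

The same point-versus-percent gap shows up with product categories: a small category can post a large percent change on a tiny dollar base.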
If you just look at the year-over-year percent change in sales, you might conclude that men's apparel is driving most of the growth of the overall business, assuming that both categories are roughly the same size. However, if you add the context of how this percentage is calculated, your audience will see that men's apparel had a much larger percent change on the basis of much lower baseline sales from the previous year. Again, this is very common behavior in sales: your smaller categories will have much bigger percent changes. And it's really the larger category, women's apparel, that's driving the actual tangible dollar growth, even though as a percentage of last year it doesn't look as dramatic. Here's another one on context. Here we're looking at two very common supply chain warehouse metrics: dock to stock, which is how much time it takes the warehouse to receive and put away merchandise that it has just received from a vendor, and pick to ship, which is how much time it takes the warehouse to grab and package a product that it needs to ship to a customer. If you're a supply chain expert, you might look at these two numbers and have a rough idea of whether they are good, bad, or about average. But to someone who is not an expert, there really isn't a whole lot of context here, other than comparing these two numbers to each other and wondering why pick to ship takes almost twice as long as dock to stock. So in order to provide more context, we have a few different options. We can compare these two numbers to historical trends, such as what the metrics were last week, or, if this is a really seasonal metric, how they compare to the same time period last year. We can also compare them to other benchmarks, such as the number of days in other, more comparable warehouses, or what is considered to be the industry standard for these metrics.
We can also compare them simply to their goals and targets. If we set a target of two days for dock to stock, then we are well ahead of our target, but if our target is one day, then we are way behind. So the context can be as simple as that. And again, going back to the very beginning, our goal is to allow our audience to quickly and unequivocally identify the main insights that are going to help them with their business decisions. One way to do that is to provide them with the right context, which highlights whenever anything is outside of normal. Another topic I want to touch upon is chart clutter, which I've mentioned a few times throughout this course. Here's an example that I found on the internet, where we have a very colorful bar chart that shows the annual rate of inflation in grocery stores across different cities. This is a bit of a straw man, because you usually don't see graphs that are this bad in terms of clutter. But to clean it up, we can apply some of the best practices that we've covered, such as, one, making the city names easier to read by making the bars horizontal; two, getting rid of the 3D effect on the bars; and three, getting rid of the colors altogether, because they don't convey any additional information that isn't already conveyed by the length of the bars. And lastly, it's also good practice to put the bars in a meaningful order. Now, there's also a man named Edward Tufte, who is kind of a guru when it comes to data visualization, and he has written a lot of books on this topic. One of the things he focuses on more than anybody else is just how far you can go when it comes to minimizing chart clutter. What he might prefer is something more like this, where you've gotten rid of your axes altogether, including the gridlines and the outlines of the bars, and instead you're just showing the length of the bars, along with the percentage labels in a contrasting color.
His philosophy is that you want to show the least amount of ink on the page, so that your audience can focus their attention on just the ink that plays a critical role in conveying information. Okay, the next topic is going to be a brief one on selecting a color scheme. There's a website called ColorBrewer, which helps you pick the optimal set of RGB codes for colors that offer the best contrast under different scenarios. Here, for example, are the default colors from earlier versions of Microsoft Office, and here's what ColorBrewer would recommend to maximize contrast when you have a five-color graphic. This is probably becoming a little less relevant now, because most conference rooms are equipped with flat-screen TVs, which offer much better contrast. But back in the day, when you were going to give a presentation at your client's office, you didn't know what the setting was going to be like, or whether they'd have a really poor-quality overhead projector in a room that's too brightly lit. In that case, you want to make sure your audience can still read the colors of your graph comfortably by maximizing the contrast. The last topic is some bonus material on drill-down features. This is going to be a little bit software specific, which is not the intent of the course, and it's not an exhaustive list. I just want to show a few to make the point that with fantastic dashboard-building tools like Tableau and Power BI, you have the option of not fitting everything on the same page: you can make it interactive so that the audience can get the overall picture and then click to drill down on the details they're interested in. The first one is sparklines. Sparklines are basically just mini charts.
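Dashboard tools draw sparklines for you, but the idea is simple enough to sketch in plain text: scale each value in a short series onto one of eight block characters, giving a word-sized trend line with no axes or labels. This is only an illustrative approximation of what Tableau or Power BI render; the function name is our own.

```python
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render a numeric series as a one-line text sparkline."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return BLOCKS[0] * len(values)  # a flat series gets a flat line
    scale = (len(BLOCKS) - 1) / (hi - lo)
    return "".join(BLOCKS[int(round((v - lo) * scale))] for v in values)


print(sparkline([1, 2, 3, 4, 5, 6, 7, 8]))  # ▁▂▃▄▅▆▇█
```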
Apple does this beautifully in its Stocks app by showing you a tiny graphic for each stock and how it has trended throughout the day. Then, if you're intrigued, you can always drill down deeper: click on one of these stocks and get more details on the actual trends going back further than one day. Another great example is the mouse rollover pop-up, which is called a tooltip in Tableau. Here's a map that's been colored by a particular metric. If a particular area of the map interests you, you can put your mouse over that area, and it'll give you a pop-up where you can put a lot more details, including an entire other graphic. One additional comment I'll add is that even though this is a course on data visualization and dashboard design, I realize I haven't really talked about maps all that much, even though I'm quite surprised at how prevalent maps are in dashboards. That's because, in my opinion, unless the business question you're trying to address is specifically related to geography, like if you're building a dashboard for logistics, maps don't really add any real value or insight. All they do is cram details into the major population centers and leave a lot of whitespace everywhere else. But I will admit that they are very popular, and general audiences tend to love maps just because they look really cool. One last example is a toggle that you can use to switch the metric used across an entire page. In Power BI, this is a more advanced application of buttons and bookmarks. In this example, what we have is an audience that might prefer looking at the same set of numbers but in different currencies.
If the audience doesn't need to make any comparisons between these currencies, then it's best practice to show just one of these three sets of numbers at a time, using this button to let the audience toggle back and forth, rather than tripling the amount of information on the page by showing all three currencies. So the overall theme among these three features is that they allow you to hide certain information from the main view of your dashboard, which makes it look a lot cleaner, while still allowing the audience to drill down on more details in the areas they're interested in. Okay, so that's the end of this section. In the next section, we'll do a quick wrap-up.
7. Conclusion
Conclusion. Thank you very much for making it through this mini-course. We segmented the topic of data visualization into these four sections, for which I will credit Professor Andrew Abela. We looked at the pros and cons of the more basic and common ways to plot our data to address common business questions. We also looked at some of the more advanced options, and at the situations where it's worthwhile to trade off simplicity in favor of something richer in information, while keeping in mind that our goal is to communicate our business insights effectively and objectively. In terms of the process for creating a new dashboard, a new report, or an ad-hoc analysis, we focused our time mostly on step number two. But I do have to emphasize again that your efforts to apply what we've covered in step number two will be greatly hampered if you don't do a very thorough job of covering step number one. In my experience, one of the biggest criticisms of folks who are highly trained and highly proficient in the technical aspects of analytics is that they don't fully understand the business questions and the business needs.
As a result, they either create an inferior product or fumble through the change management necessary to ensure that the dashboard they create gets used and adopted. This is the first course I've put together, so any feedback on the content or the delivery will be very much appreciated. I'm not trying to fish for compliments; any constructive criticism will be very useful for me. That includes whether or not you'd be interested in any of the other topics I've considered, which include step number one, understanding the business question; demand planning and forecasting; how to create an effective presentation and PowerPoint slides, which is the complementary component of storytelling; and lastly, a set of case studies involving the most common types of data analytics models used in the business world, like basket analysis, product segmentation, store clustering, and marketing. All right, thank you again, and congratulations on finishing this course. I hope you found it useful.