Machine Learning for Absolute Beginners - Level 3 | Idan Gabrieli | Skillshare

Machine Learning for Absolute Beginners - Level 3

Idan Gabrieli, Pre-sales Manager | Cloud and AI Expert

Lessons in This Class

37 Lessons (3h 3m)
    • 1. ML Level 3 Promo

    • 2. Section 01 - Welcome!

    • 3. Our Overall Learning Path

    • 4. How to Practice?

    • 5. Section 02 - Data Visualization with Matplotlib and Seaborn

    • 6. Matplotlib – Overview

    • 7. Matplotlib – Figures, Axes

    • 8. Matplotlib – The OO and Pyplot Interfaces

    • 9. Matplotlib – APIs Reference Review

    • 10. Seaborn – Overview

    • 11. Seaborn – Figure and Axes-level Functions

    • 12. Seaborn - Chart Customization

    • 13. Seaborn – API Reference Review

    • 14. A little bit about NumPy

    • 15. The Right Chart for the Right Job

    • 16. Section 03 - Ranking and Proportion Charts

    • 17. Bar Chart

    • 18. Grouped Bar Chart

    • 19. Lollipop Chart

    • 20. Stacked Bar Chart

    • 21. Pie Chart

    • 22. Treemap

    • 23. Optimizing Colors

    • 24. Section 04 - Trend and Distribution Charts

    • 25. Line Chart

    • 26. Area Chart

    • 27. Stacked Area Chart

    • 28. Histogram Chart

    • 29. Density Curve Chart

    • 30. Box and Whisker Chart

    • 31. Bee-swarm Chart

    • 32. Section 05 - Correlation Charts

    • 33. Scatter Chart

    • 34. Correlogram

    • 35. Heatmap

    • 36. Hexbin Map

    • 37. Let’s Recap and Thank You!

About This Class


Unleash the Power of ML

Machine learning is one of the most exciting fields in the hi-tech industry, gaining momentum in various applications. Companies are looking for data scientists, data engineers, and ML experts to develop products, features, and projects that will help them unleash the power of machine learning. As a result, data scientist is one of the ten most wanted jobs worldwide!

Machine Learning for Absolute Beginners

The “Machine Learning for Absolute Beginners” training program is designed for beginners looking to understand the theoretical side of machine learning and to enter the practical side of data science. The training is divided into multiple levels, and each level covers a group of related topics for a continuous, step-by-step learning path.

Level 3 – Data Visualization with Matplotlib and Seaborn

The third course, as part of the training program, aims to help you to perform Exploratory Data Analysis (EDA) by visualizing a dataset using a variety of charts. You will learn the fundamentals of data visualization in Python using the well-known Matplotlib and Seaborn data science libraries, including:

  • Matplotlib fundamentals

  • Seaborn fundamentals

  • Selecting the right chart for the right job

  • Bar, Grouped Bar, Stacked Bar, Lollipop charts

  • Pie, Treemap charts

  • Line, Area, Stacked Area charts

  • Histogram, Density, Box-and-Whisker, Swarm charts

  • Scatter, Correlogram, Heatmap, Hexbin charts

Each section has a summary exercise as well as a complete solution to practice new knowledge.

The Game just Started!

Enroll in the training program and start your journey to become a data scientist!

Meet Your Teacher

Idan Gabrieli

Pre-sales Manager | Cloud and AI Expert



1. ML Level 3 Promo: Hi, and welcome to this training program about machine learning. My name is Idan, and I will be your teacher. Machine learning, and the umbrella term artificial intelligence, are exciting and engaging topics that are gaining tremendous momentum everywhere. It's a mind shift in how we develop applications. Instead of using hard-coded rules to perform something, we let the machine learn things from data, decipher complex patterns automatically, and gain new knowledge. Companies are looking for ways to utilize those technologies in practical use cases, as features in products: physical products like our mobile phones, or virtual products like a recommendation system on a website. It's a game-changing technology, and the game has just started. The market demand for skilled people is growing, and as a result the data science community is becoming the hottest place in the high-tech industry. Still, machine learning is a complex topic divided into many subtopics, and we can easily get lost while trying to figure out where to start and what kind of skills we should develop. This training program provides a comprehensive yet straightforward and sequential learning path for beginners. You can follow the complete learning path step by step, or decide which levels are relevant for you. Level 3 is all about data visualization. Data visualization is a big part of any data science project: we would like to explore and analyze our data using a variety of charts. This critical step is called exploratory data analysis, and it is done at an early stage of almost any project to gain some insights into the data. In this level, we will learn the fundamentals of data visualization in Python using the well-known Matplotlib and Seaborn data science libraries: how to create and customize a variety of charts, divided into categories. We will have ranking charts, proportion charts, trend charts, distribution charts, and correlation charts.
Sign up today and start learning the exciting concepts of machine learning and data science. Thanks for watching, and I hope to see you inside.

2. Section 01 - Welcome!: Hi, and welcome. My name is Idan, and I'm going to be your teacher. We are about to start the third level in the machine learning training program. This level, Level 3, is all about data visualization. Data visualization is a big part of any data science project. We would like to take the cleaned dataset we optimized using the tools we covered in Level 2 and be able to explore and analyze our data using a variety of charts. This critical step is called exploratory data analysis, or EDA for short, and it is done at an early stage of almost any project to gain some insights into the data. We are trying to discover interesting and useful patterns, such as a cluster of data points, a linear or nonlinear correlation between features, evolving trends over a time scale, and much more. Any understanding we can uncover about the dataset will help us later on when selecting the right machine learning algorithm and optimizing the input features for modeling. Therefore, data visualization is a core skill to master. We need to be able to understand the story behind the dataset and, in many cases, also be able to present it in a meaningful way to other people. Now, how are we going to do it with Python? The good news is that Python has an extensive list of data science libraries for data visualization. We can use those libraries to create nice-looking charts with a few lines of code. Some of those libraries provide a high-level interface, even for performing some very complex visualization tasks. The bad news is that when we have multiple options to select from, in this case multiple data visualization libraries, it can be a little confusing and overwhelming. The issue is how to navigate between the options and choose the right tool to perform a specific job.
My suggestion, and this is the way I created this level, is to focus on a small number of libraries that are very popular, well documented, and will probably cover about 95% of the data visualization scenarios we will encounter. It will be a strong starting point without getting lost among too many options. To be more specific, my selected libraries are called Matplotlib and Seaborn, and we will learn how to use them during this training. Welcome again to my training. I would like to wish you exciting and useful learning. Please remember that you can ask me questions using the course dashboard. I'm here to help you if needed. In the next lecture, we will talk about the overall learning path in this training.

3. Our Overall Learning Path: I would like to quickly review our learning path up to this point, to better understand where we are right now and where we are going as we move forward. In Level 1, we covered the essential machine learning terminology: what supervised and unsupervised learning are, the types of machine learning algorithms and how they are divided at a high level, the meaning of a training dataset, a trained model, and much more. It was a soft and fun introduction to the topic of machine learning. In Level 2, we started to move into the practical side. We saw how to set up the JupyterLab development environment. We learned the basic, relevant Python syntax to kick off our data science programming capabilities. We then focused on loading, cleaning, and transforming a raw dataset using the famous pandas library. In Level 3, meaning this level, we are planning to focus on the subject of data visualization. We will start with the Matplotlib library, which is the cornerstone of many other data visualization libraries. It's a very popular and powerful library. The next one will be the Seaborn library. Seaborn is a higher-level interface compared to Matplotlib. Sometimes we just want to create a chart quickly with minimal effort.
And Seaborn is designed to provide such capabilities. After gaining some high-level knowledge about Matplotlib and Seaborn, we will move on and start to learn what kinds of charts are available, what we can do using those charts, and, of course, how to create and customize those charts to meet our visualization needs. Okay, this is our high-level learning path, and in the next lecture, let's talk a little bit about how to practice your skills.

4. How to Practice?: I'm sure you already know it, but I will say it as a reminder: the way to become a great expert in a topic is by practicing, trying things, applying the new knowledge, and gaining experience. If you think about it, it is a simple formula. Reviewing books, tutorials, and courses like this one is a very important step to gain knowledge, but it's not enough. We need to go the extra mile to become an expert with solid experience. To be more practical, I have the following recommendations. I'm providing a summary exercise for each relevant section, and also a solution for that exercise. You should use it to kick off your knowledge. It doesn't make sense to jump to the next section without investing the time to solve the summary exercise. This exercise is a quick way to apply a small chunk of the information we learned in a specific section. The next recommendation is to try different options. Play with the look and feel of those charts, explore the options on the official libraries' websites, load other datasets, and try to analyze them by yourself. In Level 2, we talked about the Kaggle website, which is a great source of additional datasets. So download a different dataset and practice, practice, and then practice a little bit more. The third recommendation, and this is for really proactive students: if possible, try to be involved in solving real data visualization problems, even if you are not currently holding a position directly related to data science. Data is everywhere.
Just look around: at your current work, a friend's private business, some organization where you can volunteer, and so on. People are always looking for ways to organize and visualize data. You can help them and, by doing that, practice your skills related to this level. Those are my recommendations. I hope they will help you to kick off your experience. Good luck, and let's get started.

5. Section 02 - Data Visualization with Matplotlib and Seaborn: Hi, and welcome back. In this section, we are planning to kick off our knowledge about two data visualization libraries: Matplotlib and Seaborn. I assume you may ask why we should focus on those two specific libraries instead of using other options, and this is a good question. So, first of all, Matplotlib and Seaborn are very popular, well-established libraries with a large number of data scientists using them. The meaning of such a large community is that you will be able to find endless examples while searching the web. The second thing to consider is that those two libraries are well documented. There are official websites to use as a reference when we would like to check something, like the list of available arguments for a specific function we would like to use, or how to create a particular chart and look for some templates. Such documentation is relevant and important because we don't want to spend endless hours searching for how to perform something. The third thing is that those two libraries complement each other. Matplotlib provides a very flexible infrastructure to control any aspect of a chart we would like to create. On the other hand, Seaborn runs on top of Matplotlib and provides a higher-level interface for creating charts: a quick option to visualize something without getting into the low-level details. Now, the most interesting part to remember here is that we can always combine them: create a quick chart with Seaborn and then tweak it with Matplotlib.
Let's start with Matplotlib.

6. Matplotlib – Overview: The most popular library that pops up when performing a Google search about creating charts in Python is the famous Matplotlib library. It is considered to be the cornerstone of data visualization in Python. Many other, more advanced libraries use the Matplotlib infrastructure. It's not the answer or the best tool for every situation, but it's a strong starting point, and it will help us to cover many common visualization use cases. So, with this short introduction, let's get down to business. The first step will be to import a specific module. matplotlib is the name of the library, and the name of the module is pyplot; we give it a short alias, plt. Next, I will define three simple numerical lists, x1, y1, and y2, to represent some data. Finally, I will use a function that is part of the plt module, called plot, to set up a line chart with x1 on the x-axis and y1 on the y-axis, and then present the graph using another function called show. Here you go: we just created our first simple chart in Matplotlib. Each chart has many attributes we can adjust to get a nicer-looking presentation. For example, I would like to add labels to the axes and also a title. In that case, I can use the xlabel, ylabel, and title functions. Okay: xlabel, ylabel, and title. Here we have it: My simple line chart, a label over here, and another label over here. Now, what if I would like to present two lines on the same chart, with a label for each line that will also be presented in a legend? In that case, there are a couple of things I should do. First of all, I should use the plot function twice while providing different y data points (you remember we created y1 and y2), and use a keyword called label when calling that function.
Okay, this is the label parameter, and the argument is line 1. For the second plot, it's going to be line 2. This information will be used to display the legend. Let's see that together. So now I have two lines, and this is the legend: line 1 and line 2. As you can see, the limits on the x-axis and the y-axis are created automatically, based on the values. But in case we want to manually adjust the x- and y-axis limits, there is a function called axis. I will use it to set the x-axis limits to be between 0 and 10, and the y-axis limits to be between 0 and 13. So let's see that together. This is the function, plt.axis: the first pair is the x limits, and the second one is the y limits. Let's see the result. Here you go: now this one is between 0 and 10, and this one is between 0 and 13. One of the most powerful options in Matplotlib is the flexibility to control and customize the line formatting. You can play with the color, marker, and line style using additional keyword arguments. So let's see an example in which I'm completely changing the look and feel of those two lines, using arguments like color, marker, linestyle, linewidth, and markersize. Here you go: it looks completely different with all this new customization. Now, which keyword arguments can we use to adjust the line properties? Well, we can open the Matplotlib website. So let's do that together. I will switch to the Matplotlib site, search for an object called Line2D (a two-dimensional line), and click on the first result. Over here, I'm getting a list of the available properties related to that class. As an example, I will go to the marker option, click on it, and see all the markers, with a full description of all possible arguments. These are all the kinds of markers that I can use: this is the argument, and that will be the visual result when using this option. Going back to the main page.
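As a minimal sketch of the steps described so far in this lecture, the block below draws two labelled lines with the pyplot interface, sets the axis limits, and customizes the lines with keyword arguments. The data values, the axis label texts, and the specific colors and markers are assumptions chosen for illustration, since the transcript does not give them. The call to plt.legend() is also an addition here: it is the standard pyplot call that turns the label= arguments into a visible legend.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display;
                       # drop this line in Jupyter or a desktop session
import matplotlib.pyplot as plt

# Illustrative data (assumed; the lecture's values are not in the transcript).
x1 = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 3, 5, 9, 12]

# Calling plot() twice draws two lines on the same chart.
# The keyword arguments are Line2D properties.
plt.plot(x1, y1, label="line 1", color="red", marker="o",
         linestyle="--", linewidth=2, markersize=8)
plt.plot(x1, y2, label="line 2", color="green", marker="s",
         linestyle=":", linewidth=2, markersize=8)

plt.xlabel("x values")             # assumed label text
plt.ylabel("y values")             # assumed label text
plt.title("My simple line chart")
plt.legend()                       # builds the legend from the label= arguments
plt.axis([0, 10, 0, 13])           # [xmin, xmax, ymin, ymax]
plt.show()                         # displays the chart in an interactive session
```

Removing the plt.axis line lets Matplotlib pick the limits automatically from the data, which is the default behaviour mentioned above.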
Another nice argument is called alpha. It can be used to soften colors by adjusting the transparency of each plot. Let's add the alpha keyword to the second plot in our code. So I'm going here and adding alpha equal to 0.5. Okay, maybe I will use a lower number. As you can see, the second line, the green line, is more transparent than the red line, which can help if you would like to emphasize one of the lines. As you can see, there are many options we can play with, which makes Matplotlib a very powerful option for making many customizations. On the other hand, it is more work for us: more lines of code to write and adjust. It's a balance between flexibility and complexity. In addition to performing those manual customizations using keyword arguments, we can use some predefined styles in Matplotlib. To see all available styles, we can use the following line: print(plt.style.available). I will get a list of all available styles; every string represents some predefined style. Then, to use one of them, we should call the use function and pass the name of the required style as an argument. So I will copy the same code, but this time I will add the line plt.style.use with one of the available styles from that list. Here you go, that's the chart: the background has changed to gray, and now you also have a grid presented over it. And of course, I can play around and use different styles as needed. Okay, it was just a quick introduction. There are many more things we are going to learn in the following lectures. I would like to explain a few fundamental concepts about Matplotlib.

7. Matplotlib – Figures, Axes: Any chart in Matplotlib is constructed from the following two main components: figure and axes. Starting with the first one, a figure is like a canvas on which everything is drawn.
It can be a window in an application or a Jupyter widget. It's the top-level component in a chart. It is an object that has all kinds of attributes, like a title, size, background color, legend, color bar, etc. A single figure may contain one or more plots, which are called axes. An Axes object represents an individual plot inside the figure. It is the area in which the data is presented. It is also an object that has all kinds of attributes, like a title, x label, y label, etc. A single figure can contain many Axes objects, but an Axes object can only be in one figure. Let's use a few examples to understand this important concept. Here we have a single figure representing the complete window and, inside it, a single Axes object presenting a single graph. If we divide the figure canvas into logical rows and columns, then we have one row and one column. In the next example, I have a single figure and two axes that are vertically stacked. Inside a single figure object, we can have multiple Axes objects, which are basically different graphs. In this use case, the figure canvas is divided into two rows and one column. This is why they are vertically stacked. Moving next, we have a single figure and also two axes, but this time they are horizontally stacked. The figure canvas is divided into one row and two columns. And in the last example, I have a single figure and four axes. The figure canvas is divided into two rows and two columns. Now, why is it important? Why am I providing all those examples? Python is an object-oriented language, and in most cases we do things using objects. But looking back at our previous code used to generate the charts, we didn't create any figure object or Axes object and then use methods related to such objects. Those objects were created automatically. So what's going on here? The answer is related to the ways we can interact with and use Matplotlib, and this is the main topic of the next lecture.

8. Matplotlib – The OO and Pyplot Interfaces: Matplotlib provides two interfaces to create charts. The first one is called the pyplot interface, and the second one is called the object-oriented interface, or OO for short. It's a dual approach, and it can be a little confusing for beginners. I remember that it confused me when I started to use Matplotlib. So, to be on the safe side, I would like to present both of them, so you will have a better understanding right from the beginning. I will start with the first interface. The pyplot interface is the one we used in the previous lecture, and it is considered to be a very popular option. It was developed in order to imitate, in Python, the way developers create graphs using another language called MATLAB. And now you can understand the origin of the library name, Matplotlib. This interface is a way to access Matplotlib without using an object-oriented syntax. All objects are encapsulated inside the module: pyplot will automatically create and manage the required figure and Axes objects when we would like to present something on a chart. Now, because we are not working directly with objects, the pyplot module is the one responsible for handling the current state of our chart. We can write a sequence of lines that keep changing the current settings of a chart. Therefore, it's a state-based interface, because pyplot handles the updated state. As a quick reminder, we imported the pyplot module and gave it the short alias plt. Then any operation to create plots and configure all kinds of attributes was done directly on the plt module, like plt.plot, plt.axis, plt.xlabel, etc. This is option number one, the first interface. I will move to the next interface, which is going to be more relevant for us: the object-oriented interface. The mindset in Python is to work with objects. Okay, we know that.
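The state-based behaviour of pyplot described above can be made visible with a small sketch. It uses two pyplot helpers that the lecture does not mention, plt.gcf() and plt.gca(), which return the current figure and current axes that pyplot created behind the scenes. The data and label text are assumptions for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

plt.close("all")                 # start from a clean pyplot state

# No figure or axes is created explicitly here...
plt.plot([1, 2, 3], [2, 4, 6])   # illustrative data (assumed)
plt.xlabel("x values")

# ...yet pyplot has created both and tracks them as the "current" ones.
fig = plt.gcf()                  # get current figure
ax = plt.gca()                   # get current axes

print(type(fig).__name__)        # Figure
print(ax.get_xlabel())           # x values
print(ax in fig.axes)            # True
```

Every plt.* call in the earlier examples acted on these two hidden objects, which is what makes the interface state-based.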
From that perspective, Matplotlib provides a second option, an object-oriented interface. Using this interface, we explicitly create figure and Axes objects and call methods on those objects. The main benefit is that it gives us more flexibility to customize our charts compared to the first interface, and overall it is more aligned with the Python mindset. Let's see that in action. I will define three numerical lists as data and use a block of code to present a chart. It is the same block, and I just added these two lines. First of all, we still need to import the pyplot module from the Matplotlib library, even when using the second interface, the object-oriented interface. Then the first step is to use a function called subplots, plt.subplots, which is used to create a single figure object and one or more Axes objects. The references to those objects are saved in dedicated variables: this is the fig variable, and then the ax variable. From this point, all our customization will be done on the Axes object. You see: ax.plot, ax.set_xlabel, and so on. The Axes class is the main entry point when working with the object-oriented interface. The syntax for customizing the chart is similar; we just need to call the plot method from the Axes object. Let's run it, and I'm getting the same result. Let's see another example, with multiple Axes objects. In that case, I need to supply the subplots function with the number of rows and the number of columns. So, for creating two vertically stacked subplots, I need two rows and one column. Next, I will call the plot function of each Axes object. You see, I have ax1 and ax2, two different objects, and I'm calling the plot function of ax1 and the plot function of ax2. Let's run it. Here you go.
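The object-oriented steps just described can be sketched as follows. The data values and label texts are assumptions, as before. The final 2 x 2 grid is not part of this lecture's demo; it is included to match the four-axes layout from the Figures, Axes lecture.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Illustrative data (assumed).
x1 = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 3, 5, 9, 12]

# Single chart: subplots() with no arguments returns one Figure and one Axes.
fig, ax = plt.subplots()
ax.plot(x1, y1, label="line 1")
ax.plot(x1, y2, label="line 2")
ax.set_xlabel("x values")
ax.set_ylabel("y values")
ax.set_title("My simple line chart")
ax.legend()

# Two vertically stacked charts: 2 rows, 1 column.
fig2, (ax1, ax2) = plt.subplots(2, 1)
ax1.plot(x1, y1)
ax2.plot(x1, y2)

# A 2 x 2 grid returns a 2-D NumPy array of Axes, indexed as axs[row, col].
fig3, axs = plt.subplots(2, 2)
axs[0, 0].plot(x1, y1)
axs[1, 1].plot(x1, y2)

plt.show()
```

Note the naming difference between the two interfaces: pyplot uses plt.xlabel and plt.title, while the Axes methods are ax.set_xlabel and ax.set_title.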
These are vertically stacked subplots. Now, for creating horizontally stacked subplots, I will create one row and two columns. Here you go: this is plt.subplots with one row and two columns. At the end of such a block of code, it is recommended to call plt.show. And here we go: horizontally stacked subplots. Okay, that was a quick overview of the two interface options. When searching online, we will see developers using both of them, so it is good to understand how to read code that was written with either option. However, it is strongly recommended to stick with one option when writing code; otherwise it will confuse you. You can always decide which interface is better for you. My recommendation is to use the object-oriented interface, and I will tell you why. First of all, it fits the Python mindset better, meaning working with objects. Secondly, it is more flexible, because we are working directly with the objects' methods; there is no high-level interface between us. And the third reason is that it will help us later on when working with another library called Seaborn. So my examples from this point forward will use the object-oriented interface.

9. Matplotlib – APIs Reference Review: The last thing I would like to review about Matplotlib is the online documentation of that library. It is extremely important that you are comfortable opening that website and searching for things on it. So let's have a quick overview. The site name is matplotlib.org. There are a few main sections on this website, like installation instructions, documentation, examples, and tutorials. I will select the documentation and, inside, click on API Overview. As we can see here, there are two main usage patterns, and each one of them is based on a different API. The first one is the pyplot API, and the second one is the object-oriented API. Let's select the pyplot API and then click on matplotlib.pyplot.
On this page, we will see the full, up-to-date list of functions that are available using this interface. It's a long list, and a great reference for the API. Most probably, we will encounter situations in which we will not remember the exact function name, which function should be used to perform something, or how to use a specific function. This is the place to answer all those questions. As I said, I'm planning to use the second API, so I will go back to the API overview and select the second option, which is the object-oriented API. I will read this important remark for a second: At its core, Matplotlib is object-oriented. We recommend directly working with the objects, if you need more control and customization of your plots. In many cases you will create a Figure and one or more Axes using pyplot.subplots and from then on only work on these objects. Okay, this is the approach, the way that we are going to use the object-oriented API. All the relevant methods to create and customize the chart will be called on the Matplotlib Axes objects, so let's click on it for an overview of the Axes class and the available plotting functions. At the top of the page, there is a useful table of contents. For example, all plotting functions are divided into categories. I will click on the Basic category, and we will find a list of basic, common plotting functions: for example, the scatter function, bar function, pie function, stackplot, etc. If I scroll down, there are many more plotting functions under different categories. As an example, I can select the plot function to review the generic call signature of that particular function. I will scroll down to the list of parameters for this specific function. We should provide the coordinates of the points that create the line using the x and y arguments. There is also an optional argument.
It is called fmt, and it is a way to define basic formatting like color, marker, and line style. This is an example of using this kind of notation: plot x and y using blue circle markers. I don't use this kind of notation because it can be confusing. A better approach is to use the keyword arguments related to the Line2D class; the code will be much more readable. For example, in this line you can see all kinds of keyword arguments, like color, marker, linestyle, linewidth, and markersize. It's much clearer: every customization in that case is clearly defined by a keyword. We already saw that before. To get the full list of options to customize the line, we should click on the Line2D page. It is a full list of all available properties. Let's go back to the previous page. I will scroll down, and over there, there is an organized table of all optional parameters. In general, this is the way to review all options related to a specific function. I'm going back to the main Axes class page. We're going to cover a variety of charts during the training, but in any case, you can review additional information in this online reference documentation. Okay, that's it for the introduction to the Matplotlib library. Let's move on to the next library we would like to use in this training.

10. Seaborn – Overview: We saw that the power of Matplotlib is related to the extensive customization we can perform on charts. We have just started to see those options, and of course we will cover much more later on. This flexibility also has a major disadvantage: it takes more lines of code, more work, and more time to create nice-looking charts. During an exploratory data analysis process, we may want to perform things more quickly, with minimal investment, and therefore we can consider using other options, other libraries. So our next data visualization library is the Seaborn library.
Seaborn is a Python data visualization library that is based on Matplotlib. It's like a simplified, high-level interface to Matplotlib. It provides very nice styles that make statistical graphs look more attractive, and it also has tight integration with the pandas data structures. In simple words, it makes it easy to get to know our data during the data exploration and analysis process. So with this short, high-level introduction, let's get down to business and start to learn how to use seaborn. The first step is to import the seaborn library. Usually we will also import the Matplotlib library, which will help us to perform additional customization if needed. Now, in order to present some nice graphs, I need a data set. I can always use pandas and load something from a CSV file, and we will do that later on. Another option is to use the seaborn built-in data sets. Seaborn has a small number of data sets, as part of an online repository, that we can use for our training. There is a function called sns.get_dataset_names, like that. I can use it to see the list of all available data sets in that online repository. So every line here represents a separate data set that we can use. I can load such a built-in data set using another function called load_dataset. I'm going to load a famous data set called the Iris data set, this one. This function will load the data set and then return a DataFrame object. Let's review the DataFrame structure using the head method. I will also count the number of rows. Okay, this is the number of rows. This data set consists of 150 samples, 50 from each of the three species of the Iris flower. Four features were measured for each sample: the length and width of the sepals and the length and width of the petals, in centimeters. Based on the combination of these four features, it is possible to distinguish the species from each other.
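The inspection steps described above can be sketched without a network connection. This is only an illustration: the six rows below are a hand-typed sample laid out like the Iris table, while in the lecture the full table comes from sns.load_dataset("iris"), which downloads it from the seaborn online repository.

```python
import pandas as pd

# Hand-typed sample in the layout of the Iris data set (not the full table).
# The lecture itself uses: iris = sns.load_dataset("iris")
iris_sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor", "versicolor",
                "virginica", "virginica"],
})

print(iris_sample.head())        # first rows, to review the structure
n_rows = len(iris_sample)        # number of rows (samples)
per_species = iris_sample["species"].value_counts()  # samples per species
print(n_rows)
print(per_species)
```

On the real data set the same calls report 150 rows in total.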
The data set is labeled, meaning the type of species is indicated in the last column of each line. Okay, we have a data set. Now let's move on to the next step, meaning how to display charts using seaborn while utilizing this data set.

11. Seaborn – Figure and Axes-level Functions: During the introduction to Matplotlib we talked about the two main components of a chart, meaning a Figure object and Axes objects. A Figure object is like the canvas that can be used to draw a single plot or multiple plots. Each plot is represented by one Axes object. We also mentioned that seaborn uses the Matplotlib infrastructure, meaning the same two types of objects, Figure and Axes. In that context, seaborn has two main types of functions: figure-level functions and axes-level functions. Those two groups represent two different ways to create charts. Starting with the figure-level functions: a figure-level function is a function that controls the figure in which the graphs are displayed. When we call such a function, the function will create and control the figure. This is why it is called a figure-level function. For example, the following functions in seaborn are figure-level functions: catplot for categorical graphs and relplot for relational graphs. For example, in this line, the catplot function is a figure-level function. catplot is a generic function to display a variety of categorical graphs. There is an argument called kind, this one, and we use it to provide the name of the required graph. In this case, the graph is called swarm. Okay, let's run it for a second. I will provide the x and y column labels from the Iris data set, and we get this result. As you can see, I didn't provide additional parameters like the color of each flower category or the axis labels that we can see over here, here and here.
These are things that we used to do ourselves in Matplotlib. This kind of translation was done automatically by seaborn, and this is the power of seaborn: automating some of the visualization settings. If I want to use a different kind of categorical graph, all I need to do is change the kind argument. Okay, I will change it to box over here. This is the box plot chart, which we will discuss later on. Now, the disadvantage of using a figure-level function is that it's more complicated to combine it with Matplotlib customization. This function returns a specific seaborn object for performing customization at the figure level, so I can't easily use Matplotlib customization functions. Another disadvantage is that I cannot combine a Matplotlib plot and a seaborn plot in the same figure when using a figure-level function. In simple words, I'm not recommending figure-level functions in seaborn. There is a better approach. We can use axes-level functions, which work at the Axes object level. Those functions can be used to draw into a single Matplotlib Axes without affecting the rest of the figure. So axes-level functions can be combined into a more complex Matplotlib figure, and we can perform customization using Matplotlib. In simple words, axes-level functions in seaborn are much better integrated with Matplotlib, and therefore we'll use this approach. Let's see an example. As the first step, we will use Matplotlib to create one figure and two Axes objects, ax1 and ax2. I'm using the figsize parameter to control the size of the figure. Okay, this is regular code in Python. Then we will use an axes-level function called swarmplot, this one, in order to draw a swarm graph in a specific Axes. Let's run it. Great. Unlike a figure-level function, in this case the seaborn axes-level function gets as input the Axes object we created using Matplotlib, meaning this one, ax1.
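The difference between the two kinds of functions can be sketched as follows. The small table is a hand-made stand-in for the Iris data (an assumption, so the sketch runs offline); catplot and swarmplot are the seaborn functions named in the lecture.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small hand-made stand-in for the Iris data set (not the real table).
df = pd.DataFrame({
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
    "petal_length": [1.4, 1.5, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
})

# Figure-level: seaborn creates and controls the figure itself and
# returns its own FacetGrid object.
grid = sns.catplot(data=df, x="species", y="petal_length", kind="swarm")

# Axes-level: the figure is created with Matplotlib and one Axes is
# handed to seaborn through the ax parameter.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
returned_ax = sns.swarmplot(data=df, x="species", y="petal_length", ax=ax1)
```

The axis labels on ax1 are filled in from the column names, and ax2 stays free for any other plot.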
We created ax1 over here, and then we are using it as an input to the seaborn swarmplot function, and that's why I'm getting the result over here. Okay, this is the integration point between those two libraries. Now, to complete the example, I will draw an additional scatter plot on the second Axes, using a Matplotlib function. So this is the full block of code: first of all, creating the figure and the two Axes objects. Then, in sns.swarmplot, I set ax equal to ax1. And then I'm using code that is related to Matplotlib, ax2.scatter, and other ax2 functions to play around with the second plot. Here you go. Now we have two plots on the same figure. One is based on seaborn, on the left side, and one is based on Matplotlib, and we can customize both of them using Matplotlib. Let's see such customization in the next lecture.

12. Seaborn - Chart Customization: As a reminder, seaborn is a high-level interface to Matplotlib, and we can bring in Matplotlib in order to perform some additional tweaking on a chart created in seaborn. For example, let's say we would like to add a title to the first plot we created in the previous lecture. Okay, this is the figure we created, with two different charts. This one was created by seaborn, and I would like to add a title over here. Okay, let's add the new line. So I'm adding this line, ax1.set_title, and it adds the needed title, Graph 1. Okay, that's the customization I'm talking about. I created this one using seaborn, by providing ax1 as an input parameter to the swarmplot function, which is part of the seaborn module, but I can use Matplotlib functions in order to customize it. By the way, sns.swarmplot also returns an Axes object, so I can combine the lines using the dot notation. It can be something like this.
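A minimal sketch of the two-plot figure and the title customization, assuming a hand-made stand-in for the Iris data. The last lines show the chained form, where set_title is called directly on the Axes returned by swarmplot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small hand-made stand-in for the Iris data set (not the real table).
df = pd.DataFrame({
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
    "petal_length": [1.4, 1.5, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
    "petal_width": [0.2, 0.2, 0.3, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Left: seaborn axes-level plot drawn into ax1, then a Matplotlib title.
sns.swarmplot(data=df, x="species", y="petal_length", ax=ax1)
ax1.set_title("Graph 1")

# Right: plain Matplotlib scatter plot on ax2.
ax2.scatter(df["petal_length"], df["petal_width"])
ax2.set_xlabel("petal_length")
ax2.set_ylabel("petal_width")

# Chained form: swarmplot returns the Axes, so set_title can follow the call.
fig2, ax3 = plt.subplots()
sns.swarmplot(data=df, x="species", y="petal_length", ax=ax3).set_title("Graph 1")
```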
I write sns.swarmplot with all the parameters, and then, because this call returns an Axes object, I can use the dot notation to access that object and call set_title with Graph 1, and I will of course get the same result. Now, to switch or reset seaborn to its default settings, we can call the set function without any arguments: sns.set. That's it. Now, there are five preset seaborn themes at the figure level, which are called darkgrid, whitegrid, dark, white and ticks. The default theme is darkgrid. We can use the set_style function to select the required style. So I will do that over here. I just added this new line, sns.set_style, and I selected one of the styles, darkgrid, and we get a result based on the style that I selected. Let's use another theme called ticks, which will cause ticks to appear on the sides of the plot. So let's replace that and this time use ticks. Okay, here you go. You see the ticks over here? We didn't see that on the previous one. There are all kinds of options that we can use. Now let's use another graph type called scatter. It is used to present the distribution of two variables using points, where each point represents an observation in the data set. Of course, we will dive into that in much more detail later on. I just want to show you some of the basic syntax in seaborn, so first of all I will present it. Here you go. Now, even though the points are plotted in two dimensions, x and y, another dimension can be added by coloring the points according to a third variable. In seaborn this is referred to as using the hue semantic, because the color of the points gains some meaning. I will add the species categorical variable as a third dimension. Okay, let's see that; it's interesting to see the result. I'm adding the species using the hue parameter, and this will color the points based on that categorical variable. Okay, here we go.
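The style selection and the hue semantic described above can be sketched like this, again assuming a hand-made stand-in for the Iris data. sns.set, sns.set_style and sns.scatterplot are the seaborn functions from the lecture; the legend labels are read back only to show what seaborn added on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small hand-made stand-in for the Iris data set (not the real table).
df = pd.DataFrame({
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
    "petal_length": [1.4, 1.5, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
    "petal_width": [0.2, 0.2, 0.3, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1],
})

sns.set()               # reset seaborn to its default settings
sns.set_style("ticks")  # then pick one of the five preset styles

fig, ax = plt.subplots()
# hue adds a third dimension: each species gets its own color.
sns.scatterplot(data=df, x="petal_length", y="petal_width",
                hue="species", ax=ax)

# The legend is created automatically by seaborn.
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```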
Okay, we'll talk about it in much more detail later on. I just wanted to show you the simple but very powerful syntax in seaborn. It added the legend automatically over here, decided the color for each species, and so on.

13. Seaborn – API Reference Review: As part of the high-level introduction to seaborn, it is important to quickly review the website and the API reference page, the same thing that we did for Matplotlib. This is the link to the website: https://seaborn.pydata.org. Under the gallery section we will find all kinds of examples. When clicking on an example, like the bar plot, we get the Python source code used to generate that chart. It's a quick way to see a template with the relevant syntax. For more in-depth explanations, we go to the tutorial section. Over there we will be able to find useful information on a few categories of plotting functions, for example, plotting with categorical data. Inside this page we get a high-level explanation of the options for creating categorical charts, and many more examples related to that category. Just scroll down to see all kinds of nice examples. Now, the most relevant section I wanted to show you is the API reference. Over here we get a summary of all available functions in seaborn and a quick description of each function. For example, under the first category we talked about relplot, which is a figure-level function; on the other hand, scatterplot and lineplot are axes-level functions. The next group of functions is the categorical plots. Again, catplot is a figure-level function, and the rest of the functions below it are axes-level functions. If I select swarmplot, I get detailed information about the syntax of using this function, all relevant parameters and the default values. For example, the size of the points is by default equal to five. Let's go back to the main API reference page and scroll down.
There are many more groups of functions, like distribution plots, regression plots, matrix plots, controlling the style, color palettes and some utility functions. For example, we also saw the load_dataset function, which is used to load a built-in data set from the seaborn online repository. Let's select it. load_dataset is a function from the seaborn module. The main input parameter is, of course, the name of the data set we would like to load. When we click on this link, which is the GitHub online repository, we can see the full list. Here is the famous titanic.csv file we used in level two, the iris.csv we have used until this point, and additional data sets we're going to use. Going back to the load_dataset function in the seaborn online documentation, we can see below that it returns a pandas DataFrame object. Okay, that was a high-level overview of the seaborn website, which can be used as a great reference when you would like to check something, learn a new function or review some examples.

14. A little bit about NumPy: The last library I would like to quickly review is called NumPy. NumPy is not a data visualization library like Matplotlib or seaborn, but I want to use it during the training to generate data points for some use cases. It's a minor topic at this level, and still, I don't want you to be surprised when I import and use that library. At this stage, I'm going to cover a very small part of NumPy. So what is NumPy? NumPy is a well-known library for Python, adding support for large, multi-dimensional arrays, coupled with a large collection of high-level mathematical functions that are optimized to operate on these arrays. The underlying infrastructure is optimized, low-level code, usually based on C, for performing numerical computing on arrays.
Many data science libraries use the NumPy infrastructure, like pandas and Matplotlib, which we have used until this point. Think about a simple one-dimensional array of one million items, where you would like to normalize all values in that array by dividing each item by 100. If we perform such a numerical calculation with basic Python syntax using a for loop, it will be a slow process. On the other hand, NumPy is a much better and more optimized way to perform fast numerical computing on vectors, meaning lists of numbers. Let's move to the practical side. The NumPy array class is called ndarray. It's an n-dimensional array data structure; it can be one-dimensional, two-dimensional, three-dimensional, etcetera. All items in the array must be of the same type. In NumPy, dimensions are called axes. There is a dedicated function to create such an ndarray object in a specific shape, so let's see that together. First of all, I'm importing numpy as np, and then I'm using a function called arange to create 20 numbers and put them in a matrix with a specific shape, four by five. Let's print the result. I can access a specific item or a group of items inside this array. We just need to remember that the indexes start from zero and not from one. So this one will give me a one-dimensional line, okay, this one. And if I would like a specific cell inside, okay, like this one. Here you go. Now, there are a couple of main attributes of the ndarray object. For example, to get the type of the elements in the array, we can use the dtype attribute. I got an integer dtype; all items are integer numbers. There is another one, called the ndim attribute. This is the number of axes, or dimensions, of the array. Okay, it's two-dimensional. Let's also see the shape of the object: four by five, as we created it. The shape attribute is like the pandas DataFrame attribute.
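The array basics described above can be sketched as follows. The variable names are only illustrative; the first lines show the normalization example, where NumPy divides every item in one vectorized operation instead of a Python for loop.

```python
import numpy as np

# Vectorized normalization: divide one million items by 100 in one step.
values = np.arange(1_000_000)
normalized = values / 100

# 20 numbers arranged as a matrix with 4 rows and 5 columns.
a = np.arange(20).reshape(4, 5)
print(a)

row = a[1]       # second row (indexes start at 0): values 5 to 9
cell = a[1, 2]   # single cell at row 1, column 2

print(a.dtype)   # element type: an integer type (exact width depends on the platform)
print(a.ndim)    # number of axes (dimensions): 2
print(a.shape)   # size in each dimension: (4, 5)
print(a.size)    # total number of elements: 20
```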
It provides the dimensions of the array, the size of the array in each dimension. Now, to get the total number of elements in the array, we can use the size attribute. I got 20 items. Let's talk about more options to create a new array. The first option is the array function. I can provide a list of items like that: np.array and the list of items. That's it, very simple. It can also be two-dimensional. Now, to create a sequence of numbers, NumPy provides the arange function, which is analogous to the Python built-in range, but it returns an array. I'm providing the two numbers and the required step, so the syntax will be something like this: np.arange between 1 and 20, and the step will be 2. Great. So we get 1, 3, 5, 7, etcetera. And now, a better option is to use the linspace function, which gets the number of elements that we want between two values. Okay, the step will be calculated automatically, and this is a function that we're using in the training to generate data. So that's the name of the function. I would like values between 0 and 20, and I will get 100 numbers between those two numbers, and the step will be calculated automatically. The result is a one-dimensional array, but I can also easily change that. Let's say I would like this specific shape. Here you go, using the reshape option. Now, in case we need random numbers, we can use the random.rand function, which returns random values in a given shape. Let's see how I'm using that. So np.random.rand with four, four, and I get those numbers. It can also be a completely different shape. Here you go. Okay, this is all we need to know about NumPy for the rest of the training in this level. Of course, we may learn new things about NumPy in future levels, but for this point this is more than enough.

15.
The Right Chart for the Right Job: We had a quick, high-level review of the two main data visualization libraries we are planning to use, meaning Matplotlib and seaborn. Now it's time to talk about our visualization toolbox. Imagine that each chart is like a tool, and all tools are sitting in some toolbox. How can we select the right chart, the right tool for the right job? It's not as simple as you may assume. In that context there is a famous saying: if all you have is a hammer, everything looks like a nail. There are a variety of charts that can be used to visualize a data set. Some of them are simple, and some of them are very complex, like a heavy-duty hammer. In many cases we don't necessarily need to select the most complex chart in order to visualize something. It's a careful balance between sophistication and simplicity. As a general rule of thumb, it's important to be reasonable with the amount of information we are trying to display on a single chart. Think about how you can convey the message as simply as possible, taking into account that not all the people looking at your analysis will have a statistical background or the patience to spend a long time on a single chart to understand the relevant information. The patterns identified in a chart should be clear and emphasized as much as possible. Secondly, it is super important to be aware of our visualization options, so we can better select the most relevant chart, the most suitable and relevant tool for the job to be done. Therefore, I divided all the chart types that I would like to present into a few main categories for easy navigation. We have ranking and proportion charts as a dedicated section, then trend and distribution charts, and correlation charts. Every chart will have a dedicated lecture inside each section.
After reviewing all those categories, you will have a better understanding of how to select the right one, selecting the right tool from the toolbox, and, of course, how to create and customize it to your specific needs.

16. Section 03 - Ranking and Proportion Charts: Hi and welcome back. The first group of charts that I would like to present is a combination of two categories: ranking and proportion charts. Starting with the ranking category, we're going to learn how to create and use the famous bar chart with a few interesting variations, like the grouped bar chart, the lollipop chart and the stacked bar chart. Each of them is optimized for a different use case. Then we'll move to the proportion category, and we will see how to create a pie chart for a small number of items related to a categorical variable, and also the treemap chart for a complex data set with many items. Also, at the end of the section, we're going to talk about optimizing color selection when creating charts, something that will be useful for almost all types of charts. In addition, at the end of this section you have a summary exercise and a solution for that exercise. Okay, this is our high-level roadmap for this section. Good luck and let's get started.

17. Bar Chart: Our first chart is the famous bar chart. The bar chart is a very efficient and simple way to show the relationship between a numeric variable and a categorical variable. Each possible value of the categorical variable is represented as a rectangle, a bar, and the size of the bar, the height of the bar, is proportional to the numeric value it represents. Let's create a simple chart. As a first step, I will import the two libraries and then load the Titanic data set directly from the seaborn repository. So those are the two imports, then loading the Titanic data set, and let's display the structure of this DataFrame. So the structure is quite organized.
Some of the columns are a little bit different from the titanic.csv file we used in level two, but the core columns are the same. Okay, each line represents a passenger, with all kinds of information about that passenger: whether the passenger survived or not, the class of ticket used, the gender of the person, the age, the ticket price, the port of embarkation onto the ship, and so on. Now, using the isnull method chained with the sum method, I can check that there are no missing values in the columns survived and sex that I would like to use. I would like to use the sex column as a categorical variable and the sum of survived as a numerical value in my bar chart. So by chaining these two functions I can see that I don't have a problem with those columns, survived and sex. There are no missing values, so I can use them directly. Now, I would like to create a simple bar chart using the categorical variable sex, which has only two possible values, male and female, and the number of survivors in each categorical group. As the first step, we create a new DataFrame, filtering out all passengers that didn't survive. So it will be this line: survived_df equal to titanic, where I'm keeping only passengers whose survived value is equal to true. That's a new DataFrame. Then we create a list of unique values under the sex column and store the result in a list called names, and I'm expecting to get two unique values, female and male. Now, this names list will be used as the x-axis of the bar chart. I also need to calculate the number of survivors per unique group and store it in a list called values, to be used as the y-axis of the bar chart. So it's some kind of pre-calculation before presenting the bar chart. So I'm calculating the values with survived_df.sex.value_counts().
That gives us how many males survived and how many females, and you can see the result. Okay, now I can create a simple bar chart using the bar method that we have in Matplotlib. In this bar chart the categorical variable is sex. You can see the two unique values of that categorical variable, male and female, and the number of passengers who survived for each categorical value is the bar size, the height of the bar, which represents some measured numerical value. Another quick option to calculate the number of survivors per gender group is to use the groupby function in pandas. It is something we saw in level two. All we need to do is group the data set using the relevant categorical variable and then apply a summary aggregation function. So the syntax will be the following: survived equal to titanic.groupby by sex, selecting the survived column, which is a boolean value, zero and one, and summing the result. Okay, that's it, instead of those two lines. Let's combine that inside. So actually, the result has labels, column labels and row labels, and I'm using it as is as the input to the bar function. So survived.index, which is the row labels, will be the x-axis input, and survived.values, which is another attribute, will be used for the heights, and I'm getting the same result. I just used a different color. In case the labels of the categorical variable are too long, or there are too many values for the categorical variable (here we have only female and male, but we can have many, many categories), we could switch between the axes and display the bars horizontally instead of vertically. The function will be barh. We just need to remember to switch between the x and y labels, and we get the result. And now the bars are presented horizontally.
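A minimal sketch of the Matplotlib bar chart workflow described above. The eight-row table is made up (the lecture uses the full Titanic data set from seaborn); only the column names sex and survived follow the lecture.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Tiny made-up passenger table with the two columns used in the lecture.
titanic_like = pd.DataFrame({
    "sex": ["male", "female", "female", "male",
            "female", "male", "male", "female"],
    "survived": [0, 1, 1, 1, 0, 0, 0, 1],
})

# Missing values per column: isnull chained with sum.
missing = titanic_like[["survived", "sex"]].isnull().sum()

# Survivors per gender: group, select the column, sum the 0/1 flags.
survived = titanic_like.groupby("sex")["survived"].sum()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3))
ax1.bar(survived.index, survived.values, color="teal")    # vertical bars
ax1.set_ylabel("survivors")
ax2.barh(survived.index, survived.values, color="teal")   # horizontal bars
ax2.set_xlabel("survivors")
```

One reason to prefer the groupby result is that its index and values always stay aligned, whereas a list built with unique and a list built with value_counts may order the categories differently.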
Let's load another data set called tips, and this time use seaborn to create a bar chart. I'm loading this one. Let's see the structure. Each line here represents the total bill and the amount of tip provided by a group of people eating in a restaurant on a specific day. The sex column is the gender of the person paying for the meal, and if at least someone from the group is a smoker, we will get yes, and, of course, the day and time when that occurred, and the size of that group. Now, for creating a bar chart of the average total bill for each day, I can use the barplot function while providing the day as the x-axis and the total bill as the y-axis. We should keep in mind that, unlike the bar function in Matplotlib, where we passed in a sum that we calculated ourselves, seaborn aggregates the values for us, and its default aggregation function is the average. In our case, it will calculate the average of the total bill variable for each day. Let's see that together. So first of all, I'm using Matplotlib to create the figure and an Axes object using the subplots function. I'm defining some style, and then I'm using the barplot function, providing the day for the x-axis and the total bill for the y-axis, and I'm also providing tips as the data set I would like to use, and ax equal to ax, which is the Axes object I created right over here. As you can see, the x-axis is the day column, which is a categorical variable with only four days. For each day, the average total bill was calculated automatically by seaborn. In addition, the x and y labels were taken automatically by seaborn from the data set labels, and the black lines represent the confidence interval for each bar. As another example, to draw a set of horizontal bars we should set the numerical variable on the x-axis and the categorical variable on the y-axis. This time I will use the time variable. Okay, so let's see another example. So this is the time variable, which is also a categorical variable.
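The seaborn bar charts described above can be sketched with a small made-up table that borrows the column names of the tips data set (day, time, total_bill, tip). The values are invented, and the pandas line only confirms what the default estimator, the mean, shows as bar heights.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows using the column names of the tips data set.
tips_like = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat", "Sun", "Sun"],
    "time": ["Lunch", "Lunch", "Lunch", "Dinner",
             "Dinner", "Dinner", "Dinner", "Dinner"],
    "total_bill": [10.0, 20.0, 15.0, 25.0, 30.0, 10.0, 40.0, 20.0],
    "tip": [1.0, 3.0, 2.0, 4.0, 5.0, 1.0, 6.0, 2.0],
})

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(10, 4))

# Vertical bars: categorical day on x, numeric total_bill on y.
# seaborn aggregates the rows of each day with the mean by default.
sns.barplot(data=tips_like, x="day", y="total_bill", ax=ax_v)

# Horizontal bars: numeric variable on x, categorical variable on y.
sns.barplot(data=tips_like, x="tip", y="time", ax=ax_h)

# The same averages computed directly with pandas, for comparison.
day_means = tips_like.groupby("day")["total_bill"].mean()
```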
We have only two options, lunch and dinner, and I'm using the tip as the numerical value to present on the x-axis. We can also adjust the colors using the palette parameter. There are all kinds of available options, like this one, just to play with the look and feel. Now, what about the situation of comparing more than one categorical variable with the same numerical variable? For that, we can consider using something called a grouped bar chart, and that's the topic of the next lecture.

18. Grouped Bar Chart: A grouped bar chart is a type of bar graph that is used to represent and compare different categories of two or more categorical groups using the same numerical variable. Okay, let's review the structure of the Titanic data set again to understand this concept. Now, the sex column is a categorical variable, and we used it to compare the number of survivors per gender, female and male. But let's say we would like to add another categorical variable to the same bar chart. For example, I would also like to use the class column, which is also a categorical variable. So I would like to combine sex and class as two categorical variables and use the survived column as the numerical variable. In seaborn, creating a grouped bar chart for two categorical variables is quite simple. Let's see the syntax. It goes something like this. All we need to do while using the barplot function in seaborn is to use another parameter, which is called hue, and provide the name of the feature, the column, which in our case is called class. This is the result. First of all, we have sex as the baseline for the x-axis, survived as the y-axis, and the sub-group inside is the class. The class has three options, First, Second and Third, so for male we see all those groups, and also for female. But when I look at the y-axis, something does not make sense. The survived value is not the sum.
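A sketch of the grouped bar chart call described above, using a small made-up table with the Titanic column names sex, class and survived. The second chart passes np.sum through the estimator parameter of barplot so the two aggregations can be compared, and the pandas tables show the numbers behind each chart.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up rows using the Titanic column names.
df = pd.DataFrame({
    "sex": ["male", "male", "male", "male",
            "female", "female", "female", "female"],
    "class": ["First", "Third", "Third", "First",
              "First", "Second", "Third", "Second"],
    "survived": [1, 0, 0, 0, 1, 1, 0, 1],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# hue adds the second categorical variable as sub-groups within each sex.
# Default aggregation: the mean of survived, a value between 0 and 1.
sns.barplot(data=df, x="sex", y="survived", hue="class", ax=ax1)

# Same chart, but summing the 0/1 flags to count survivors.
sns.barplot(data=df, x="sex", y="survived", hue="class",
            estimator=np.sum, ax=ax2)

# The numbers behind the two charts.
mean_table = df.groupby(["sex", "class"])["survived"].mean()
sum_table = df.groupby(["sex", "class"])["survived"].sum()
```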
Okay, you can see the values are a little bit strange. The survived value is not a sum, as I would like it to be. The default is a mean, an average calculation. We can change it by using another parameter, which is called estimator. So all we need to do is add this estimator parameter and set it to sum. Now it makes sense: the y-axis is a sum. So, for example, looking at the male group and the first class, this is the number of survivors, around 43 or 44, something like that. Let's make sure that we understand how to read that chart. The first categorical variable is sex, and this is the baseline of the x-axis. The second categorical variable is class, which is like a sub-grouping: for male we have First, Second and Third, and also for female we have First, Second and Third. Let's switch and use class as the baseline. So it will be something like this. Now I'm using class as the baseline for the x-axis and sex as the second categorical variable, and I get a slightly different look and feel. Now class is the baseline, First, Second and Third, and I'm expecting two bars for each one of them, male and female. Let's move to Matplotlib. Creating a grouped bar chart in Matplotlib is a little bit more tricky, but at the same time it provides more flexibility, so I will let you decide which option you would like to use. You may see examples from other people who are using Matplotlib to create a grouped bar chart. So as a first step, let's create a DataFrame from a dictionary, something that we have done several times in level two: importing pandas as pd, defining a dictionary with those three columns, product, revenue and year, and then using that dictionary to create a DataFrame. Let's present the DataFrame. We have a list of four products, apple, banana, beer and juice, and for each product the revenue for a specific year.
Okay, the product is a categorical variable, the year is also a categorical variable, and revenue is a numerical variable. Let's say I would like to present in a bar chart the total revenue of all products for all years. The first category is the year, and the second category will be the product. How exactly am I doing that in Matplotlib? The year should be the common baseline for the x-axis. As a first step, I need a normalized x-axis that starts from zero and has the size of the number of unique values in the year column. So I'm checking how many unique values I have in the column and then using the function called len to get the needed size. The size should be three, because there are three years in the DataFrame for that column. Next is to create a simple array of that size; we will use the function called arange from the NumPy module. Okay, np.arange with that size. Here we go: a simple one-dimensional array, 0, 1, 2. Now, to be able to present a few bars next to each other, we use a trick. In the first step we set a new parameter, which is called bar width, to some size. Next is to create the bars for each product on the same Axes object, four bar calls, and the trick is to shift the x values by multiples of the bar width. So let's see that for a second. Maybe it's scary the first time, but it's not so complicated once you understand the logic. So let's see what we have. This line creates the figure and Axes. Then there are four lines for creating the bars of the four products, each one with a different product: apple, banana, beer and juice. Now, the trick is that each time I'm using a different x array as input. So for the first one it will be the simple x array that I created previously.
For the second one, I add the bar width to x, for the third one it is the bar width multiplied by two, then multiplied by three, and so on. It's a simple logic, and this is the end result. Here we have the year categorical variable as the baseline with those three groups. The second categorical variable, the product, is represented by four sub-bars that are color coded. You can see the legend right over here. For example, the apple revenue, the red one, is growing every year, while the beer revenue, in green, drops in the last year.

On the other hand, it is more difficult to observe the total within a specific level of the primary category. Can we quickly tell the total revenue in the year 2000, for example? No, but we can easily see that juice generated more revenue in that year than the other products.

Now, just to compare with seaborn: I can create the same chart in one line of code, which is definitely the winning option if you're looking for a very fast data preview. I'm just providing x as year, y as revenue, and the third one is product, and I get the same result. In the next lecture we'll see a nice customization that can only be done in Matplotlib.

19. Lollipop Chart: Another option to consider, almost identical to a bar chart, is called a lollipop chart, where the bar is replaced with a line and a marker at the top of the line. Now you can guess why it is called a lollipop chart. Like a bar chart, it shows the relationship between a numerical variable and a categorical variable. As the first step, I define a dictionary object with data about several products and their respective sold units, and create a data frame from the dictionary object. As always, we create a figure and one or more axes objects using the subplots function.
Now I'm going to use a new function called stem, which draws vertical lines at each x location from the baseline to y and places a marker at the top. In our case, the data frame's product column is the x argument, and the sold units column is the y argument. This function returns three objects. Let's see the syntax. First of all, I'm using fig and ax with plt.subplots to create the figure and axes, and then I'm calling the stem function with the product and the units. The function returns three objects: markerline, stemlines and baseline. Let's run it.

This is nice, but I would like to improve it a little. For each object I can use the set function. For example, on the markerline object I use an asterisk instead of a simple point and also define the size, width and color. In addition, I define a blue color for the stem lines and set the baseline, which is now in red, to be invisible. I can do all of this customization quite easily: I have markerline, stemlines and baseline, and I'm using markerline.set with all kinds of options, stemlines.set to change the color to blue, and baseline.set to make it invisible. Let's see the result. It's looking much better. As you can see in this example, we have several products and the number of units sold for each product. It's a nice visualization option.

Another thing to consider is ordering, or sorting, the vertical lines based on their values. This way the chart is much more insightful and carries more information. Of course, sometimes the order is set by the feature itself, like the day of the week, and then you cannot play around with it. But I don't have such a restriction here. It's just a list of products, so in this case we sort the products based on their units value. I will perform the sorting directly in the data frame.
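A minimal sketch of the lollipop chart with the customizations and the sorting step described above. The product names and unit counts are hypothetical. The sketch uses numeric x positions with the product names as tick labels, a cautious choice that avoids depending on how stem handles string categories, and it assumes a Matplotlib version (3.3 or later) where the stem lines come back as a single collection object.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical products and sold units.
df = pd.DataFrame({
    "product": ["A", "B", "C", "D", "E"],
    "units": [120, 45, 300, 80, 210],
})

# Sort directly in the data frame so the stems appear in increasing order.
df2 = df.sort_values("units")

x = np.arange(len(df2))
fig, ax = plt.subplots()
markerline, stemlines, baseline = ax.stem(x, df2["units"].to_numpy())

# Customize each returned object separately.
markerline.set(marker="*", markersize=12, markeredgewidth=1.5, color="darkorange")
stemlines.set(color="blue")
baseline.set(visible=False)

ax.set_xticks(x)
ax.set_xticklabels(df2["product"])
ax.set_ylabel("units sold")
```

Sorting before plotting is what turns the chart into a ranking; without it the same stems are drawn in whatever order the dictionary listed them.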
If you remember from Level 2, there is a function called sort_values, and I use it to create a new data frame called df2. That's it. Then I run the chart again, but this time the input is df2, data frame number two. Here you go. Now it's easier to see how the products rank according to the sold units.

20. Stacked Bar Chart: The next option we can consider is the stacked bar chart. This chart can only be created in Matplotlib, meaning we don't have a specific function in seaborn to create a stacked bar chart. A stacked bar chart stacks bars that represent different groups on top of each other. Each bar on the horizontal axis represents a value of the categorical variable that is the baseline, and each bar is divided vertically into sub-bars, using colors to represent the second categorical variable. Let's see that in action.

As the first step, we create the same product revenue data frame from a dictionary: importing the three modules, and this is the dictionary with product, year and revenue, which I'm using to create the data frame. I'm calculating the number of unique years, because I would like to use that as my baseline for the x axis, and then creating the array using NumPy. As you can see, this is the resulting data frame, with the columns product, year and revenue. Now, just to make the code clearer, let's create a filtered data frame for each product from the original data frame and reset the index of each one of them. So I'm creating df1, df2 and df3, one for each product, and also resetting the index so it starts from zero.

Now, in order to create bars stacked on each other, there is a parameter called bottom, which defines the baseline for each bar plot. The default value is zero, and we can change it. This is the whole trick here.
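Putting the setup and the bottom parameter together, a sketch under assumed data could look like this. The revenue numbers are hypothetical; df1, df2 and df3 follow the naming used in the lecture.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical revenue per product and year.
df = pd.DataFrame({
    "product": ["apple", "banana", "beer"] * 3,
    "year": [1999] * 3 + [2000] * 3 + [2001] * 3,
    "revenue": [10, 20, 30, 15, 18, 32, 22, 21, 20],
})

x = np.arange(df["year"].nunique())  # baseline positions: 0, 1, 2

# One filtered data frame per product, re-indexed from zero so they align row by row.
df1 = df[df["product"] == "apple"].reset_index(drop=True)
df2 = df[df["product"] == "banana"].reset_index(drop=True)
df3 = df[df["product"] == "beer"].reset_index(drop=True)

fig, ax = plt.subplots()
ax.bar(x, df1["revenue"], label="apple")                       # bottom defaults to 0
ax.bar(x, df2["revenue"], bottom=df1["revenue"], label="banana")
ax.bar(x, df3["revenue"],
       bottom=df1["revenue"] + df2["revenue"], label="beer")   # sits on both below

ax.set_xticks(x)
ax.set_xticklabels(df1["year"])
ax.set_ylabel("revenue")
ax.legend()
```

Resetting the index matters here: adding df1["revenue"] and df2["revenue"] aligns on the index, so without the reset the sum would be full of missing values.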
For example, when drawing the second bar plot, I set the bottom to be the upper line of the previous bar plot, and so on. This is the simple logic. Let's see it all together. I'm creating the figure and axes, and each line is for a specific product. For the first plot, I'm not touching the bottom parameter, because the default value is zero. For the second one, the bottom is the revenue of product one, and the bottom for the third product is the revenue of product one plus the revenue of product two, and so on if you have more products. Let's see the result. The year is the categorical baseline variable on the x axis, revenue is the y axis, and the product categorical variable is presented as sub-bars that are stacked on each other with different colors. You can see here apple, banana and beer for each year. The idea is that we can present the relative composition of each primary bar based on the levels of the second categorical variable. For example, we can see how a specific product in a specific year contributes to the total revenue in that year.

21. Pie Chart: Our next chart is useful for proportion visualization. It is the famous pie chart. A pie chart is a circular statistical graphic that is divided into slices to illustrate numerical proportion, with the sum of all slices equal to 100%, of course. The size of each slice, which is represented by the arc length, is proportional to the quantity it represents. As the first step, let's define three slices with labels and colors. These are my slice values, the labels are Group A, Group B and Group C, and there are some colors for those slices. Then I use a specific function in Matplotlib called pie, with the arguments that I just created. It looks like this: using some style, creating the figure and axes objects, setting the title, and then calling the pie function with the lists that I created as arguments.
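A minimal sketch of that call. The three slice values and the color names are assumptions (the exact numbers are not recoverable from the recording); the labels come from the lecture.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt

# Hypothetical absolute values; Matplotlib converts them to proportions.
slices = [100, 40, 20]
labels = ["Group A", "Group B", "Group C"]
colors = ["tab:blue", "tab:orange", "tab:green"]

fig, ax = plt.subplots()
ax.set_title("Pie chart")
ax.pie(slices, labels=labels, colors=colors)
ax.axis("equal")  # keep the pie circular
```

With these numbers Group A covers 100 / 160 = 62.5% of the circle, so it takes more than half of the area without any percentage being computed by hand.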
These are the slices, the labels and also the colors. Now we can run it. As you can see, the proportion of each slice was calculated automatically. Group A takes most of the circular area, meaning more than 50%, Group B is the second biggest, and Group C is the smallest.

A nice customization is to move a selected slice a little bit out of the pie chart. For that we use the explode parameter and define, using the explode array, that the last slice is separated. Meaning, I would like to take Group C and separate it a little. The syntax looks something like this: the last slice is represented by the last item in this array. I also change the color using a hexadecimal value, this one, add a legend at the lower right-hand side and, using this parameter, also add the percentage for each slice. It's looking much better. Slice C is a little bit separated, in case I would like to emphasize it, I have a legend, different colors that look nice (and of course you can select whatever colors you like), and the percentage for each slice.

Now there are a few things to keep in mind while using pie charts. The first one is that the number of slices should be fewer than six or seven. Otherwise it is not going to be clear: too much information. If you have more groups, then consider using another type of chart, called a treemap, which we will see right after this lecture. Another thing to take into account is that the proportions of the slices must sum to 100%. In most cases, when building the pie chart, we use absolute numbers, as I did, and Matplotlib automatically calculates the proportions between the slices. Also, always add the actual percentage to the slices as an annotation, because it's difficult for people to really compare different slices of a given pie chart.

22. Treemap: The next type of chart that can be used to visualize the proportion of some categorical variable is a treemap.
A treemap displays tree-structured data using nested rectangles instead of the slices we saw in a pie chart. Each branch of the tree is given a rectangle (this is level one), which can be broken into smaller rectangles representing sub-branches (level two), and so on. Each rectangle has an area proportional to a specified dimension of the data.

Now, why use a treemap instead of a pie chart? Well, this kind of rectangular shape is a very efficient use of chart space and is therefore useful when we have a big data set with many groups and many subgroups. Each rectangle has a different size based on the frequency of that group, and a dedicated color, so it is easy to identify patterns. Take a look at this interesting treemap from Wikipedia, which represents the 2016 United States presidential election results in Florida by county, with a color spectrum from Democratic blue to Republican red. It's a one-level treemap, meaning each rectangle is not divided into additional groups. In a single chart, we can present the proportional number of votes divided by county. There are 67 counties in Florida, and we can still present all of them in one chart because of this rectangular shape. It's easy to see the biggest top five counties. Secondly, the rectangles are colored with a red and blue spectrum based on the winning party, and the shade of each color represents the percentage of votes for each party. I think it's a very powerful chart for presenting complex proportional information.

Let's move to the practical side. To create a treemap chart we need to install an additional library called squarify. Just a general comment: there are many useful libraries that do not come as part of the default Anaconda installation, but it is very easy to install a new library. The first step is to use Google and type "anaconda install" and the name of the library.
I would like to install squarify. I press Enter and select the first option, and over there I get the needed syntax, which I copy right now and later paste into the command line. The next step is to go to Anaconda Navigator, select Environments, click on the base (root) option and open a terminal. This is the terminal, and I paste the syntax I just copied from Google: conda install and the rest of the syntax. It runs the installation in our local environment. That's it. After it finishes, we can close the terminal, go back to JupyterLab, and we can start.

First of all, I import this new library, squarify. Then I create a data frame from a dictionary, as we saw earlier. It's a slightly different data frame, a simple one: a list of products and the sold units for each product. Let's see it. A very simple data frame: we have eight products and, for each product, the total units sold. Another step I need to perform is to sort the data frame, so I'm going to use sort_values. Here we go, now it's organized. I know that I have eight products, so I would like to create a list of colors, one for each of them, so we also have a colors list. That's it.

Now I can use the function called plot in squarify. But first, let's get the docstring, a description of that function: I type the function name followed by a question mark, and we get the call signature of the function. The sizes parameter in our case is the units column, and the label is the product column. We can also adjust the size of the overall treemap using norm_x and norm_y. So let's create our first treemap. Here we go. I would also like some separation between the rectangles, so I can set some padding: there is a pad parameter, and I set it to True. Let's also change the size of the chart by setting the figsize parameter when creating the figure, so it is easier to see the treemap.
So I created some separation and enlarged it a little. This is a nice way to see how the products are organized and, as I said, because of the rectangular shape, it's a much more efficient way to present a very large number of items.

23. Optimizing Colors: We just learned the first groups of charts, ranking and proportion. We'll cover many more interesting charts to visualize our data moving forward. At this point, it is important to talk about colors. Colors play an important role in data visualization, and therefore it is essential to understand how we can play with and adjust colors in charts.

There are multiple ways to customize and select colors. The simplest way is to create a list of colors, either by using predefined color names or by using hexadecimal values, and then provide that list to the chart as an argument. It is something we already saw, and I'll provide a simple example. This is the list of colors, passed as another argument that I'm providing over here to pie, called colors. Very simple. The colors parameter can take a single color or a sequence of colors. To be able to select better colors, we can also search in Google for "color picker" and use one of the provided tools. There is an automatic tool coming from Google. I can play around with the values and, when I reach a specific color, just copy the hexadecimal value and paste it over here. I can do that for the rest of the items: let's select another one like this one, and another one in the yellow spectrum. I just changed the colors to my selection. This was option number one, meaning the task of manually picking our colors, which can be useful for simple charts with a low number of colors.

The next option is to select a predefined colormap from the Matplotlib repository. A colormap is a predefined sequence of colors with some logic.
I highly recommend using this option, and we will use it during the training. Let's open the Matplotlib documentation about colormaps to see that together. First of all, there are several classes of colormaps: sequential, diverging, cyclic and so on. Each category is optimized for different use cases. For example, sequential colormaps change the lightness and saturation of the color incrementally. They are useful for presenting information that has some order. Think about a feature in a data set that represents temperature. The temperature value can be drawn between blue and red, moving from blue to red and following the shading of those colors based on the temperature value.

If we scroll down, we can see the list of available colormap instances for each category. You can copy and paste the required colormap name, like this one, plasma, to be used in your chart, and we'll do that in a minute. If you are not sure how the colormap looks, just keep scrolling down to see the colors over here. These are the sequential colormaps, and each one is a different colormap. Many, many options. Each color bar represents a single colormap, and we can see the name on the left side. For example, this one, hot, is a nice colormap to use in some cases, and we will use it.

Now, to be able to use a colormap, we just set a specific parameter called cmap, which stands for colormap. Without going into the details of how I'm creating the chart, which will be covered later on, I just want to show you the result while playing with the colormap argument. As a first step, I use a colormap called inferno as my cmap. Jumping back to JupyterLab, you can see I'm creating a chart that we will discuss later, called hexbin. The interesting thing to show here is this parameter, cmap, and I'm using a specific colormap instance. Here we have it.
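A minimal sketch of passing a colormap name through cmap. The random data, the grid size and the seed are assumptions; the hexbin chart itself is only a vehicle here, as in the lecture.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical correlated data, just to have something to bin.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x + rng.normal(scale=0.5, size=2000)

fig, ax = plt.subplots()
# cmap accepts the name of any registered colormap, e.g. "inferno", "plasma", "hot".
hb = ax.hexbin(x, y, gridsize=25, cmap="inferno")
fig.colorbar(hb, ax=ax, label="points per bin")
```

Swapping the string passed to cmap is the only change needed to try another colormap on the same chart.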
Let's change the argument to a sequential colormap, so it will be a different colormap, a sequential one with green colors, and we get a different result. Or the hot colormap is another nice example, this one, hot. Each colormap gives us a nice visualization option, so it's good to play with that and select the best colormap for the chart that you would like to present.

24. Section 04 - Trend and Distribution Charts: Hi and welcome back. I hope that you managed to play around with the provided exercise from the previous section. As long as you practice, it will get much easier for you to create even the complex charts that we'll see later on. In this section, we're going to cover charts that help us visualize and discover trends and also distribution patterns in a data set. For the trend category, I'm going to present the line chart, area chart and stacked area chart. All those charts are popular tools and very useful for visualizing two and even three numerical variables in a two-dimensional space. Then we'll move into a slightly more complex category, distribution charts. It's a very important category for understanding the nature of our data set: how the data is distributed, the distribution shape of a single numerical variable, and how the distributions of multiple variables compare. In that context, we will cover the histogram chart, density curve chart, box and whisker chart and finally the bee-swarm chart. I think it's a very interesting section, and most probably you will encounter charts that you have never used before, even with out-of-the-box visualization tools. As always, at the end of the section you have a summary exercise and a solution for that exercise to practice with. If you have any questions, please use the course dashboard. This is our high-level roadmap for this section. Good luck and let's get started.

25. Line Chart: We will start the section with the well-known line chart. A line chart is a type of chart that displays information as a series of data points called markers, connected by straight line segments from left to right to demonstrate changes in value. It can also be used to observe the slope of the lines moving up and down. I start by importing a group of libraries: Matplotlib, seaborn, NumPy and pandas. We will use all of them.

As the first step, let's draw a simple linear graph using the plot method. I'm using a function in NumPy called linspace to create evenly spaced numbers over a specific interval. Here we have it: between zero and two it generates 100 numbers. Using a variety of arguments, we can customize the line. I add two more lines with the same x axis values but different y axis values, using different mathematical functions instead of linear growth, so you can see each line represented by a different color, a different line style and a different marker. You can play with all those parameters when creating a line chart.

On the horizontal axis, we need a variable that presents continuous values at regular intervals of measurement, and therefore a typical use case of a line chart is to visualize a trend in data over intervals of time. So if you have a time interval in the data set, it can be useful to present it on a line chart. Let's see such an example using seaborn. As the first step, I create a data frame with 100 time intervals, using a function called date_range in pandas for the x axis, and 100 random numbers using NumPy for the y axis, stored in a column called value1. After that comes another column, called value2, which holds the same random numbers with some constant deviation. Let's see the resulting data frame. As you can see, the data frame has three columns: time, value1 and value2. Imagine the time represents a day interval.
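Both steps described above, the styled lines over a linspace axis and the time-indexed data frame, can be sketched as follows. The specific functions, styles, start date, seed and the deviation of 0.5 are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 100 evenly spaced numbers between 0 and 2 (both ends included).
x = np.linspace(0, 2, 100)

fig, ax = plt.subplots()
# Same x values, different y functions, each with its own color, style and marker.
ax.plot(x, x, color="blue", linestyle="-", label="linear")
ax.plot(x, x**2, color="green", linestyle="--", label="quadratic")
ax.plot(x, x**3, color="red", linestyle=":", marker="o", markevery=10, label="cubic")
ax.legend()

# Time-based data frame: 100 daily timestamps and random values.
rng = np.random.default_rng(42)
value1 = rng.random(100)
df = pd.DataFrame({
    "time": pd.date_range("2021-01-01", periods=100, freq="D"),  # hypothetical start date
    "value1": value1,
    "value2": value1 + 0.5,  # same numbers with a constant deviation
})
print(df.head())
```

The time column is what later goes on the x axis of the seaborn line chart, with value1 and value2 as two separate lines.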
Each row is one day: day one, day two, and so on. In seaborn, the relevant function to draw a line is called lineplot. We're providing the x and y column labels, the name of the data frame and the ax object we created. These are all things that we have already seen before. I'm calling lineplot twice, creating two lines, one with value1 and the other with value2. A nice and simple example using generated data points.

Let's see another example, from a built-in data set in seaborn called fmri. This is the data set. Without going into the meaning of that data set, assume we would like to present a line chart with the timepoint column as the x axis and the signal as the y axis. But I have an issue here. Looking at the content of the data frame, we can see that the timepoint column has no unique values. For example, the value 18 repeats in multiple rows. How is seaborn going to present it in that case? Well, the default behavior in seaborn is to aggregate the multiple measurements at each x value by plotting the mean. So it takes all the rows with the value 18, aggregates them and calculates the average, and it also presents the 95 percent confidence interval around the mean. Let's see that together.

This is a great example of why seaborn is more useful than Matplotlib when we would like to visualize statistical data. All those calculations are done by seaborn: the aggregation and the calculation of the confidence interval. We can switch the 95 percent confidence interval to a standard deviation interval, which can help us see the spread of the distribution at each time point. I'm doing that with a specific parameter called ci, setting it to the standard deviation value. We can also split the line into several lines if we have an additional categorical variable. In this data set there is a column called event.
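To make the default aggregation described above concrete, here is a sketch that reproduces it by hand with pandas and Matplotlib instead of calling seaborn. It is not seaborn's implementation, and the data is synthetic rather than the fmri data set; only the column names are borrowed from it. In the lecture the standard deviation band is requested with ci set to "sd"; newer seaborn releases (0.12 and later) use an errorbar parameter for the same purpose.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: every timepoint value appears 20 times (no unique x values).
rng = np.random.default_rng(1)
timepoints = np.repeat(np.arange(10), 20)
signal = np.sin(timepoints / 3) + rng.normal(scale=0.3, size=timepoints.size)
df = pd.DataFrame({"timepoint": timepoints, "signal": signal})

# Aggregate the repeated measurements at each x value: mean and spread.
stats = df.groupby("timepoint")["signal"].agg(["mean", "std"])

fig, ax = plt.subplots()
ax.plot(stats.index, stats["mean"], label="mean signal")
ax.fill_between(
    stats.index,
    stats["mean"] - stats["std"],
    stats["mean"] + stats["std"],
    alpha=0.3,
    label="mean ± 1 standard deviation",
)
ax.set_xlabel("timepoint")
ax.set_ylabel("signal")
ax.legend()
```

The line is the per-timepoint average and the band is the spread around it, which is the picture seaborn's lineplot produces in one call.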
I check the unique values in that categorical variable: we have two values. Now let's add this categorical variable using the hue semantic, so hue is going to be equal to event, and I'm getting two different lines, one for each group. Lastly, in order to better distinguish between the two lines, we can also use other keyword arguments, like the markers and style parameters. Let's play with that for a second. I'm setting markers to True and using the event column as the input value for style. This is the result: each line has a different marker and a different line style. That's it about line charts. In the next lecture, let's talk about the area chart.

26. Area Chart: An area chart is similar to a line chart, except that the area between the x axis and the line is filled with color, some sort of shading. It is used to represent the evolution of a numeric variable. Let's display two lines using the plot function, as we have done before. This is a line chart. Now we can fill the area below each line using a function called fill_between. The generic syntax fills the area between two horizontal curves. In this case, we can decide that one of the curves is simply y equal to zero, which is the third argument of this function. Let's see how I'm using that. This is the function fill_between, and it fills between this value, which is the line itself, and the other curve, which is equal to zero, the x axis. Let's see the result. Here we have it: this is an area chart. Another option is to draw the area between the two lines instead of using y equal to zero as the baseline. It's a small tweak that we can perform, so the fill is going to be between this line and the other line below. Here we have it: I'm filling the area between the two. That's another option that we can consider.
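Both variants of fill_between described above can be sketched side by side. The two line functions and the transparency value are assumptions chosen so that the upper line stays above the lower one.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 100)
upper = x + 1      # hypothetical upper line
lower = 0.5 * x    # hypothetical lower line

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Variant 1: fill between a line and y = 0 (the x axis).
ax1.plot(x, upper)
ax1.fill_between(x, upper, 0, alpha=0.3)
ax1.set_title("Area down to y = 0")

# Variant 2: fill only the band between the two lines.
ax2.plot(x, upper)
ax2.plot(x, lower)
ax2.fill_between(x, upper, lower, alpha=0.3)
ax2.set_title("Area between two lines")
```

The third argument is the second curve: a scalar such as 0 gives the classic area chart, while an array of the same length as x gives the band between two lines.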
Overall, I prefer to use the area chart instead of the line chart, because from a visual perspective it's nicer.

27. Stacked Area Chart: Another option to consider is a stacked area chart, which means several area charts stacked vertically on top of one another rather than overlapping one another, as we saw before in a simple area chart. As a simple example, I take two linear graphs and stack them on top of each other. In Matplotlib we have a dedicated function called stackplot. The first argument is the x axis, and then we can provide the list of y arguments, so this is very easy to create. As the first step, I create the x axis and then three lines: line one, line two and line three. Line one is linear, line two is a constant value, and line three is another linear one. Now I can use the stackplot function. The syntax is very easy: stackplot, then x as the baseline, and all the y values that I would like to use. There we have it, the stacked area chart.

As you can see, the orange middle area has a constant value but shows a linear slope, because its baseline is the first line, the blue one. It can be a little confusing to someone who is not familiar with the idea of a stacked area chart, because the areas are stacked on top of each other. Let's use a cubic line combined with the two linear lines, but this time the cubic line is the first graph. I have a list of labels (cubic, linear 1, linear 2), and I'm using the cubic as the first one. It completely changes the look and feel.

Let's use another example. We define the amount of revenue generated by three product lines in each month for a period of 10 months. So I have a one-dimensional array for the months, which is the baseline, and then product one, product two and product three. Each value here represents some revenue. Let's use stackplot to present this information.
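The revenue example can be sketched with stackplot as follows. The revenue numbers for the first two products are hypothetical; the third product is kept at a constant 1 per month, as in the lecture.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 11)  # 10 months as the baseline

# Hypothetical monthly revenue for the first two products.
product1 = np.array([1, 2, 2, 3, 4, 4, 5, 6, 6, 7])
product2 = np.array([2, 2, 3, 3, 3, 4, 4, 4, 5, 5])
product3 = np.ones(10)     # constant revenue every month

fig, ax = plt.subplots()
layers = ax.stackplot(
    months, product1, product2, product3,
    labels=["product 1", "product 2", "product 3"],
)
ax.set_xlabel("month")
ax.set_ylabel("revenue")
ax.legend(loc="upper left")
```

The top edge of the chart is the total revenue of all three products, and the thickness of each colored layer, not its height on the y axis, is the revenue of that product.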
As you can see, product three, this one, generates the same constant amount of revenue each month, as you can see from the values over here: 1, 1, 1, always the same value. This kind of graph is useful when we would like to track the total revenue of all three products but at the same time understand the breakdown, meaning how each product contributes to the total revenue. However, we should take into account that this kind of chart may confuse some people. For example, even though product three has a constant value, it is not presented here as a straight line, because its baseline is product two, and the baseline of product two is product one. You need to understand how to read this kind of chart.

28. Histogram Chart: From this point until the end of the section, I would like to talk about distribution charts, meaning how to visualize the distribution of numerical variables. We start with the famous histogram chart. A histogram is a chart that groups numerical data into segmented columns that are called bins. The bins represent ranges of data, usually a fixed range, like the number of people between the ages of 10 and 20, then another bin with all people between 20 and 30, and so on. In this example, the bin range, the size of the bin, is 10 years. The height of the bin is proportional to the frequency of data points with a value within the bin range. Let's say that in a specific data set we have 200 people between 10 and 20, so the height of that bin is 200. Then we have 100 people between 20 and 30, 50 people between 30 and 40, and lastly 10 people between 40 and 50. We just divided the data set into groups of ages to be presented as bins, and this is the main idea of bins. Those bins are then used to understand the distribution of the data set.
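The binning idea in this example can be checked with NumPy's histogram function, which performs the same kind of counting that a histogram chart draws. The individual ages below are made up so that the counts match the numbers in the example.

```python
import numpy as np

# Made-up ages: 200 people aged 15, 100 aged 25, 50 aged 35 and 10 aged 45.
ages = np.concatenate([
    np.full(200, 15),
    np.full(100, 25),
    np.full(50, 35),
    np.full(10, 45),
])

# Four bins, each 10 years wide: 10-20, 20-30, 30-40, 40-50.
counts, edges = np.histogram(ages, bins=[10, 20, 30, 40, 50])

print(counts)  # [200 100  50  10]
print(edges)   # [10 20 30 40 50]
```

Each count becomes the height of one bin, so the chart for this data would show a tall first bar and steadily shorter bars to the right.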
This kind of aggregated information can help us understand how the data is distributed: the shape of the distribution, the central tendency and the amount of variation in the data. It gives a rough sense of the underlying data distribution.

Let's load the Titanic data set as the first step. Here we have it. I remove any rows with missing values in the age column using the dropna function, something we saw many times in Level 2. Now I use the hist function in Matplotlib to draw a histogram. I'm using the age column as the variable I would like to investigate. For creating a histogram, the first step is to bin the range of values, meaning dividing the entire range of values into a series of intervals, and then count how many values fall into each interval. We need to decide, of course, the number of bins we would like to use. I start with five bins. This is the function name, hist, I'm providing the Titanic age, which is the column I would like to investigate (this, of course, must be a numerical variable), and then, as the argument for bins, the number five. We can see here that most passengers are between the ages of 15 and 45, something like that. But maybe we're missing the main patterns because it is a small number of bins, so I change it to 10 and present it again, this time with ten bins, and I also try 20. It's a different way to divide the same numerical variable.

What I do next is draw several histograms with different bin sizes in one figure, to easily compare them, and I'll explain the idea behind that. This is the code: I'm creating a 2 by 2 figure, two rows and two columns, and all the objects are stored here. So axes is now a two-dimensional array, and I can access that list of objects: the first object is [0, 0], then [0, 1], [1, 0] and [1, 1]. This is the way to access each plot.
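A sketch of the 2 by 2 comparison. To keep it self-contained, it uses a synthetic age column with some missing values as a stand-in for the Titanic data set; the distribution parameters, seed and colors are assumptions, while the bin counts of 5, 10, 20 and 30 follow the lecture.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the Titanic age column, including missing values.
rng = np.random.default_rng(7)
ages = rng.normal(30, 14, size=700).clip(0.5, 80)
ages[::10] = np.nan
df = pd.DataFrame({"age": ages})

# Remove rows with a missing age before plotting.
df = df.dropna(subset=["age"])

# 2 x 2 grid: axes is a two-dimensional array indexed by [row, column].
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
axes[0, 0].hist(df["age"], bins=5, color="red")
axes[0, 1].hist(df["age"], bins=10, color="green")
axes[1, 0].hist(df["age"], bins=20, color="blue")
axes[1, 1].hist(df["age"], bins=30, color="gray")

for ax, bins in zip(axes.flat, [5, 10, 20, 30]):
    ax.set_title(f"{bins} bins")
    ax.set_xlabel("age")
```

Seeing the four panels together makes it easier to judge which bin count shows the shape of the distribution without being too coarse or too noisy.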
The four plots use different bin counts: 5, 10, 20 and 30. It's easy to see the result of all those options. I have a few comments about the number of bins. There are tools that can automatically calculate and select the bin boundaries, but in most cases we need to play around with this parameter to improve the chart. The main idea is to try different bin sizes, as I just did, to verify that the outcome makes sense and reflects the underlying distribution. If the bin size is too small, meaning I asked for a large number of bins and the bin range is small, then the histogram is visually very busy, with many bins, as we can see in the last one on the lower right side, with 30 bins. On the other hand, if the bin size is too large, meaning a large range with a small number of bins, as we can see on the upper left side in red, then smaller variations in the distribution disappear and we are not able to see them. So we need to play around a little with the bin size to bring some sense to this chart.

Keep in mind that histograms are sometimes confused with the bar charts that we used before. The bar rectangle is similar to the bin rectangle, but they are used for different tasks. A histogram is used for presenting continuous data on the horizontal axis, where the bins represent ranges of data, as we just saw with the ages. A bar chart, on the other hand, is used to represent a categorical variable with a limited number of possible specific values, as we saw before, like sex, which can only be male or female. It's not used to handle a continuous numerical variable. For that we have the histogram chart.

29. Density Curve Chart: Checking the distribution of numerical variables in a new data set is probably one of the first tasks we should perform. We just saw how the histogram chart can be used to visualize the distribution of a numeric variable using several frequency bins.
Each bin covers a range of numbers, and if we connect the bins with a line, then we will see some kind of high-level distribution. But the shape of the distribution will be affected by the number of bins we select. If we select five bins or 20 bins, we will get a completely different shape of the distribution. However, in some cases we would like to see a much smoother view of the actual shape of the distribution. It is like translating the data points into some mathematical function, and this is what we can do with a density chart. In a density chart, we try to visualize the underlying probability distribution of the data by drawing an appropriate continuous curve. This curve needs to be estimated from the data somehow, and the most commonly used method for this estimation procedure is called kernel density estimation, in short KDE. This kind of statistical estimation is reliable when we have a large data set. If the data set is relatively small, then we should use histograms. Moving to the practical side, to create a density chart we will use Seaborn, with a function called distplot. In our example, we will use the age column from the Titanic data set. Here we have it. So it is actually a combination of two charts: the histogram chart combined with the density chart. The bin size for the histogram was calculated automatically by Seaborn, and we can play with that argument if needed, like we did in Matplotlib. Anyway, we would like to talk about the density chart in this lecture, so I will remove the histogram by setting the hist argument to False. So it will be almost the same, but this time hist equal to False. Let's run it. And here it is, this is a density chart. I actually think it looks better to draw the area below the line, because it represents probability. Okay, that is the y axis. So there is another function called kdeplot to draw shade below the line.
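To make the idea of kernel density estimation a bit more concrete, here is a minimal sketch that builds the curve by hand with NumPy: one small Gaussian bump per data point, averaged together. It is not what Seaborn runs internally. The synthetic ages, the helper name gaussian_kde_curve, and the rule-of-thumb bandwidth are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kde_curve(values, grid, bandwidth):
    """Average one Gaussian bump per data point, evaluated on the grid."""
    values = np.asarray(values, dtype=float)
    z = (grid[:, None] - values[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (len(values) * bandwidth)

# Assumption: synthetic ages stand in for the Titanic "age" column.
rng = np.random.default_rng(1)
ages = rng.normal(30, 12, 500)

# Assumption: a simple rule-of-thumb bandwidth (std * n^(-1/5)).
bandwidth = ages.std(ddof=1) * len(ages) ** (-1 / 5)
grid = np.linspace(ages.min() - 4 * bandwidth, ages.max() + 4 * bandwidth, 400)
density = gaussian_kde_curve(ages, grid, bandwidth)

# Trapezoid rule: the area under a density curve should be close to 1 (100%).
area = float(np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid)))
print(f"area under the curve: {area:.3f}")
```

Because the grid is padded beyond the smallest and largest age, the curve also extends slightly past the observed data, which is why a density line can show values that never appear in the data set.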
Okay, this is our density distribution chart. The y axis, the height of the KDE line, represents the probability density of a new data point being located at a specific location along the x axis. The area under the curve is what carries the probability, so if you sum the area below the curve using an integral, the result will be 100%. A higher line at some location along the curve indicates a higher probability of seeing a point at that location. We can see here that it looks like a bell distribution with the mean age around 30, something like that. Another option to consider will be to present two or more distributions at once on the same figure. A density chart will be a better selection than a histogram chart in such a scenario. As an example, let's present the male age distribution compared to the female age distribution. So it will be something like this. First of all, I am creating filtered data frames with the relevant groups, and then I am plotting two KDE plots, one for each of them. Okay, this is the male group and this is the female group. Let's see them together. By the way, we should cut the age range on the x axis, because it does not make sense for the age to have a negative value, below zero. We need to remember that the KDE is basically a statistical calculation to create such an estimation line, so it may produce a line with points that are not in the data set or maybe are not even possible in reality. So in this example, I will use set_xbound to start the x axis from zero as a simple solution. This line, xbound zero. That's it. This is much clearer now. This is the density plot. We also saw the histogram plot, which is useful to present the distribution of a single numerical variable. But what if we would like to compare different numerical variables? The way to do that is using the next type of chart, called box and whisker, and we will see that in the next lecture.

30.
Box and Whisker Chart: In the context of analyzing numerical distributions, we covered histogram and density curve charts. Each one of them is useful in analyzing the distribution of a numerical variable in one group, but sometimes we need to compare several distributions against each other, like the distribution of the same numeric variable but for multiple groups. For such use cases, we can use the box and whisker chart, also called in short a box plot. It is a slightly more sophisticated chart and less popular, so I assume that some people are not even aware of such a tool. As a simple example, let's load the iris data set and then use a function called boxplot in Seaborn. I will provide it with a single column. This is the column that I would like to use. The result will be one box, which represents the data distribution of that column as one group. But it does not make sense to present one group using this chart, because we want to use this chart to compare multiple groups. So I will call that function again, but this time without providing the column label name, and it will automatically take all numerical columns in that data set. Each numerical column will be presented using a dedicated box. Okay, so this is one column, then another column. We have four columns in the iris data set. Looking at the result, on the x axis we have the four columns in the data set, which represent the size of the iris flower in four different parameters. Each one is described using this box structure. The y axis is the same for all of them, meaning the size in centimeters. The idea will be to compare groups through their box structure and the marking positions on top of it. Now let's talk about the meaning of the box structure.
This is a generic structure of that box, representing 100% of the data. The vertical box is plotted in a way that the middle 50% of the data points fit inside the box. So 50% of the points are located inside the box, which means the bottom 25% of the data points are located below the box and the top 25% are located above the box. So we can say that the box size represents the variation in the data points. If 50% of the data points are around the same values, the box will be small. Okay, it looks like a small rectangle. But if they are spread around, then the box will look like a big rectangle. Now, going back to our example with the iris flower, we are looking at the sepal width, this one, which is the second box. The box is very small, meaning those 50% of values are around the same small range, with less variation in the data set regarding this column. On the other hand, in the petal length, this one, there is a lot of variation between the data points. Okay, the box looks like a big rectangle. So that is the first thing we can easily understand: compare between the four columns and understand the distribution of the data. Secondly, the line in the middle is the median, this line, which means the value that separates the higher half from the lower half of the data sample. When a data distribution is symmetric, we can expect the median to be in almost the exact center of the box. On the other hand, if the distribution is not symmetric, then the median will not be in the middle and instead will be located toward the top or the bottom of the box. So if I am looking at the sepal length, the median is almost in the center of the box, meaning the data distribution is symmetric. On the other hand, looking at this one, the median is located at the top of the box, which means the data is not symmetric. That is the meaning of the line in the middle.
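The numbers behind the box can be checked with a few lines of NumPy. This is a minimal sketch with synthetic values standing in for one iris column, and the variable names are hypothetical: the box runs from the 25th to the 75th percentile, and the median line sits somewhere inside it.

```python
import numpy as np

# Assumption: synthetic values stand in for one numerical column.
rng = np.random.default_rng(3)
values = rng.normal(3.0, 0.4, 1000)

# Bottom edge of the box, the median line, and the top edge of the box.
q1, median, q3 = np.percentile(values, [25, 50, 75])
box_height = q3 - q1  # a small box means little variation in the middle 50%

# Share of the data points that fall inside the box (about one half).
inside_box = float(np.mean((values >= q1) & (values <= q3)))

# 0.5 means the median is exactly centered, which suggests symmetric data.
median_position = (median - q1) / box_height

print(f"box from {q1:.2f} to {q3:.2f}, median at {median:.2f}")
print(f"share inside the box: {inside_box:.2f}")
print(f"relative median position: {median_position:.2f}")
```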
The next one is the vertical line, the so-called whisker, which extends from the box, okay, up and down. It is used to capture the range of the remaining 50% of the data and also to show the maximum and minimum values, apart from any outliers. The last piece of information is the dots above or below the whisker edges, which indicate outliers, meaning points that do not fit well with the overall distribution. When looking at the second box, we can see there are outliers below and above the whisker edges. Keep in mind that with the box plot we lose the ability to observe the detailed shape of the distribution, because we are presenting it at a high level. I can, of course, use the density chart to get more detailed information about the shape of the distribution of a specific group, as we saw before. In that context, when using a box plot chart, another alternative to consider is called a violin chart. Let's draw such a chart in Seaborn. This is the name of the function. The idea of a violin plot is to combine a box plot with a density plot. It looks something like this. This chart is almost the same as the box chart concept, but it replaces the box shape with the kernel density estimation of the underlying distribution on both sides. Okay, this line is the KDE, and also the other side. In many cases, the shape of the distribution looks like a bell, like this one, and therefore it will look like a violin, okay, if you combine both of them. This kind of chart adds another piece of information consolidated into a single chart. The only thing we need to consider is that this kind of chart is not so commonly used, meaning most people will not know how to read it. So we need to keep that in mind while using this chart. Let's see more examples from a different data set. I will use the tips data set here.
I would like to analyze the total bill distribution compared between the days of the week. I need to provide the day column as the x argument and the total bill column as the y argument, and the first thing I will do is present a box plot. Here we have it: four groups as four days. We can see how the median bill goes up as we enter the weekend. What about comparing the same total bill distribution per day, but divided into an additional subgroup like smoker? Okay, smoker means there is at least one person in that group who is smoking. In Seaborn we can use the hue parameter, which helps to add a third variable. So its value will be the smoker column. And we have it. The day is still the baseline, but this time I am adding another subgroup, yes or no, as the smoker, for each one of them. What about comparing the total bill distribution per number of people having dinner together? All I need to do is place the size column as the x axis, and the total bill is the y column. This is the only thing I need to do, and that's it. As you can see, it is a very powerful tool and also very simple to use in Seaborn.

31. Bee-swarm Chart: The last type of chart I would like to present in this section is a very nice alternative to a box chart, and it is called a bee-swarm chart. A bee-swarm chart can be used to emphasize individual data points in a distribution. Instead of binning them like a histogram or summarizing them into a box and whisker structure, it can show the distribution of a single numeric metric across one or more categories. It is a great tool when we want to show both the overall distribution of a variable and the individual data points that are used to build the distribution. So here we have it. A bee swarm is a one-dimensional chart that shows all the information on a single axis, using the x axis. Okay, you can see it per each day.
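The grouped box plot from the previous lecture and the bee-swarm chart introduced here use almost the same call pattern in Seaborn. The following is a sketch under assumptions: the data frame is a synthetic stand-in with tips-style column names (the real one would come from something like sns.load_dataset("tips")), and the order argument is my own addition to keep the days in calendar order.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Assumption: synthetic rows with tips-style column names.
rng = np.random.default_rng(5)
n = 120
tips_like = pd.DataFrame({
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], n),
    "smoker": rng.choice(["Yes", "No"], n),
    "sex": rng.choice(["Male", "Female"], n),
    "total_bill": rng.gamma(4.0, 5.0, n).round(2),
})
day_order = ["Thur", "Fri", "Sat", "Sun"]

fig, (ax_box, ax_swarm) = plt.subplots(1, 2, figsize=(12, 4))

# Box plot per day, split into smoker / non-smoker subgroups with hue.
sns.boxplot(data=tips_like, x="day", y="total_bill", hue="smoker",
            order=day_order, ax=ax_box)

# Bee-swarm chart: every bill is one dot, nudged sideways to avoid overlap.
sns.swarmplot(data=tips_like, x="day", y="total_bill", hue="sex",
              order=day_order, ax=ax_swarm)

# The median line of each box corresponds to this per-day median.
median_by_day = tips_like.groupby("day")["total_bill"].median()
```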
This is the structure. It displays values as a collection of points, similar to a scatter chart. The difference is that the bee-swarm chart uses some kind of offset logic to ensure that data points are plotted not on top of each other, but close to each other. Okay, not overlapping. So it will take the data points at this level and spread them around. This kind of chart is very useful when we want to display a lot of data points at once. Additionally, it makes it easy to spot outliers, as they will not be part of the swarm. Okay, for example, those look like outliers, outside the swarm of data points. As with other charts, we can add another categorical variable. For example, let's add the sex column as another type of information. Now they are displayed in different colors for male and female. We can also use a combination of a swarm plot together with a violin plot. Here you go.

32. Section 05 - Correlation Charts: Hi and welcome back. In this section, we are going to cover charts that will help us visualize and discover a variety of correlation patterns in a data set. In essence, correlation is the measure of how two or more variables are related to one another. Now, why is it important to understand the correlation patterns in a data set? Well, discovering correlation patterns can be very useful in many use cases. For example, they can indicate a predictive relationship that can be used in a machine learning project. If we know that the weather has a strong linear correlation to the electricity demand, it can be used to predict the demand based on the weather forecast. Prediction is a core practical use case of machine learning. A correlation can be linear or nonlinear, positive or negative, strong or weak, and also a variety of combinations, like a strong positive linear correlation, as you can see in the upper left chart. Now we are looking at the upper right chart.
The pattern of dots looks like a slope from the upper left to the lower right, which indicates a negative correlation, and because the points are a little bit spread around, it is considered to be a moderate negative linear correlation. The lower left chart has no clear correlation relationship. It is just a cluster of points. And the lower right chart is an example of a strong nonlinear correlation. We will see it later, but if the points are also coded using color, shape, or size, then one additional variable can be displayed on the same scatter chart. We are going to learn how to create, customize, and use the scatter chart, the correlogram chart, the heat map, and the hexbin map. As always, at the end of the section you have a summary exercise and a solution for that exercise to practice and practice. If you have any questions, then please use the course dashboard. This is our high-level roadmap for this section. Good luck and let's get started.

33. Scatter Chart: The first one under this category is the scatter chart. A scatter chart is typically used to present relationships between two numeric variables. By using simple dots, the data are displayed as a collection of points in two-dimensional space. The position of each dot on the horizontal and vertical axes indicates the values for an individual data point. This kind of chart is used to suggest a variety of correlation options between variables with a certain confidence level. By presenting the points as a group in two-dimensional space, we may find a meaningful relationship between the variables. Let's start by loading the tips data set. Now I would like to check a possible correlation between the tip variable and the total bill variable. Both of them are numeric variables. In Matplotlib there is a function called scatter to create that kind of chart. Okay, this is the scatter function, and I am providing x and y and, as data, the tips data set.
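A minimal sketch of that scatter call is shown here, under the assumption that a synthetic data frame with tips-style column names replaces the real data set. The correlation coefficient at the end is my own addition, to put a number next to what the chart suggests visually.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumption: synthetic bills and tips; the tip is roughly 15% of the bill.
rng = np.random.default_rng(6)
total_bill = rng.uniform(5, 50, 200)
tip = 0.15 * total_bill + rng.normal(0, 1.5, 200)
tips_like = pd.DataFrame({"total_bill": total_bill, "tip": tip})

fig, ax = plt.subplots()

# With data=..., x and y can be given as column names.
ax.scatter("total_bill", "tip", data=tips_like)
ax.set_xlabel("total_bill")
ax.set_ylabel("tip")

# Pearson correlation coefficient between the two columns.
r = tips_like["total_bill"].corr(tips_like["tip"])
print(f"linear correlation: {r:.2f}")
```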
It looks like a moderate positive linear correlation between the data points, which makes sense because in many cases the tip in a restaurant is proportional to the total meal price. As we saw before in other plotting functions, we can use a variety of arguments to change the marker, color, style, et cetera. So I will do some customization, changing the marker and the color. Another option to consider will be to add a third variable to the chart using additional visual adjustments. Values of the third variable can be encoded by modifying how the points are plotted, like changing the color, shape, or size of the data points. The third variable we would like to add can be a categorical variable or a numerical variable. The most common encoding is by using colors. Giving each point a distinct color makes it easy to show the membership of each point in its respective group. Let's see how it is done in Matplotlib. So first of all, I will display the data set again. We can use this column, the sex column, as an additional categorical variable on that chart. We know that there are two groups of people, male and female, and let's present the points with two different colors according to the sex category. As the first step, I need to create a list of colors with the exact size of the column in the data set, meaning for every point, or every row in the data set, the relevant color, one or two. We had a dedicated lecture about creating dummy variables, and over there we learned how to use a function called map. I am going to use it here to map the gender type to colors. So the syntax will be colors equal to tips with the sex column, and then I am using the map function to map male to the color black and female to red. Let's present it. As you can see, we created a pandas Series object with the list of colors. Every cell got a color value.
It is black for male and red for female. Now we can use the list as an argument for the scatter function using the color argument. So that will be the syntax: I am adding color equal to colors. The color is now dedicated, so we have two groups: black is male and red is female. Let's see another example using the size column. This is the number of people around a specific table eating together. So again, let's present the tips, and I am talking about this column, size. As a first step, I will check the unique values in that column. There are six unique values, 1, 2, 3, 4, 5, 6, meaning it is a categorical variable. Now let's use the map function to map each size to some color. The syntax is the same, just with more options: one is going to be black, two pink, three green, et cetera. Okay, here we have it. Looking at the chart, it seems the dominating group size is the pink color, which means two people in a group having dinner or lunch together. We can also play around with the size of each point according to the third variable. This is called a bubble chart. Larger points indicate a higher value. So all we need to do is use the following syntax. Again, I am using the same color mapping, but this time creating another variable called sizes, and I am taking the original size and just multiplying it by some constant number, 20. Then I am using that as an input for the argument s, which is short for size, equal to sizes. Okay, this one. You can see the larger points and smaller points. In our case, the size column is more of a categorical variable with only six options, so color encoding is more useful than making each point larger or smaller. An interesting visualization option will be to use a color that changes according to the value on some axis. Okay, let's say I would like to draw the points in red with different shading that gets stronger based on the tip value.
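The color mapping and the bubble sizes described above could be sketched like this. The data frame is a synthetic stand-in with tips-style column names (an assumption), and I pass the colors as a plain list to keep the call as simple as possible.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumption: synthetic rows with tips-style column names.
rng = np.random.default_rng(7)
n = 100
tips_like = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "sex": rng.choice(["Male", "Female"], n),
    "size": rng.integers(1, 7, n),  # party size from 1 to 6
})
tips_like["tip"] = 0.15 * tips_like["total_bill"] + rng.normal(0, 1.0, n)

# One color per row, looked up from the category value.
colors = tips_like["sex"].map({"Male": "black", "Female": "red"})

# Bubble chart: marker area grows with the party size.
sizes = tips_like["size"] * 20

fig, ax = plt.subplots()
ax.scatter(tips_like["total_bill"], tips_like["tip"],
           c=colors.tolist(), s=sizes)
ax.set_xlabel("total_bill")
ax.set_ylabel("tip")
```

A category that is missing from the mapping dictionary would come back as NaN from map, so it is worth checking the unique values first, as done in the lecture.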
For that, I will use the combination of the c parameter, with the tip column as an input argument, and the cmap parameter, which has a predefined list of color maps. This is the syntax. So I am using an argument called c, providing the tip, so that different tip values will change the color, and the color map is Reds. Let's present it. Okay, we have it, starting with a weaker color, and the color gets stronger, using this color map. It is nice and useful to add a color bar using the colorbar function at the figure object level, okay, not at the axes object level. Let's see the syntax of adding such a color bar. So fig.colorbar, and I am providing two input parameters: this one, which is the returned object from ax.scatter, and the relevant axes object, okay, to draw it right over here. And I am using a different color map. I would also like to quickly show you the equivalent code in Seaborn, which is usually simpler, and some of the things are translated and done automatically by Seaborn. For example, you remember that I wanted to present the tip and total bill variables and at the same time divide them according to the sex column. I can do it with two lines. Okay, no need to use the mapping function. The third variable will be the argument for the hue parameter. So the syntax will be something like this. I am creating the figure with two axes with some size, and I am providing this parameter. One time it will be sex, and the other time it will be size, as we saw before. Okay, that's it. The upper chart is with the sex column as the third variable, and the lower chart is with the size column as the third variable. Let's use the iris data set, which includes more numerical variables. So I will load it and present a simple scatter chart. Looking at that chart, we may assume there is a linear relationship between the two variables, but still something is not making sense.
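The color map and color bar steps from the Matplotlib part of this lecture can be sketched as follows, assuming synthetic data with tips-style column names in place of the real data set.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumption: synthetic bills and tips with tips-style column names.
rng = np.random.default_rng(8)
total_bill = rng.uniform(5, 50, 150)
tips_like = pd.DataFrame({
    "total_bill": total_bill,
    "tip": 0.15 * total_bill + rng.normal(0, 1.0, 150),
})

fig, ax = plt.subplots()

# c takes the numeric values; cmap turns them into shades of red.
points = ax.scatter(tips_like["total_bill"], tips_like["tip"],
                    c=tips_like["tip"], cmap="Reds")

# The color bar is added at the figure level, next to the chosen axes.
cbar = fig.colorbar(points, ax=ax)
cbar.set_label("tip")
```

The object returned by scatter is what links the color bar to the colors on the chart, which is why it is stored in a variable.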
It seems the left cluster of points is not related, so I will add the species categorical variable as the third variable to be added. Okay, this is the species variable. We can see that each one of the two groups on the right side has some linear correlation, but there is no correlation inside the group on the lower left side. Okay, in some cases a third variable can add substantial information to a scatter chart we would like to use.

34. Correlogram: In the previous lecture we saw that a scatter chart is used to present two or maybe three numerical variables as clusters of points. In many cases, we would like a quick way to check all correlation options between a group of numerical variables in our data set, instead of creating many scatter charts. For that, we can use the correlogram chart. The Seaborn library has a dedicated function called pairplot to create this type of chart. As the first step, I will load the iris data set and then call that function. It is sns.pairplot. It will take a few seconds and then we will get the result. Okay, a correlogram chart is a symmetric matrix showing the relationship between each pair of numeric variables of a data set. We have four numerical variables in our data set, so it is four by four. The relationship between each pair of variables is visualized through a dedicated scatter plot. Each cell helps us understand the relationship between the intersecting variables. For instance, we can see the linear relationship between the petal length and the petal width using this scatter chart. The diagonal represents the distribution of each variable using a histogram or a density plot. For example, I can switch to a density plot on the diagonal by using this parameter, diag_kind equal to kde, and now this is a density plot. We can also ask to add linear regression models to the scatter plots using this one, kind equal to reg.
And now, for every scatter plot, there will be an estimated linear line. You can see the linear line. As we discussed in the scatter plot lecture, it is a good practice to display subgroups if a categorical variable is also available. So I am using the pair plot again, but this time I am adding the species categorical variable. Now the three different species are color coded in each chart in that matrix. Sometimes a data set will have more than five numeric variables, and in that case the matrix will include too much information, making it less readable. We can consider creating the matrix with only specific columns. So as an example, I will take only two columns. I am using the vars parameter and providing the two columns as an argument, and I am getting a much smaller matrix.

35. Heatmap: Our next very interesting chart type is called a heat map. A heat map is a data visualization technique to show relationships between two variables, one plotted on each axis, using colors. It is like a simple matrix that is divided into cells with the same fixed size. The idea will be to observe how cell colors change across each axis to identify interesting patterns. The variables plotted on each axis can be of any type. They can be categorical or numerical variables. If we would like to use a numerical variable on one axis as a continuous number, then it must be binned, like we do when using a histogram. The cell coloring can correspond to all kinds of metrics, like a frequency count of points in each bin, or summary statistics like the mean or median of a variable. We will see that in a minute. When creating a heat map, typically all the rows are one category and all the columns are another category. The data contained within a cell is based on the relationship between the two variables in the connecting rows and columns. The cells contain either color-coded categorical data or numerical data that is based on a color scale.
Seaborn has a dedicated function to create heat maps, called heatmap. We need to provide it with a two-dimensional rectangular data set. It can be an ndarray object from NumPy or a DataFrame object from pandas. I will generate a four-by-four matrix with random numbers and then use it as an input argument for the heatmap function. Okay, so I am importing NumPy and Seaborn, using NumPy to generate random numbers for a four-by-four matrix, and then using the heatmap function in Seaborn with that data as an input and a specific color map to see some nice colors. Here we have it: a four-by-four heat map matrix with different colors that represent the values of the random numbers. Now, it is a nice heat map, but the values inside are random numbers, so I am not expecting to see any useful patterns in that heat map. Next, let's load the Titanic data set. There is a function that is part of the data frame that can be used to calculate something called a correlation matrix, which calculates the pairwise correlation of all numerical columns in the data set. Okay, without going too deep into the statistics, the default method of calculating the correlation is going to show the linear correlation between each pair of variables. So the syntax will be titanic.corr, and I will get this correlation matrix. It is a matrix of numbers. Now let's use it to create a nice heat map. I am using titanic.corr, and I would also like to use annotation inside, so we see the numbers inside. Here we have it. This is a heat map presenting a correlation matrix, which is a nice use case of a heat map. Every cell represents the linear correlation between two variables, and of course the diagonal line over here is going to be one, because it is the same variable, survived and survived, so one. It is useful to see which variables have a stronger correlation between each other.
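Both heat maps from this part of the lecture can be sketched in a few lines. The assumptions here are that the Titanic data is replaced by a small synthetic data frame with a few Titanic-style columns, and that the numerical columns are selected explicitly before calling corr, which is my own precaution because the frame also contains a text column.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(10)

# A 4 x 4 matrix of random numbers: a valid heat map, but with no real pattern.
random_matrix = rng.random((4, 4))

# Assumption: synthetic rows with a few Titanic-style columns.
n = 200
titanic_like = pd.DataFrame({
    "survived": rng.integers(0, 2, n),
    "pclass": rng.integers(1, 4, n),
    "age": rng.normal(30, 12, n).clip(1, 80),
    "fare": rng.gamma(2.0, 15.0, n),
    "sex": rng.choice(["male", "female"], n),
})

# Pairwise linear correlation of the numerical columns only.
corr = titanic_like.select_dtypes("number").corr()

fig, (ax_random, ax_corr) = plt.subplots(1, 2, figsize=(11, 4))
sns.heatmap(random_matrix, cmap="Blues", ax=ax_random)
sns.heatmap(corr, annot=True, ax=ax_corr)  # annot writes the value in each cell
```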
Let's get the same correlation matrix heat map, but this time for the iris data set, which has a smaller number of features. Here we have it. We can see the petal length is highly correlated with the petal width, with a 0.96 correlation coefficient. We can also see that the petal length is highly correlated with the sepal length, with a 0.87 correlation coefficient. Let's load the flights data set and check the structure. It is a simple data set. We have three columns: year, month, and passengers. I would like to create a more complex heat map using this information. First of all, I will check the number of unique values per year. Okay, we have 12 years. And also check the number of unique values per month. We have 12 months in the data set. Okay, now, assuming I would like to build a heat map with the years as the x axis and the months as the y axis, the color of each cell should be based on the number of passengers. Looking at the data set, it is not really organized this way. So we will use a function in pandas called pivot to create a new table with the needed structure. So I am using that pivot function to create the needed matrix. The months will be the rows and the years will be the columns, okay, and the value inside will be passengers. Let's see the result. Now I have each month as a row and each year as a column, and the value inside is the number of passengers. This newly organized table can now be used to create a heat map. The data frame index and column information will be automatically used to label the columns and rows inside the heat map. That is something being done automatically by Seaborn. Here we have it. We can also separate the cells with some line, if you would like, using some line width. Now, looking from the bottom up on the left side, we can understand that a brighter color in a specific cell means a small number of passengers.
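The reshaping step is the part that is easiest to get wrong, so here is a small sketch of it. The passenger numbers are made up (a simple formula, not the real flights data), and turning the month column into an ordered categorical is my own addition so that the rows stay in calendar order.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Assumption: made-up passenger counts in the long year / month / passengers layout.
rows = [
    {"year": year, "month": month,
     "passengers": 100 + 30 * (year - 1949) + 10 * i}
    for year in range(1949, 1961)
    for i, month in enumerate(months)
]
flights_like = pd.DataFrame(rows)

# Keep the months in calendar order instead of alphabetical order.
flights_like["month"] = pd.Categorical(flights_like["month"],
                                       categories=months, ordered=True)

# Long table -> matrix: months as rows, years as columns.
table = flights_like.pivot(index="month", columns="year", values="passengers")

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(table, linewidths=0.5, ax=ax)  # thin lines separate the cells
```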
From the year 1955 until 1960 there is a growing number of passengers, probably because flights were becoming more affordable, I can guess. Also, the number of passengers goes up between July and August. Okay, people are going on vacation, so that makes sense. We can also add the exact number of passengers in each cell by setting the annot parameter to True and using a special format string as input. Okay, annot equal to True, and I am using fmt equal to d. And now each cell has the exact number. So this is the heat map.

36. Hexbin Map: We talked about the scatter chart, which is used to visualize points in two-dimensional space. Each point location is based on two variables, x and y. We can also add a third variable by coloring the points. This is something that we saw before. In some cases, we will encounter a situation where there are too many points overlapping each other, and we may easily miss important patterns when using a scatter chart. It is a common situation when the data set is huge, like data that contains hundreds or even thousands of data points. Let's generate a simple data frame with 10,000 random points using a normal distribution. Okay, so I am using numpy.random and creating a data frame. Let's present the data frame. I have two columns, x and y, and 10,000 rows. Let's use this data to create a scatter chart. Here we have it. As you can see, there are so many data points that they overlap each other, and we can easily miss important patterns in the data set if we try to use a scatter chart. One option will be to filter some of the data points using some smart filtering method, or to use some alternative, which is what I would like to show you. In such a scenario, we can consider using a different chart called hexbin. In a hexbin chart, the space of the whole figure is divided into logical hexagonal regions.
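The random data frame, the crowded scatter chart, and a first hexbin chart could be sketched like this. It uses Matplotlib's hexbin method on an axes object, which may differ from the exact call shown on screen, and the gridsize value of 40 and the grey color map are taken from the values mentioned in this lecture.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 10,000 random points drawn from a normal distribution.
rng = np.random.default_rng(11)
points = pd.DataFrame({
    "x": rng.normal(size=10_000),
    "y": rng.normal(size=10_000),
})

fig, (ax_scatter, ax_hex) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter chart: the dense center turns into one solid blob.
ax_scatter.scatter(points["x"], points["y"], s=5)

# Hexbin chart: points are counted per hexagonal region,
# with gridsize hexagons in the x direction.
hexes = ax_hex.hexbin(points["x"], points["y"], gridsize=40, cmap="Greys")
fig.colorbar(hexes, ax=ax_hex, label="points per hexagon")

# The array behind the colors holds the count for each hexagon.
total_counted = int(hexes.get_array().sum())
```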
All points, using the x and y values, are aggregated into their respective hexagonal regions, with a color gradient indicating the density of points in each hexagonal area. Let's create this type of chart. We need to define the number of hexagons in the x direction using the gridsize. Okay, so first of all, this is the name of the function, hexbin, and I need to provide how many regions I would like to create using the gridsize. So I will create 40 hexagonal regions. This will be the result. As you can see, compared to the scatter plot, this chart is telling a different story using a calculated color gradient. The color of each hexagonal region is based on a metric we can select. The most common metric is density, meaning the number of points that are located in the same region, but it can even be a third variable that we select. For example, we can present the mean value of that third variable. By the way, if the x and y variables that we use to present the chart are latitude and longitude pairs, it can be even more interesting to see where most data points are located. Keep in mind that we should play around with the gridsize parameter. The default value is 100. Increasing or decreasing the gridsize can potentially show us completely different information in the data set. For example, I will reduce it to 20 and also use a color map, grey. Another option would be to color the edge of each hexagonal region. Okay, all kinds of things that you can play around with according to the customization you would like to add. I am using the edge color with the grey, and we can basically see the structure of the hexagonal regions.

37. Let’s Recap and Thank You!: Hi and welcome back to our last section in this training. I want to recap the things we covered so far.
We started by learning the fundamentals of the two core visualization libraries in Python, meaning Matplotlib and Seaborn, and the syntax to create charts in each library. As you saw during this training, we can create a large variety of charts using only two libraries, which helps us not to get lost with too many options in Python. In any case, I recommend that later on you consider learning more data visualization libraries as you get more experience. During the training we learned to create all kinds of charts that I divided into categories like ranking, proportion, trend, distribution, and correlation. Each category has a group of charts that can be used for different use cases. Each chart has a dedicated lecture under a section, and it will help you in the future. When you want to remember how to create a specific chart, you can just quickly review the relevant lecture again. Until this point, meaning level one, level two, and level three, we covered quite a substantial number of topics. We know the fundamental terms in machine learning, how to load a dataset from a CSV file using the pandas library, how to fix the structure of a data frame, clean and transform the content inside a data frame, and finally, how to visualize the dataset in a variety of charts, which is a critical part of the data exploration and analysis process. We are ready to move on to the next exciting step and start to use machine learning algorithms. So level four and moving forward will be focused on practical machine learning use cases. I want to thank you for watching this training. I hope that you enjoyed it and learned some interesting things along the way that will be useful for you. It will be awesome, and very useful and important for me, if you can rate the course and share your experience. If you would like to continue your learning path about machine learning, please check if level four is already available. Thanks again.
And I hope to see you in my next training course. Bye bye. And good luck.