Data Science with Matplotlib in Python | Lazy Programmer Inc | Skillshare

Data Science with Matplotlib in Python

Lazy Programmer Inc

Watch this class and thousands more

Get unlimited access to every class
Taught by industry leaders & working professionals
Topics include illustration, design, photography, and more

Lessons in This Class

  1. Introduction Video (3:05)
  2. Matplotlib Outline (2:30)
  3. Line Chart (3:40)
  4. Scatterplot (4:21)
  5. Histogram (2:17)
  6. Plotting Images (7:31)
  7. Matplotlib Exercise (1:30)
  8. Where to get discount coupons and FREE machine learning material (5:31)

Community Generated

The level is determined by a majority opinion of students who have reviewed this class. The teacher's recommendation is shown until at least 5 student responses are collected.

45 Students

About This Class

In this self-paced course, you will learn how to use Matplotlib to perform critical tasks related to data science and machine learning. This involves visualizing data with several kinds of plots, such as the line chart, scatterplot, and histogram. You will also learn how to plot images.

The course includes video presentations, coding lessons, hands-on exercises, and links to further resources.

This course is intended for:

  • Anyone interested in data science and machine learning
  • Anyone who knows Python and wants to take the next step into Python libraries for data science
  • Anyone interested in acquiring tools to implement machine learning algorithms

Suggested prerequisites:

  • Decent Python programming skill
  • Experience with NumPy

In this course, we will cover:

  • Matplotlib and how to visualize data with line charts, scatterplots, histograms, and how to plot images

Meet Your Teacher

Level: Beginner

Transcripts

1. Introduction Video: Hey everyone, and welcome to my latest course, Data Science with Matplotlib in Python. So who am I and why should you listen to me? Well, my name is the Lazy Programmer and I'm the author of over 30 online courses in data science, machine learning, and financial analysis. I have two master's degrees, in engineering and statistics. My career in this field spans over 15 years. I've worked at multiple companies that we now call Big Tech and at multiple startups. Using data science, I've increased revenues by millions of dollars with the teams I've led. But most importantly, I am very passionate about bringing this pivotal technology to you. So what is this course about? This course is all about teaching you foundational skills using the Matplotlib library, which has become standard in the past decade for doing data science with Python. You'll learn how to make various kinds of plots, like line charts, scatter plots, histograms, and even how to plot images. These skills are critical if you want to do data science and visualize your data and your results. So who should take this course and how should you prepare? This course is designed for students who are interested in data science and machine learning and already have some experience with numerical computing libraries such as NumPy. The second skill you'll need is some basic programming. Any language is fine, but since this course uses Python, that would be ideal. Luckily, Python is a very easy language to learn, so if you already know another language, you should have no problem catching up. For both of these topics, a high-school level understanding should be sufficient, and an undergraduate understanding would be even better. So in terms of resources, what will you need in order to take this course? Luckily, not much. You'll need a computer, a web browser, and a connection to the Internet. If you're watching this video, then you already meet these conditions.
Now, let's talk about why you should take this course and what you should expect to get out of it. Well, what I've noticed after teaching machine learning for many years is that there is a huge gap in knowledge. Students come to a machine learning course wanting to learn machine learning. They'll understand the concepts, but then have no idea how to put those concepts into code because they don't really know how to code. This course is intended to fill that gap by creating a bridge between regular coding and the type of coding you'll need for data science and machine learning, specifically plotting and visualizing your datasets. By the end of this course, you'll have learned enough to go out and use what you've learned on a real dataset. In fact, this is what we'll do as our final project. I hope you're just as excited as I am to learn about this amazing library. Thanks for listening, and I'll see you in the next lecture.

2. Matplotlib Outline: In this lecture, I'm going to introduce you to the next section of this course, which is on Matplotlib. Matplotlib is a library we will use to visualize our data. This section will be quite short compared to the NumPy section, since in your investigation of machine learning algorithms, there are really only a few plots you need. This is not about building reports or presentations or anything like that. We're interested in plots that will help us specifically with the implementation of machine learning models. So that being said, what are we going to cover in this section? First, we're going to talk about line charts. Line charts, despite their name, are used to plot any kind of one-dimensional signal. So, for example, you might want to plot a time series like the stock price per unit time. Another example of that is a sound wave. So, for example, you might load in an MP3 and look at how the amplitude changes over time.
Musicians look at line charts like this all day, except when you look at it from the perspective of a musician, the lines are so close together that you can't see the individual values at each timestep. Next, we're going to look at a very important kind of plot, the scatter plot. The scatter plot lets us see our data from a geometric perspective. In fact, when we think of, say, a classification problem or a clustering problem, we often think in terms of geometric pictures just like this. Even when you have data which is very high-dimensional, that is, data you can't see, we still try to find ways to reduce the dimensionality so that we can see it on plots like this. Next, we're going to look at another important plot, the histogram. The histogram is important because it lets us see the distribution of our data. And of course, by distribution, I mean probability distribution. Machine learning algorithms are often defined using the language of probability, so being able to look at your data to determine what kind of distribution it has is very important. Finally, we're going to look at how to plot images. Images are a very important type of data in modern machine learning. The field of computer vision has exploded thanks to deep learning, and thanks to these advances, we are closer than ever to self-driving vehicles and autonomous robots. Part of this is also going to be answering the question: how exactly is an image represented inside the computer? As you'll soon see, it's nothing we don't already know about.

3. Line Chart: In this lecture, we're going to look at how to make a line chart. First, we're going to start by importing the necessary libraries. So for this section that would be import numpy as np and import matplotlib.pyplot as plt. Alright, so next we're going to create some fake data for the x-axis.
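As a minimal sketch of the line chart this lecture builds, starting from the two imports just mentioned: the function, label strings, and title follow the transcript, while the variable name `lines` is introduced here only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# 1,000 evenly spaced points from 0 to 20 for the x-axis
x = np.linspace(0, 20, 1000)

# An arbitrary function of x for the y-axis, as in the lecture
y = np.sin(x) + 0.2 * x

# Draw the line chart, then label the axes and add a title
lines = plt.plot(x, y)  # `lines` is a name chosen for this sketch
plt.xlabel("input")
plt.ylabel("output")
plt.title("my plot")

# Outside a notebook, call plt.show() at this point to display the figure.
```

In a notebook, ending the last line with a semicolon suppresses the printed return value of `plt.title`.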
By the way, even though this section and the following sections are not about NumPy directly, this does not mean you won't be exposed to new NumPy features as we go along. So that's exactly what we're going to do right now. So x = np.linspace(0, 20, 1000). So what does that do? That creates a one-dimensional array with 1,000 evenly spaced points from 0 to 20. Next, we're going to generate some fake data for the y-axis. So let's say y = np.sin(x) + 0.2 * x. There's nothing special about this function. It's completely arbitrary. Next, we plot our data. We can accomplish that by doing plt.plot(x, y). Now it's important to note that if you are not using a notebook, whether that's Colab or a Jupyter notebook on your local machine, you will not see anything from just this one line of code. Instead, you need to call an additional function, which is plt.show(). Let's try that. So that would be plt.plot(x, y), then plt.show(). As you can see, it doesn't really have any effect if you are already inside the notebook. Next, we're going to add some information to our plot. So let's say I want to label the x-axis. First we're going to have our plot, so that's plt.plot(x, y). Then, in order to label the x-axis, I can do plt.xlabel, and let's just call it 'input'. Similarly, if I want to label the y-axis, then I can do plt.ylabel, and let's call that 'output'. Finally, let's say I want to add a title. Then I can do plt.title('my plot'). Okay, so let's try this. And we get our title and our y label and our x label. Now you notice something weird, which is that the notebook always prints out the last thing that was returned by the code. In our case that was this. Since the plt.title function returns something, we see that in the box below, although we probably don't want to. So one thing we can do to suppress this output is to just end the line with a semicolon. Let's see that. Okay, and now that output disappears.

4. Scatterplot: In this lecture, we're going to look at another very important kind of visualization, the scatter plot. Of course, in order to have a scatter plot that actually looks like something, we're going to need some random data. So let's start by creating some random data from the standard normal with shape 100 by 2. So that's 100 observations and two dimensions. We do X = np.random.randn(100, 2). Now you might be wondering, why 100 by 2? This gives us 100 data points with dimensionality two. We could have chosen 500 data points, but 100 is just fine. But the two is necessary. Remember that computer screens themselves are two-dimensional. And so in order to specify the x coordinates and the y coordinates, we have to have two dimensions, no more and no less. There are also three-dimensional scatter plots you could make, and we do have that in my other courses, but that would be outside the scope of this course. So in order to make a scatter plot, we call plt.scatter. I'm going to pass in X[:, 0] and X[:, 1]. So X[:, 0] is the first column of X and X[:, 1] is the second column of X. And just to be clear, the first argument here corresponds to the horizontal axis, and the second argument corresponds to the vertical axis. So let's try this out. Alright, so we get our scatter plot. Now in machine learning, often we're interested in classification or clustering, where we would like to draw scatter plots of data points with different colors signifying the different classes. So let's see if we can make a scatter plot like that. First we're going to generate some random data, again with dimensionality two. Let's say X = np.random.randn(200, 2). Next, I would like half of this data to be centered at some different location. Remember that the randn function draws from the standard normal. So by default, all these points are currently centered at zero.
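The first scatter plot described above can be sketched as follows. The fixed seed and the name `points` are assumptions added for reproducibility and illustration; they are not part of the lecture.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # assumption: fixed seed so the sketch is repeatable

# 100 observations with 2 dimensions, drawn from the standard normal
X = np.random.randn(100, 2)

# First column on the horizontal axis, second column on the vertical axis
points = plt.scatter(X[:, 0], X[:, 1])

# Outside a notebook, call plt.show() at this point to display the figure.
```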
So let's say I want the first half of these data points to be centered at (3, 3). To do that, I can just do X[:50] += 3. So the :50 means select all the rows from index 0 up to, but not including, index 50. And the += 3 means add three to all the elements. Next, we're going to generate some labels. Let's start by creating an array of zeros of size 200. So that's Y = np.zeros(200). Next, we're going to set the first half of these labels to one. This is so that all the points centered at (3, 3) have label one and all the other points have label zero. We can accomplish that by doing Y[:50] = 1. And finally, we can draw our scatter plot. This is almost the same as before, except now we're going to also use the c argument and pass in Y. And obviously c stands for color. So plt.scatter(X[:, 0], X[:, 1], c=Y). Sometimes it's hard to see your code because these things pop up to show you the API. The way this works is that the thing you pass in for c should be a one-dimensional list or a one-dimensional array containing integers corresponding to how you want to color the data points. So let's run this. And we get our colored scatter plot.

5. Histogram: In this lecture, we're going to look at another essential plot, the histogram. Since histograms are used for plotting the distribution of data, we're going to need some random numbers. So let's start by creating 10,000 random numbers from the standard normal. So X = np.random.randn(10000). Next, let's plot a histogram. If you've ever programmed in another scientific language before, you may have even guessed this yourself. It's just plt.hist(X). Alright, and that's our histogram. Now, when we have this many data points, it's possible to get a better plot by having more bars. We can accomplish that by using the bins argument. So we can do plt.hist(X, bins=50). And this is, of course, one way to confirm that our data truly does come from the standard normal.
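A minimal sketch of the histogram step just described. The fixed seed and the names `counts`, `edges`, and `patches` are assumptions added for illustration; `plt.hist` returns the bin counts, the bin edges, and the drawn bars.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # assumption: fixed seed so the sketch is repeatable

# 10,000 samples from the standard normal
X = np.random.randn(10000)

# 50 bins instead of the default gives a finer picture of the distribution
counts, edges, patches = plt.hist(X, bins=50)

# Outside a notebook, call plt.show() at this point to display the figure.
```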
We have a bell curve that is centered around zero, and approximately 95% of the weight is between plus or minus 1.96. And obviously I just made that up. I can't tell by looking at it. Now, just out of curiosity, we may also want to confirm that the random function samples from the uniform distribution. So let's generate some data from the random function. So that's X = np.random.random(10000). And let's draw a histogram. So plt.hist(X, bins=50). Alright, so this is pretty flat, although not exactly. I'm sure it would look flatter if you had fewer bins. But either way, it's pretty clear that the random function does indeed sample from the uniform (0, 1) distribution.

6. Plotting Images: In this lecture, we're going to discuss how to plot images. Now it's useful to mention that many famous machine learning datasets, such as MNIST and CIFAR-10, are image datasets. But these datasets are not stored as actual image files. For example, on your computer, you may have images that are in the JPEG format or the PNG format. But some machine learning datasets are stored in different formats, such that the entire dataset with multiple images can be stored in a single file. For us, in this lecture, we will be concerned with just single images like the ones you might have on your computer. So to start with, we're going to download an image from the Internet using the wget command. Now of course, you don't have to choose the same file as me. You probably don't want to type this URL out by hand. So as part of your exercise for this lecture, please find your own image and get the URL for that image. You can pause this video until you've found an image that you want to use. I am going to grab the URL from my pre-written notebook. Alright, so you can see from the output of the wget command that we downloaded a file called lena.png. So next we're going to use a library called Pillow, which will help us load in our image. So let's import Pillow by doing from PIL import Image.
Next, we're going to use Pillow to load in our image. So for me that's im = Image.open('lena.png'). Now, although this is the NumPy stack, this return value is not a NumPy array. We can check the type of this object to confirm that. So we just do type(im). And we can see that it's a PngImageFile. Luckily, it's very easy to convert this into a NumPy array. We can just do it like how we convert lists to NumPy arrays. So arr = np.array(im). Now the reason why this works is because images are represented in computers as arrays. If you think about it, an image has two dimensions, height and width. For each location along its height and width, it has a color value. So that's exactly what this array is. It's a box of numbers. We can print the array to confirm that. So just do arr. And that's our image represented in a computer. Now, there's something interesting about these numbers, which you'll learn more about if you ever take a class with me on computer vision or image processing. All these numbers seem to be integers rather than floating point, and they are all between 0 and 255. If we scroll down to the bottom, we can see that the dtype of this array is uint8. That is, the numbers are 8-bit unsigned integers. So it should make complete sense to you that these numbers are between 0 and 255, since two to the power eight is 256, and therefore with eight bits that is the total number of possible integers we can represent. Let's check the shape of our array. So that's arr.shape. So interestingly, it's a three-dimensional array of size 512 by 512 by 3. So what do these numbers mean? Well, the first two dimensions are the spatial dimensions. They are the height and width of the image. But why is there a third dimension of size three? That's because for each location in the image, we need to store the color of that pixel. And it just so happens that colors are stored using three values.
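The conversion from a Pillow image to a NumPy array can be sketched as below. As an assumption, a synthetic solid-colour image built with `Image.new` stands in for the downloaded file so the sketch runs without any download; with a real file, `Image.open` would take its place.

```python
import numpy as np
from PIL import Image

# Assumption: a synthetic 512x512 RGB image stands in for Image.open(<your file>)
im = Image.new("RGB", (512, 512), color=(255, 128, 0))

# A Pillow image is not a NumPy array, but it converts like a list does
arr = np.array(im)

print(type(im))    # a Pillow image class, not numpy.ndarray
print(arr.dtype)   # 8-bit unsigned integers, values from 0 to 255
print(arr.shape)   # (height, width, color channels)
```

The last axis of `arr` holds the three values stored for each pixel.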
Specifically, these are the red channel, the green channel, and the blue channel. These three numbers tell us how much red, how much green, and how much blue to combine to make the color at this location. And by the way, just so you know, they teach this stuff in kindergarten. So if you're feeling confused, then tonight you might have to ask your kids for help with your homework. So how do we plot this image? Well, we use a function called imshow. So let's do that. That would be plt.imshow(arr). As you can see, this is the famous Lena image used in every computer vision course in existence. And by the way, this also works with the original image. So we can do plt.imshow(im). Now one thing we often do in computer vision is work with grayscale images, also known as black and white images. One simple way to convert a color image into a black and white image is to take the mean across the color channels. Let's try that. So that's gray = arr.mean(axis=2). And if we check the shape of our new array, we can see that it's 512 by 512, which means we've collapsed the color dimension, which is what we wanted. So what happens if we plot this image using imshow? We do plt.imshow(gray). That's interesting. It seems we've been given a weird set of colors, some weird mix of green and yellow. Now it's important to note that these are not actual colors stored in the image itself. These are all just numbers between 0 and 255, so it's not like 255 is green and zero is yellow. These colors are actually decided by Matplotlib. If you were using a different programming language or even a different version of Matplotlib or Python, these colors might come out differently. This is basically what is called a heat map. So you actually just learned how to do two things at once. But still, we would like to know how to plot this grayscale image in actual grayscale. The way we can do that is by using the cmap argument. So let's try that.
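A minimal sketch of the grayscale conversion and the cmap argument discussed above. As an assumption, a random uint8 array stands in for the loaded image, and the name `img` is introduced only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumption: random pixel values stand in for the array of a real image
rng = np.random.default_rng(0)
arr = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)

# Averaging over the last axis collapses the three color channels into one
gray = arr.mean(axis=2)
print(gray.shape)  # the color dimension is gone

# Without cmap, Matplotlib applies its default colormap (a heat map);
# cmap="gray" maps low values to black and high values to white
img = plt.imshow(gray, cmap="gray")

# Outside a notebook, call plt.show() at this point to display the figure.
```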
So that's plt.imshow(gray, cmap='gray'). And as expected, our image has been plotted in grayscale.

7. Matplotlib Exercise: In this lecture, I'm going to give you an exercise to practice what you learned in this section. Your exercise in this section will be to generate and plot what I call the generalized XOR dataset. So why do I call this the generalized XOR? Well, if you have a computer science or engineering background, you have already heard of the XOR. It is a logic gate. It does a logical operation like the AND, OR, and NOT gates. We can write down the XOR using a truth table where x1 and x2 are the inputs and y is the output. If x1 and x2 are both zero, then y is zero. If either x1 or x2 is one, but not both, then y is one. If x1 and x2 are both one, then y is again zero. The reason we call this the XOR, which stands for exclusive OR, is because it differs from the regular OR operation in that the last row would be a one with the regular OR. Now of course, if we plotted this, it would just be four dots, which is not that exciting. What looks a little better is randomly scattered points, such as what you would see in a machine learning dataset. So if we split the data into four quadrants, then the upper left and bottom right corners will have one color, and the upper right and bottom left corners will have another color. Your job is to generate this data and make a scatter plot like what you see here. Just to make it a little harder, notice that these quadrants are defined between minus one and plus one, not zero and one. Good luck, and I'll see you in the next lecture.

8. Where to get discount coupons and FREE machine learning material: Hey everyone, and welcome back to this class. In this lecture, I'm going to answer one of the most common questions I get: where can I get discount coupons and free deep learning material? Let's start with coupons. I have several ways for you to keep up to date with me.
The absolute number one best way for you to keep up to date with newly released discount coupons is to subscribe to my newsletter. There are several ways you can do this. First, you can visit my website, lazyprogrammer.me. At the top of the page, there is a box where you can enter your email and sign up for the newsletter. Another website I own and operate is deeplearningcourses.com. This website largely contains the same courses as you see on this platform, but it also contains extra VIP material. More on that later. So if you scroll to the bottom of this website, you'll find a box to enter your email, which will sign you up for the newsletter as you would on lazyprogrammer.me. So you only have to do one of these. Now let's do a small digression, because this is another common question I get: what's this VIP material all about, and how can I get it? So here's how the VIP thing works. Usually when I release a course, I'll release it with temporary VIP material, which is exclusive to those early birds who signed up for the course during my announcement. This is a nice little reward for those of you who stay up to date with my announcements and, of course, actually read them. It's important to note that VIP material can come out at any time. For example, I could make major updates to a course three years after starting it and do another VIP release. The purpose of deeplearningcourses.com is to have a permanent home for these VIP materials. So even though it could be temporary on the platform you signed up on, if you sign up for the VIP version of the course, then you'll get access to the VIP materials on deeplearningcourses.com permanently upon request. Here are some examples of materials you might find in the VIP sections. In my TensorFlow 2 course, there are three extra hours of material on Deep Dream and object localization. Usually I don't release the VIP content in video format, but this was an exception.
Another example, in my cutting-edge AI course, was an extra written section on the TD3 algorithm. This course covered three algorithms in total, so the extra section gives you one more, or in other words, 33% more material. Another example, in my advanced NLP and RNNs course, is a section on speech recognition using deep learning. In addition, there is an all-new section of the course on stock predictions or memory networks, depending on which version of the course you are taking. The reason for this is that I might release slightly different versions of each course on different platforms. Because of how the rules on all these platforms work, I must differentiate the courses. However, since I own deeplearningcourses.com, this is the only platform that contains the most complete version of the course, which includes all the sections. Please note that this is rare, so depending on what course you are taking, it might not affect you. Alright, so let's get back to discount coupons and free material. Other places where I announce discount coupons are Facebook, Twitter, and YouTube. You might want to pause this video so you can go to these URLs and follow me or subscribe to me on these sites if they are websites that you use regularly. So for Facebook, that's facebook.com/lazyprogrammer.me. For Twitter, that's twitter.com/lazy_scientist. For YouTube, that's youtube.com/c/lazyprogrammerx. Occasionally, I still release completely free material. This is nice if I want to just talk about a singular topic without having to make an entire course for it. For example, I just released a video on stock market predictions and why most other blogs and courses approach this problem completely wrong. That's another benefit of signing up for these things. I get to expose fake data scientists who are really marketers, whereas I wouldn't ever make an entire course about that.
Sometimes this can be in written form and sometimes it can be in video form. If it's in written form, it will be on either lazyprogrammer.me or deeplearningcourses.com. If it's a video, it will be on YouTube. So make sure you subscribe to me on YouTube. If I release a video, I may also make a post about it on lazyprogrammer.me, and I may also announce it using the other methods I previously discussed. So that's the newsletter, Facebook, Twitter, and obviously YouTube itself. Now I realize that's a lot of stuff and you probably don't use all these platforms. I certainly don't, at least not regularly. So if you want to do the bare minimum, here's what you should do. First, sign up for my newsletter. Remember, you can do that on either lazyprogrammer.me or deeplearningcourses.com. Second, subscribe to my YouTube channel at youtube.com/c/lazyprogrammerx. Thanks for listening, and I'll see you in the next lecture.