Transcripts
1. Introduction Video: Hey everyone and welcome
to my latest course, Data Science with
matplotlib in Python. So who am I and why
should you listen to me? Well, my name is the Lazy
Programmer and I'm the author of over 30 online
courses in data science, machine learning, and
financial analysis. I have two master's degrees in engineering and statistics. My career in this field
spans over 15 years. I've worked at multiple
companies that we now call Big Tech and multiple startups. Using data science,
I've increased revenues by millions of dollars
with the teams I've led. But most importantly,
I am very passionate about bringing this
pivotal technology to you. So what is this course about? This course is all
about teaching you foundational skills using
the matplotlib library, which has become standard in the past decade for doing
data science with Python. You'll learn how to
make various kinds of plots, like line charts, scatter plots, histograms, and even
how to plot images. These skills are critical
if you want to do data science and visualize
your data and your results. So who should take this course and how should you prepare? This course is designed
for those students who are interested in data
science and machine learning and already have
some experience with numerical computing
libraries such as NumPy. The second skill you'll need is some basic programming.
Any language is fine, but since this
course uses Python, that would be ideal. Luckily, Python is a very
easy language to learn. So if you already know
another language, you should have no
problem catching up. For both of these topics, a high-school level understanding
should be sufficient, and an undergraduate understanding
would be even better. So in terms of resources, what will you need in
order to take this course? Luckily, not much. You'll need a computer, a web browser, and a
connection to the Internet. If you're watching this video, then you already meet
these conditions. Now, let's talk about
why you should take this course and what you should
expect to get out of it. Well, what I've noticed after teaching machine learning for many years is that there is
a huge gap in knowledge. Students come to a
machine-learning course wanting to learn
machine learning. And they'll understand
the concepts, but then have no idea how
to put those concepts into code because they don't
really know how to code. This course is intended to fill that gap by creating
a bridge between regular coding
and the type of coding you'll need for data
science and machine learning, specifically plotting and
visualizing your datasets. By the end of this course, you'll have learned enough
to go out and use what you've learned on
a real dataset. In fact, this is what we'll
do as our final project. I hope you're just
as excited as I am to learn about this
amazing library. Thanks for listening,
and I'll see you in the next lecture.
2. Matplotlib Outline: In this lecture, I'm
going to introduce you to the next section
of this course, which is on Matplotlib. Matplotlib is a library we will use to visualize our data. This section will be quite short compared to the NumPy section, since in your investigation of machine-learning algorithms, there are really only
a few plots you need. This is not about
building reports or presentations or
anything like that. We're interested in
plots that will help us specifically with the implementation of
machine-learning models. So that being said, what are we going to
cover in this section? First, we're going to
talk about line charts. Line charts, despite their name, are used to plot any kind
of one-dimensional signal. So, for example, you might want to plot a time series like a
stock price over time. Another example of that
is a sound wave. So, for example, you might load in an MP3 and look at how the amplitude
changes over time. Musicians look at line
charts like this all day, except when you look at it from the perspective of a musician, the lines are so close
together that you can't see the individual values
at each timestep. Next, we're going
to look at a very important kind of plot, the scatter plot.
The scatter plot lets us see our data from
a geometric perspective. In fact, when we think of, say, a classification problem
or a clustering problem, we often think in terms of
geometric pictures just like this. Even when you have data which is
very high-dimensional, that is, data you can't see, we
still try to find ways to reduce the
dimensionality so that we can see it on
plots like this. Next we're going
to look at another important plot, the histogram. The histogram is
important because it lets us see the distribution
of our data. And of course, by
distribution, I mean probability distribution. Machine
learning algorithms are often defined using the
language of probability. So being able to look at the
distribution of your data to determine what kind
of distribution it has is very important. Finally, we're going to
look at how to plot images. Images are a very
important type of data in modern machine learning. The field of computer vision has exploded thanks
to deep learning. And thanks to these advances, we are closer than ever to self-driving vehicles
and autonomous robots. Part of this also is going to
be answering the question, how exactly is an image
represented inside the computer? As you'll soon see, it's nothing we don't
already know about.
3. Line Chart: In this lecture, we're going to look at how to
make a line chart. First, we're going to start by importing the
necessary libraries. So for this section that would
be import numpy as np and import matplotlib.pyplot
as plt. Alright, so next
we're going to create some fake data for the x-axis. By the way, even though
this section and the following sections are
not about NumPy directly, this does not mean you won't be exposed to new NumPy
features as we go along. So that's exactly what
we're gonna do right now. So x = np.linspace(0, 20, 1000).
So what does that do? That creates a
one-dimensional array with 1,000 evenly
spaced points from 0 to 20. Next, we're going to generate some fake data for the y-axis. So let's say
y = np.sin(x) + 0.2 * x. There's nothing special
about this function. It's completely arbitrary. Next we plot our data. We can accomplish that
by doing plt.plot(x, y). Now it's important to note that if you are not using a notebook, whether that's Colab or a Jupyter notebook on
your local machine, you will not see anything from just this one line of code. Instead, you need to call
an additional function, which is plt.show().
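Put together, the steps so far might look like the following when run as a plain Python script rather than in a notebook. This is only a sketch of what was described above; the variable names x and y follow the lecture, and nothing beyond NumPy and matplotlib is assumed.

```python
import numpy as np
import matplotlib.pyplot as plt

# 1,000 evenly spaced points from 0 to 20 (both endpoints included)
x = np.linspace(0, 20, 1000)

# An arbitrary function of x, as in the lecture
y = np.sin(x) + 0.2 * x

# Draw the line chart
plt.plot(x, y)

# Outside a notebook, nothing is displayed until show() is called
plt.show()
```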
Let's try that. So that would be
plt.plot(x, y) followed by plt.show(). As you can see, it
doesn't really have any effect if you are
already inside the notebook. Next, we're going to add some
information into our plot. So let's say I want
to label the x-axis. So first we're going
to have our plot. So let's do plt.plot(x, y). And then in order to
label the x-axis, I can do plt.xlabel, and let's just call it Input. Similarly, if I want
to label the y-axis, then I can do plt.ylabel. And let's call that Output. Finally, let's say I
want to add a title, then I can do
plt.title("My Plot"). Okay, so let's try this. And we get our title and our
y-label and our x-label. Now you notice something weird, which is that the notebook
always prints out the last thing that was
returned by a code cell. So in our case that was this. Since the plt.title
function returns something, we see that in the box below, although we probably
don't want to. So one thing we
can do to suppress this output is to just end
the line with a semicolon. Let's see that. Okay, and now that
output disappears.
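As a recap, a notebook cell for the labeled plot could look something like the sketch below. The label and title strings are the ones spoken in the lecture; their exact capitalization is an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 20, 1000)
y = np.sin(x) + 0.2 * x

plt.plot(x, y)
plt.xlabel("Input")    # label for the horizontal axis
plt.ylabel("Output")   # label for the vertical axis
plt.title("My Plot");  # trailing semicolon hides the returned Text object in a notebook
```

In a plain script the semicolon is harmless, and you would add plt.show() at the end to display the figure.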
4. Scatterplot: In this lecture, we're going
to look at another very important kind of
visualization, the scatter plot. Of course, in order to have a scatterplot that actually
looks like something, we're going to need
some random data. So let's start by creating
some random data from the standard normal with
shape 100 by two. So that's 100 observations and two dimensions. We do x =
np.random.randn(100, 2). Now you might be
wondering why 100 by two. This gives us 100 data points
with dimensionality two. We could have chosen
500 data points, but 100 is just fine. But the two is necessary. Remember that computer screens themselves are two-dimensional. And so in order to specify the x-coordinates
and the y-coordinates, we have to have two dimensions,
no more and no less. There are also
three-dimensional scatter plots you could make. And we do have that
in my other courses, but that would be outside
the scope of this course. So in order to make
a scatter plot, we call plt.scatter. I'm going to pass in
x[:, 0] and x[:, 1]. So x[:, 0] is
the first column of x and x[:, 1] is
the second column of x. And just to be clear,
the first argument here corresponds to
the horizontal axis, and the second argument
corresponds to the vertical axis. So let's try this out. Alright, so we get
our scatter plot. Now in machine learning, often we're interested in
classification or clustering, where we would like
to draw scatter plots of data points with different colors signifying
the different classes. So let's see if we can make
a scatter plot like that. First we're going to
generate some random data, again with dimensionality
two. Let's say x = np.random.randn(200, 2).
Next, I would like part
of this data to be centered at some
different location. Remember that the
randn function draws from the standard normal. So by default, all these points are currently centered at zero. So let's say I want
the first 50 of these data points to be
centered at (3, 3). To do that, I can just do
x[:50] += 3. So the :50 means select all the rows from index
0 up to, but not including, index 50. And the += 3 means add three to all the elements. Next, we're going to
generate some labels. Let's start by creating an
array of zeros of size 200. So that's y = np.zeros(200). Next we're going to
set the first 50 of these labels to one.
This is so that all the points centered at
(3, 3) have label one and all the other points
will have label zero. We can accomplish that by
doing y[:50] = 1. And finally, we can
draw our scatter plot. So this is almost
the same as before, except now we're
going to also use the c argument and
pass in y. And obviously c
stands for color. So that's plt.scatter(x[:, 0], x[:, 1], c=y). So sometimes it's hard
to see your code because these things pop up
to show you the API. And so the way this works is
the thing you pass in for c should be a
one-dimensional list or a one-dimensional array
containing integers corresponding to how you want
to color the data points. So let's run this. And we get our
colored scatter plot.
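Here is a sketch of both scatter plots from this lecture in one place, shifting the first 50 rows as in the code described above. The fixed random seed and the extra plt.figure() call are assumptions added for reproducibility and to keep the two plots separate; they are not part of the lecture.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # assumption: fixed seed, only for reproducibility

# Plain scatter plot: 100 points, 2 dimensions, standard normal
x = np.random.randn(100, 2)
plt.scatter(x[:, 0], x[:, 1])  # column 0 -> horizontal axis, column 1 -> vertical axis

# Colored scatter plot: 200 points, the first 50 shifted to be centered at (3, 3)
x = np.random.randn(200, 2)
x[:50] += 3

# Labels: 1 for the shifted points, 0 for all the others
y = np.zeros(200)
y[:50] = 1

plt.figure()  # start a new figure so the two plots do not overlap
plt.scatter(x[:, 0], x[:, 1], c=y)  # c=y colors each point by its label
```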
5. Histogram: In this lecture, we're
going to look at another essential
plot, the histogram. Since histograms are used for plotting the
distribution of data, we're going to need
some random numbers. So let's start by
creating 10,000 random numbers from
the standard normal. So x = np.random.randn(10000).
Next, let's plot a histogram. So if you've ever programmed in another scientific
language before, you may have even
guessed this yourself. It's just plt.hist(x). Alright, and that's
our histogram. Now, when we have this
many data points, it's possible to get a better
plot by having more bars. We can accomplish that by
using the bins argument. So we can do
plt.hist(x, bins=50). And this is, of course,
one way to confirm that our data truly does come
from the standard normal. We have a bell curve that
is centered around zero. And approximately 95% of the weight is between
plus or minus 1.96. And obviously I
just made that up. I can't tell by looking at it. Now, just out of curiosity, we may also want to confirm that the random function samples from the uniform distribution. So let's generate some data
from the random function. So that's x =
np.random.random(10000). And let's draw a histogram. So plt.hist(x, bins=50). Alright, so this is pretty flat, although not exactly.
I'm sure it would look flatter if
you had fewer bins. But either way,
it's pretty clear that the random function does indeed sample from the
uniform(0, 1) distribution.
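The two histograms can be sketched as follows. Since the 95% figure is hard to judge by eye, this version also computes the fraction of normal samples within plus or minus 1.96 directly. The seed and the names normal_samples, uniform_samples, and frac_within are illustrative assumptions, not names from the lecture.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # assumption: fixed seed, only for reproducibility

# 10,000 draws from the standard normal, plotted with 50 bins
normal_samples = np.random.randn(10000)
counts, edges, _ = plt.hist(normal_samples, bins=50)

# Numeric check of the "about 95% within +/-1.96" claim
frac_within = np.mean(np.abs(normal_samples) < 1.96)
print("fraction within +/-1.96:", frac_within)

# 10,000 draws from the uniform distribution on [0, 1), in a separate figure
uniform_samples = np.random.random(10000)
plt.figure()
plt.hist(uniform_samples, bins=50)
print("min and max:", uniform_samples.min(), uniform_samples.max())
```

With 50 bins, plt.hist returns 50 counts and 51 bin edges, which is a quick way to check that the bins argument did what you expected.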
6. Plotting Images: In this lecture, we're going to discuss how to plot images. Now it's useful to mention that many famous machine
learning datasets, such as MNIST and
CIFAR-10, are image datasets. But these datasets
are not stored as actual image files. For example, on your computer, you
may have images that are in the JPEG format
or the PNG format. But some machine
learning datasets are stored in different formats, such that the
entire dataset with multiple images can be
stored in a single file. For us, in this lecture, we will be concerned
with just single images like the ones you might
have on your computer. So to start with, we're going to
download an image from the Internet using
the wget command. Now of course, you don't have to choose the same file as me. You probably don't want to
type this URL out by hand. So as part of your
exercise for this lecture, please find your
own image and get the URL for that image. You can pause this video until you found an image
that you want to use. I am going to grab the URL
from my pre-written notebook. Alright, so you can see from the output of the wget command that we downloaded a file
called lena.png. So next we're going to use
a library called Pillow, which will help us
load in our image. So let's import Pillow by
doing from PIL import Image. Next we're going to use
Pillow to load in our image. So for me that's im =
Image.open('lena.png'). Now, although this
is the NumPy stack, this return value is
not a NumPy array. We can check the type of
this object to confirm that. So we just do type(im). And we can see that
it's a PngImageFile. Luckily, it's very easy to convert this into a NumPy array. We can just do it like how we convert lists to NumPy arrays. So arr = np.array(im).
Now the reason why this works
is because images are represented in
computers as arrays. If you think about it, an image has two dimensions,
height and width. For each location along
its height and width, it has a color value. So that's exactly
what this array is. It's a box of numbers. We can print the array
to confirm that. So just do arr. And that's our image
represented in a computer. Now, there's something
interesting about these numbers, which you'll learn about more if you ever take a class with me on computer vision
or image processing. All these numbers seem to be integers rather than
floating point, and they are all between 0 and 255. If we scroll down to the bottom, we can see that the dtype
of this array is uint8. That is, the numbers are
8-bit unsigned integers. So it should make
complete sense to you that these numbers are between 0 and 255, since two to the
power of eight is 256, and therefore with
8 bits that is the total number of possible
integers we can represent. Let's check the
shape of our array. So that's arr.shape. So interestingly, it's a three-dimensional
array of size 512 by 512 by 3. So what do these numbers mean? Well, the first two dimensions are the spatial dimensions. They are the height
and width of the image. But why is there a third
dimension of size three? That's because for each
location in the image, we need to store the
color of that pixel. And it just so
happens that colors are stored using three values. Specifically, these
are the red channel, the green channel,
and the blue channel. These three numbers tell us
how much red, how much green, and how much blue to combine to make the color
at this location. And by the way,
just so you know, they teach this stuff
in kindergarten. So if you're feeling confused, then tonight you
might have to ask your kids for help
with your homework. So how do we plot this image? Well, we use a function
called imshow. So let's do that. That
would be plt.imshow(arr). As you can see, this is
the famous Lena image used in every computer
vision course in existence. And by the way, this also
works with the original image. So we can do
plt.imshow(im). Now one thing we often do in computer vision is we work
with grayscale images, also known as black
and white images. One simple way to convert a color image into a
black and white image is to take the mean across the color channels.
Let's try that. So that's gray =
arr.mean(axis=2). And if we check the
shape of our new array, we can see that it's 512 by 512, which means we've collapsed
the color dimension, which is what we wanted. So what happens if we
plot this image using imshow? Let's do plt.imshow(gray). That's interesting.
It seems we've been given a weird set of colors, some weird mix of
green and yellow. Now it's important to
note that these are not actual colors stored
in the image itself. These are all just
numbers between 0 and 255, so it's not like 255 is
green and zero is yellow. These colors are actually
decided by matplotlib. If you were using a different
programming language or even a different version
of Matplotlib or Python. These colors might
come out differently. This is basically what
is called a heat map. So you actually just learned
how to do two things at once. But still we would like
to know how to plot this grayscale image in
actual grayscale. The way we can do
that is by using the cmap argument.
So let's try that. So that's plt.imshow(gray,
cmap='gray'). And as expected, our image has
been plotted in grayscale.
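The difference between the two imshow calls can be sketched with a synthetic two-dimensional array, so no image file is required. The gradient image and the names heat and shaded are assumptions made for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic grayscale image: a left-to-right gradient from 0 to 255
gray = np.tile(np.linspace(0, 255, 256), (256, 1))

# Without cmap, matplotlib maps the numbers through its default colormap (a heat map)
heat = plt.imshow(gray)

plt.figure()  # separate figure for the second version

# With cmap="gray", low values are drawn dark and high values light
shaded = plt.imshow(gray, cmap="gray")
```

In both cases the underlying numbers are identical; only the mapping from numbers to displayed colors changes.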
7. Matplotlib Exercise: In this lecture, I'm
going to give you an exercise to practice what
you learned in this section. Your exercise in this section
will be to generate and plot what I call the
generalized XOR dataset. So why do I call this
the generalized XOR? Well, if you have a
computer science or engineering background, then you have already heard of the XOR. It is a logic gate. It does a logical
operation like the AND, OR, and NOT gates. We can write down the XOR
using a truth table where x1 and x2 are the inputs
and y is the output. If x1 and x2 are both zero, then y is zero. If either x1 or x2 is one, but not both, then y is one. If x1 and x2 are both one, then y is again zero. The reason we call this the XOR, which stands for exclusive OR, is because it differs from the regular OR operation in that the last row would be a
one with the regular OR. Now of course, if
we plotted this, it would just be four dots, which is not that exciting. What looks a little better is randomly scattered points, such as what you would see in a machine-learning dataset. So if we split the data
into four quadrants, then the upper left and
bottom right corners will have one color, and the upper right and
bottom left corners will have another color. Your job is to generate this data and make a scatter
plot like what you see here. Just to make it a little harder, notice that these quadrants
are defined between minus one and plus one, not zero and one. Good luck, and I'll see
you in the next lecture.
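If you get stuck, here is one possible sketch of the generalized XOR dataset; try the exercise on your own first. It assumes 1,000 points drawn uniformly from the square between -1 and +1 and applies np.logical_xor to the signs of the two coordinates. The point count, the seed, and the variable names are assumptions, and this is not necessarily how the figure in the lecture was produced.

```python
import numpy as np
import matplotlib.pyplot as plt

# The ordinary XOR truth table as (x1, x2, y) rows
truth_table = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]

np.random.seed(0)  # assumption: fixed seed, only for reproducibility

# 1,000 two-dimensional points, uniform on [-1, 1) in each coordinate
x = np.random.random((1000, 2)) * 2 - 1

# Label is 1 when exactly one coordinate is positive:
# the upper-left and bottom-right quadrants
y = np.logical_xor(x[:, 0] > 0, x[:, 1] > 0).astype(int)

plt.scatter(x[:, 0], x[:, 1], c=y)
```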
8. Where to get discount coupons and FREE machine learning material: Hey everyone and welcome
back to this class. In this lecture,
I'm going to answer one of the most common
questions I get. Where can I get discount coupons and free deep learning material? Let's start with coupons. I have several ways for you
to keep up to date with me. The absolute number one, best way for you to
keep up-to-date with newly released discount coupons is to subscribe
to my newsletter. There are several
ways you can do this. First, you can visit my
website, lazyprogrammer.me. At the top of the page, there is a box
where you can enter your email and sign up
for the newsletter. Another website I
own and operate is deeplearningcourses.com. This website largely contains the same courses as you
see on this platform, but it also contains extra VIP material.
More on that later. So if you scroll to the
bottom of this website, you'll find a box to
enter your e-mail, which will sign you up for the same newsletter as you would
on lazyprogrammer.me. So you only have to
do one of these. Now let's do a small
digression because this is another common
question I get. What's this VIP
material all about, and how can I get it? So here's how the
VIP thing works. Usually when I release a course, I'll release it with
temporary VIP material, which is exclusive for those early birds
who signed up for the course during
my announcement. This is a nice little reward for those of you
who stay up-to-date with my announcements and of
course, actually read them. It's important to note that VIP material can
come out at any time. For example, I could make
major updates to a course three years after starting it
and do another VIP release. The purpose of
deeplearningcourses.com is to have a permanent home
for these VIP materials. So even though it could be temporary on the platform
you signed up on. If you sign up for the VIP
version of the course, then you'll get access
to the VIP materials on deeplearningcourses.com
permanently upon request. Here are some examples of
materials you might find in the VIP sections. In my
TensorFlow 2 course, there are three extra
hours of material on Deep Dream and
object localization. Usually I don't release the
VIP content in video format, but this was an exception. Another example in my
Cutting-Edge AI course was an extra written section
on the TD3 algorithm. This course covered three
algorithms in total. So the extra section
gives you one more, or in other words,
33% more material. Another example in my
Advanced NLP and RNNs course is a section on speech recognition
using deep learning. In addition, there is
an all-new section of the course on stock
predictions or memory networks, depending on which version of
the course you are taking. The reason for this
is I might release slightly different versions of each course on
different platforms. Because of how the rules on
all these platforms work, I must differentiate
the courses. However, since I own
deeplearningcourses.com, this is the only
platform that contains the most complete
version of the course, which includes all the sections. Please note that this is rare, so depending on what
course you are taking, it might not affect you. Alright, so let's
get back to discount coupons
and free material. Other places where I announce discount coupons are Facebook,
Twitter, and YouTube. You might want to pause
this video so you can go to these URLs and follow me or subscribe to me on these sites if they are websites
that you use regularly. So for Facebook,
that's facebook.com/lazyprogrammer.me.
For Twitter, that's twitter.com/lazy_scientist.
For YouTube, that's
youtube.com/c/lazyprogrammerx. Occasionally, I still release
completely free material. This is nice if I want
to just talk about a singular topic without having to make an
entire course for it. E.g. I. Just released a video on stock
market predictions and why most other blogs in courses approach this problem
completely wrong. That's another benefit of
signing up for these things. I get to expose fake data scientists who are
really marketers. Whereas I wouldn't ever make
an entire course about that. Sometimes this can be in written form and sometimes
it can be in video form. If it's in written form, it will either be on
lazyprogrammer.me or deeplearningcourses.com. If it's a video, it
will be on YouTube. So make sure you subscribe
to me on YouTube. If I release a video, I may also make a post about it on lazyprogrammer.me. And I may also announce it using other methods I
previously discussed. So that's the
newsletter, Facebook, Twitter, and obviously
YouTube itself. Now I realize that's a
lot of stuff and you probably don't use
all these platforms. I certainly don't, at
least not regularly. So if you want to do
the bare minimum, here's what you should do. First, sign up for
my newsletter. Remember you can do that
either on lazyprogrammer.me or
deeplearningcourses.com. Second, subscribe to my YouTube
channel at youtube.com/c/lazyprogrammerx. Thanks for listening and I'll see you in the next lecture.