Transcripts
1. Introduction Video: Hello everyone, and welcome to my latest course, Intro to Colab. So who am I and why should you listen to me? Well, my name is the Lazy Programmer and I'm the author of over 30 online
courses in data science, machine learning, and
financial analysis. I have two master's degrees in engineering and statistics. My career in this field
spans over 15 years. I've worked at multiple
companies that we now call Big Tech and multiple startups. Using data science, I've
increased revenues by millions of dollars with
the teams I've led. But most importantly,
I am very passionate about bringing this
pivotal technology to you. So what is this course about? This course is a very simple course designed to help you get started with Google Colab. Now, you might be wondering, what is Google Colab? Google Colab is a very powerful computing platform which allows you to run Jupyter notebooks in the Cloud. This means you don't
need to purchase expensive hardware to do machine learning
and data science. You can simply use Google's
hardware for free. Best of all, you also get free access
to GPUs and TPUs, which are essential for training modern deep neural networks. So who should take this course and how should you prepare? This course is designed
for those students who are interested in data science
and machine learning, but have never heard of Colab or need some help getting it set up. You don't even need
to know how to code to take this course, although that might be
helpful if you want to understand why we are
doing what we do. So in terms of resources, what will you need in
order to take this course? Luckily, not much. You'll need a computer, a web browser, and a connection to the Internet. And if you're
watching this video, then you already meet
these conditions. Now, let's talk about
why you should take this course and what you should
expect to get out of it. Simply put, Google Colab is a powerful and convenient tool and it's truly amazing that
we get to use it for free. By using Google Colab, you will be able
to offload all of your important data science
and machine learning scripts to the Cloud and easily share them
with your colleagues. By the end of this course, you will have set up Google
Colab on your Google account. And you'll be able to run
your own notebooks and make use of Google's
free GPUs and TPUs. So I hope you're just
as excited as I am to learn about this
amazing platform. Thanks for listening, and I'll see you in the next lecture.
2. How To Set Up Google Colab: In this lecture, we're
going to go over a very new and
exciting environment for writing deep
learning code in Python, which is Google's Colab,
short for Collaboratory. For those of you who like to use Jupyter Notebook, this is an even better option. It's basically the same as Jupyter Notebook, with the following bonuses. First, it's hosted by Google, which means you don't have to use your own computing power. You'll notice that when you
need to download data files, it happens extremely
quickly because, well, Google's network
is extremely fast. Second, you get access to a GPU and even Google's new TPU, which is pretty amazing. A TPU is not something you can buy for your personal computer. So it's pretty nice to be
able to make use of one. Remember that the way
TensorFlow code is written, you don't have to worry about what kind of device
you're using. Well, for the most part,
generally speaking, the same code will work
whether you're using a CPU, GPU, or TPU. Third, the colab notebooks are stored in your Google Drive, so it's in the Cloud. You'll never lose it and it's very easy to share
with other people. Fourth is that many of the libraries you need
for deep learning, machine learning and data
science are already included. In fact, I was surprised that there were many more than I assumed there would be, including even competing deep learning libraries such as Theano and PyTorch. So for those of you who hate
doing environment setup, myself included, this is
really truly awesome. So in this lecture,
we're not going to do anything really
technically complicated. Rather, we're just
going to talk about Google Colab and do some short
demos so you know how it works and can see for yourself that it's just like writing
Python anywhere else. To start, I'm going to
assume you already know how to create a
Google Drive account. If you don't have one, go to drive.google.com
and sign up. Once you have your Google Drive account and you've logged in, you'll see this interface. From here, you can
hit the new menu, which allows you to create
all different kinds of files, such as Google Docs, a spreadsheet, a
presentation, and so forth. So let's do this. So now what you
want to do is go to the More menu and
hit Collaboratory. Alright, so as you can see, this brings up a new notebook. And from here, you can mostly use this as you would
a normal notebook. Now, one thing that might
happen to you is that you might not see Collaboratory
in the menu at all. So as you can see, I've hit the New menu
and I've hit more, but I don't see
Colab in this case. Here's what you can do. You want to select, Connect more apps. From here, just
search for Colab. And the first thing that will
pop up is Google's Colab. Add this, and Google
Colab will become available from the menu
we just looked at. So if we go there again, we can see that Colab now appears where it should. So let's go in and rename this notebook to "tf2.0 intro". So first we're gonna get
right to the good stuff. How can we make use
of a GPU or TPU? In order to do this,
you'll want to go to the runtime menu and select
Change runtime type. As you can see, there are
two select boxes here. The first one lets you select which Python version
you want to use. So we'll be using Python 3 for this course. The second one lets you select what kind of device you want to use. So that's either None, which is the default, or GPU or TPU. Now note that
sometimes the GPU or TPU might not be available. This is because these
are shared resources. So your fellow peers
taking this course and other machine-learning
students and researchers all around the world might
be using Google Colab. And we're all sharing
these resources. So if our usage of these resources hits the
limit of what's available, then you might not have a GPU or TPU available when
you need them. For this reason, some of
the code you'll see in this course may be done on
my local machine as well. But remember, Python code
works the same anywhere, so it does not
make a difference. Next, you can see that there are two main types of cells that we can create in the notebook. Code and text. You can click on either of these to create a new
cell of that type. Let's click on Text, since that's a little easier. It's not really something we
are going to use very often, so let's just get
it out of the way. So I'm actually going to
delete the very first cell. Alright, so as you can
see, when I click this, it creates a new cell with what looks like
a rich text editor. You'll notice that it's
split into two halves. The left half is where
you enter your text and the right half is a preview of what it will look like. So let's enter some text. This is my title. Now you can click the little t, big T icon, which changes
it into header text. So you can see that it
makes this a little bigger and bolder appropriate
for a title. Next, let's enter
some regular text. This is regular text. Note that there are also
these arrow brackets. So it looks like it's going to let us enter code snippets. So let's try that. And so as you can see, it makes the text
a monospace font, which is appropriate for code. Now there are some
other options here. So you can make a link, you can add images, you can indent, you can add a numbered or bulleted
list and so forth. So if you're interested, play around with this. Otherwise we're not going
to mention it again. Next we have the code cell, so let's create one of those. Alright, and as mentioned, we're not going to write any
fancy code in this lecture. We just wanted to
do something simple to make sure everything
works as expected. So let's start by importing
numpy and matplotlib. Alright, beautiful. As I mentioned earlier, these already come
pre-installed. Next, let's create a new code
cell and make a sine wave. So first we need to
create some x values. So let's make x go from 0 to 10π with 1,000 points in between. Next, let's make y the sine of x. Next, let's create a new cell and plot what we just created. So that's just plt.plot(x, y). Now, since this is a notebook, there's no need to call plt.show() since the plot will just
appear in the notebook itself. Alright, very cool. It works just like a regular notebook. At this point, we've convinced ourselves that in Google Colab, you can do the usual things you'd expect from a Jupyter notebook. Now, as I mentioned earlier, one thing that's very nice
about Colab is that it already comes with a bunch of useful
libraries pre-installed. In my opinion, this makes Google Colab way better than Jupyter Notebook. And if anyone ever asked me to write in a notebook environment, I would choose Colab by default. I'm not a big fan of notebooks, but I am a big fan of Colab. So here we can see
that I've written some code to try and import
a bunch of libraries. Specifically, these
libraries are libraries that have been
used in my courses, some more than others. Some are pretty rarely used. So you might not expect
that they would be included libraries
like Word Cloud, which we've only
used once so far. And yet, if we look, we see that everything
I've tried to import here does
not throw an error. So this tells us
that these libraries are indeed available. What's interesting to
me is that some of these libraries are not
machine-learning related at all. Of course, we've used them
in my courses because they are generally useful
as Python libraries. But it's nice to see the
folks at Google also make use of these same libraries and so thought to include them. So here you can see the usual
stuff such as scikit-learn, numpy, scipy, matplotlib, and pandas. We also have torch and Theano, which is surprising because they are competing deep learning libraries, and development on Theano has been stopped for a while now. We also have seaborn, wordcloud, Beautiful Soup, which is for XML and HTML parsing, requests, which is for making HTTP calls, networkx, which is for graph functionality, cv2, which is for OpenCV, and gym, which is OpenAI Gym. All in all, very impressive and much more than I expected. So there are some final caveats to Colab that I wanted to mention. First, the main
thing you have to remember is that
this is the Cloud, so these are shared resources. So one way this
affects you is if you leave your notebook
alone for a long time, it will become inactive
and disconnect. Any computation that
you may have run earlier won't be saved. So e.g. if you define a
variable a equals five, and then you come back
later after your notebook was disconnected and
you try to print a, it'll say a is not defined. So you see this notebook
has disconnected. So let's say I do
reconnect and I print a. It's going to say
a is not defined. Another way this affects you is that you might run
out of memory. So if that happens, you might want to try running the code on your local
machine instead. And as mentioned earlier, the GPU and TPU might
be unavailable. So either you can run your
code without the GPU or TPU, or you can run the same
code locally as always, options you had previously
are still available. E.g. you can provision
a GPU instance on AWS, which, if you choose the correct AMI, or Amazon Machine Image, will come with the usual libraries pre-installed also.
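To recap the short demo from this lecture, here's a minimal sketch of the kind of cells described above. The library list at the end is just a sample; which packages come pre-installed can change over time, so the sketch checks availability without assuming anything is there:

```python
# Sketch of the demo cells from this lecture. Assumes numpy and matplotlib
# are available (they come pre-installed on Colab).
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs outside a notebook
import matplotlib.pyplot as plt

# x goes from 0 to 10*pi with 1,000 points in between; y is the sine of x
x = np.linspace(0, 10 * np.pi, 1000)
y = np.sin(x)
plt.plot(x, y)  # in a notebook, the figure appears inline without plt.show()

# Check which of the libraries mentioned above are importable, without
# raising ImportError for any that happen to be missing locally.
import importlib.util
for name in ["sklearn", "scipy", "pandas", "wordcloud", "cv2", "gym"]:
    available = importlib.util.find_spec(name) is not None
    print(name, "available" if available else "not installed")
```

In a Colab cell you'd drop the `matplotlib.use("Agg")` line; it's only there so the same sketch runs in a plain script.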
3. Install Tensorflow 2.0 on Colab (Optional): Now, there's a reason
I didn't mention TensorFlow specifically
in the previous lecture. Which is because that's
what we're going to talk about in this lecture. So this lecture is going to be about how to use TensorFlow 2.0 in Colab. You'll notice that if you import TensorFlow in Colab and you check the version, it'll say 1.14. So let's do that. Now, obviously this depends
on when you try to do this. Currently at the time I
am making this course, TensorFlow 2.0 is still in beta, which means it hasn't officially been released yet. So if you try to use the usual command, pip install tensorflow, you will not get TensorFlow 2.0. Of course, this will change in the future when TensorFlow 2.0 is officially released, at which point the usual command pip install tensorflow will actually give you TensorFlow 2.0. And of course, as subsequent versions are released, that will change to 2.1, 2.2, and so forth, or whatever version numbers they end up using. Luckily, you can install other libraries in
a Colab notebook, which did not come with
the notebook. So, for example, if Colab didn't come with scikit-learn installed, then you would just run the command pip install scikit-learn inside a code cell within the Colab notebook. In other words, in order to install libraries, it's as simple as running the usual pip commands. You just have to put the bang symbol (!) first. More on that later. For now, we are interested in TensorFlow 2.0. At the time I made this video, the current version of TensorFlow 2.0 is beta1. The current command would be !pip install -q tensorflow==2.0.0-beta1. Note that the -q option here means quiet, which just means print out less stuff. It doesn't actually modify the functionality of the command. Importantly, here
you have to keep in mind that one of
my famous rules, learn the principles,
not the syntax. This is very important here. Why do I say this? Well, inevitably, some lost
soul will end up saying, "Why should I use this command when TensorFlow beta3 is out? Doesn't this mean that the lecture is out of date? Shouldn't you update this lecture?" And remember the rule: learn the principles, not the syntax. Of course, today the latest version is beta1. Tomorrow that might be beta2 or beta3 or beta500. Who knows? The principle is to look at TensorFlow's website to check what the current command is. That's the principle. Don't try to memorize the install command verbatim, which would be very silly. Okay, so be smart. Don't be silly.
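As an aside, here's a small, hypothetical sketch of what "learn the principles" can look like in code: instead of memorizing one install command, compare the installed version against what you need. The version strings below are made up for illustration, and the helper name is my own:

```python
# Hypothetical helper: decide whether an upgrade is needed by comparing
# version strings numerically, stripping pre-release tags like "-beta1".
def needs_upgrade(installed, required="2.0.0"):
    def parse(v):
        core = v.split("-")[0]                       # "2.0.0-beta1" -> "2.0.0"
        return tuple(int(p) for p in core.split("."))
    return parse(installed) < parse(required)

print(needs_upgrade("1.14.0"))       # True: 1.14 is behind 2.0
print(needs_upgrade("2.0.0-beta1"))  # False: already on a 2.0 build
```

In real code you'd probably reach for a dedicated version parser, but the point stands: check, don't memorize.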
Learn the principles and don't memorize the syntax. Also, note that you can install the GPU version of TensorFlow, which is, as usual, pip install tensorflow-gpu. Interestingly, on Colab, I found that using the GPU is not that much faster than using the CPU. So for most small problems, it shouldn't matter that much what you use. For TPUs, we'll be discussing how that works later in the course. So let's run this. So after installing TensorFlow 2.0, you can check the version again. Just print out tf.__version__. And you should see 2.0.0 or something similar. So let's run this. Now there is one caveat to this, which is that I found this sometimes doesn't work. So even after installing TensorFlow 2.0, I print out the version and it still says 1.14. It seems that the
problem is if you import TensorFlow and then try to change the version,
it won't work. So if you accidentally do this and you actually
want TensorFlow 2.0, then what you'll want to do is first make sure you are not trying to import TensorFlow before installing TensorFlow. So let's comment this out. And then let's go to the runtime menu and select Restart runtime. So yes, so we're no longer running this. We're just going to run this. And now we're going to run this. And it works. So now we have 2.0.0-beta1. Now, in general, I find
that this is a bit wonky. So if I run this
notebook and then I tried to change the
TensorFlow version later. So say I try to switch from
CPU to GPU or the reverse, things tend to get
a little weird. So what I'd like to do is have everything set
from the beginning, know what you want to use, and then run it like that from
the start, and don't try to change things in between, because sometimes the thing you were using before is sort of sticky. So even if you try to change it, it won't actually change. Now, there is another
important caveat to this, which is that if you
recall previously, I said if you leave
your notebook idle for too long,
it'll disconnect. If this happens, unfortunately, your TensorFlow version
will revert back to the default and you'll need to install TensorFlow 2.0 again. Now personally, I don't mind running all the
cells each time. Since if I really wanted to
run everything in one go, I would just run it locally. But if for some reason you
would like to have TensorFlow 2.0 beta1 permanently installed in your Colab, you could try the solution provided in this link I've attached. Again, that's up to you. But personally, I didn't have a reason to do it myself. So you'll recall that we discussed this bang command, which, by the way, also exists in regular
Jupyter Notebook. So far you know that
it can be used to run pip install commands. But in general,
you can treat this like a directive that
tells the notebook that you want to
run this command like you would in
the terminal, e.g. if I want to list all the files
in the current directory, I could use the command !ls. So let's try that. Interestingly, you'll see that there's this folder that appears called sample_data. So we can call !ls sample_data. Here you can see we have the famous MNIST dataset, the California Housing dataset, and some random JSON files. We may or may not use these, but these are good if you want to just run some simple tests, like, say, try a simple image classifier on MNIST. In any case, there you have it. That's how you use
TensorFlow 2.0 in Colab, in the case that it's not yet been officially released.
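Putting this lecture together, here's a minimal sketch of the check-before-you-rely-on-it workflow. The install line in the comment is the Colab cell from earlier; the helper function is my own (not part of TensorFlow), and it only reports whether a 2.x TensorFlow is importable, without installing anything itself:

```python
# In a Colab cell you would first run (without importing TensorFlow beforehand):
#   !pip install -q tensorflow==2.0.0-beta1
# and, if TensorFlow had already been imported, use Runtime > Restart runtime.
import importlib.util

def tf2_ready():
    """Return True if TensorFlow is importable and reports a 2.x version."""
    if importlib.util.find_spec("tensorflow") is None:
        return False
    import tensorflow as tf
    return tf.__version__.startswith("2.")

print("TensorFlow 2.x ready:", tf2_ready())
```

If this prints False, install (or reinstall) before importing, then restart the runtime, just as described above.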
4. Uploading Data to Colab: In this lecture,
we're going to do a few more tasks in Colab. Specifically, we're
going to look at some ways to upload your
own dataset to Colab. Let's say, for example, your client
or employer gives you a CSV file or you download
a CSV from Kaggle. How can we then make this file accessible from our
Colab notebook? In this lecture, we're
going to discuss a few different
ways of doing this. The first method we're
going to look at is just to use the classic
Linux command wget. As mentioned previously, you can run command-line commands by preceding the command with the bang symbol, or exclamation mark. So let's go ahead and download the arrhythmia dataset. Now, we want to check where the data went. So let's use !ls to see if the data is in our current directory. Okay, it looks like it is. Now let's use the
head command to see the first few lines
of the data file. And also to check whether or not the file has a header row. Okay, So it looks like it
does not have a header row. Next, let's try to load
in the data using pandas. We're going to pass in the argument header=None, since we know that the data
does not have a header. Next, since the data
has many columns, we're just going to take the first few. We're also going to
rename the columns because they are currently
just integer values. As usual, since
this data is from the UCI Machine
Learning Repository, you can just check
the documentation if you want to know
more about the data, like what each column is. So let's run this. Next. Let's create a histogram
of these data columns. Since the notebook by default makes the plot pretty small, we're going to import matplotlib and change the figure size. Once we've done that, we can call df.hist() to create histograms for each column. Note that I've added a semicolon to the end of df.hist() just because if you don't, then the notebook will print out the last returned value, like it usually does, which we don't want right now. So here are some nice histograms
for you to look at. Next, let's create
a common plot for data analysis, the
scatter matrix. This does a scatter plot between each feature and
every other feature. Along the diagonal, it just plots the histogram of each feature, which we've already seen. Alright, so pretty
standard so far. Next, let's look at the second method of
loading in data, which also applies when you have a URL. This is to use TensorFlow directly, specifically the Keras get_file function. Let's start by assigning the URL to a
variable called URL. We're going to be using
the auto MPG dataset. Although it doesn't
really matter what you use for this example, as long as you can access
it directly via URL. Let's run this. Next, we're going to make sure we have TensorFlow 2.0 installed. So we're going to run pip install tensorflow and then print out the version to make sure that we have the correct one. Next, we're going to call the Keras get_file function. The first argument is the file path we want to save to, and the second argument is the file source. Let's run this. Note that it's possible to save the file to a different directory, but we'll be saving it to Keras's default folder. So you can see from the printout that the file ends up in /root/.keras/datasets. Next, let's call the head
command so that we can see the first few
lines of a file. As you can see, it's
not exactly a CSV. Instead, each column
is separated by whitespace and there
is no header row. So in order to load this data, we can still use the
pandas read csv function, but we have to pass
in two arguments. The first argument is
to say that there is no header row,
header equals none. And the second extra
argument is to tell pandas that the
delimiter is whitespace. So we set the limb
whitespace equal to true. Next we call df.head just to make sure everything
works as expected. So as you can see, the result appears to be in
the right format. And from here you can
process this data using Python code as
you normally would. The third method
we're going to look at in order to add your own files to Colab is to upload the file directly. In order to do this, we have to run a special Colab function. So we say from google.colab import files, then we call files.upload(). So let's run this. So you see that this creates an upload button, which we can click and then choose a file
from the local file system. So I'm going to choose
daily minimum temperatures. And if we print out
the returned value, you can see that it's a
dictionary where the file name as the key and the value
is the file contents. If we use the
command and bang ls, we can see that
the file has been uploaded to the
working directory. Next, let's read
in the file using pandas to make sure we
get what we expect. Now this file has some
garbage lines near the end. So I've accounted
for that by setting the argument error_bad_lines equal to False. This ignores errors but prints them out as
they are encountered. As you can see, the file
is loaded in successfully. To follow up this example, we're going to look at a
variation on what we just did. You recall that when you're
writing code in Python, sometimes it's useful to split your code amongst several files. This helps to organize your code and keep similar things all in one place while keeping
different things separate. As a simple example, sometimes we'll learn about multiple algorithms
in one course, but we'll test all those
algorithms on the same dataset. There's no point in
rewriting the code to load in the dataset and
multiple different times. Instead, we can write
the data loading code once and then import
it from each file. Now, you might wonder, since we're working in Colab, how can you import a function from a Python script if that Python script is on your local hard drive? Luckily, we can take the same
approach we already have been to upload that
file to Google Colab. So here I'm going to call files.upload() again. And this time I'm uploading the Python script fake_util.py. So fake_util.py contains only one function, called my_useful_function. And all it does is print out "hello world". Once you've uploaded the file, you can see that we can import it just like we would if we were working locally. So I can say from fake_util import my_useful_function. Then when I call my_useful_function, you can see that hello world is printed out just like we expect. And by the way, you
might be wondering, as I did, what the path of the current
directory actually is. To determine this,
you can just run the usual Linux command, pwd, and that prints out /content. So /content is our
current working directory. The last thing I
wanted to cover is something you're
probably all wondering. Google Drive is
for storing files. So is it possible to access
files on your Google Drive? And of course the answer is yes. So in order to do this, we have to import drive
from google.colab. Then we have to mount the drive by calling drive.mount and specifying the path /content/gdrive. So this is going to give you an authorization code. So you go to the URL in your browser. It asks you to sign in and to accept some terms. And then it gives you a code. You copy this code and you put it back into this box. You hit Enter. Okay, so that works. So after we've done this, we can call ls again to check what's now in
the current directory. We can see that there's now an extra thing here, gdrive. So let's ls gdrive and see what that gives us. Alright, so it looks like we now have a thing called "My Drive". Once again, we ls this; remember that you have to add quotes if your path contains whitespace. And now we can see a bunch of files that are in my Google Drive, which is essentially a bunch of VIP content for the VIP versions of my courses.
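To recap this lecture, here's a runs-anywhere sketch of the pandas loading patterns covered above. The real downloads and uploads are replaced with small in-memory stand-ins (the data values and column names are made up for illustration), but the arguments are the ones discussed in the lecture:

```python
# Runs-anywhere sketch of the loading patterns from this lecture, with small
# in-memory stand-ins for the real downloads and uploads.
import io
import pandas as pd

# 1) A headerless CSV (like the arrhythmia data): pass header=None, then
#    rename the integer columns to something meaningful.
df1 = pd.read_csv(io.StringIO("75,0,190\n56,1,165\n"), header=None)
df1.columns = ["age", "sex", "height"]  # made-up names for illustration

# 2) A whitespace-delimited file with no header (like auto-mpg). The lecture
#    sets delim_whitespace=True; sep=r"\s+" is the equivalent.
df2 = pd.read_csv(io.StringIO("18.0  8  307.0\n15.0  8  350.0\n"),
                  header=None, sep=r"\s+")

# 3) files.upload() in Colab returns {filename: file contents as bytes}; here
#    we fake that dictionary so the same parsing code can be exercised.
uploaded = {"temps.csv": b"date,temp\n1981-01-01,20.7\n1981-01-02,17.9\n"}
df3 = pd.read_csv(io.BytesIO(uploaded["temps.csv"]))

print(df1.shape, df2.shape, df3.shape)  # (2, 3) (2, 3) (2, 2)
```

In Colab itself, you'd replace the StringIO/BytesIO stand-ins with the !wget download, the get_file path, or the real dictionary returned by files.upload().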