Image Classification: Zero to Hero w/ Tensorflow | Aditya Shankarnarayan | Skillshare

Image Classification: Zero to Hero w/ Tensorflow

Aditya Shankarnarayan, That Indian Coder Guy

13 Lessons (29m)
    • 1. Welcome
    • 2. Introduction
    • 3. Neural Network
    • 4. Google colab
    • 5. Understanding an Image
    • 6. Image Data Generator
    • 7. Coding:CatvDog
    • 8. Convolutions and Max polling
    • 9. Coding:CatvDog w/ CNN
    • 10. Fashion Mnist
    • 11. Coding;Fashion MNIST
    • 12. Class project
    • 13. That's it!

About This Class

Learn image classification the right way!

Easy lessons with detailed examples.

Lessons are explained thoroughly.

Tensorflow will be explained with great examples.

Topics covered in this class will set you up in the right direction.

Basic Python knowledge is required.

Meet Your Teacher


Aditya Shankarnarayan

That Indian Coder Guy


Hello, I’m Aditya
I am a programmer who is passionate about helping students become better coders.
I have been programming since I was in the 8th grade and have been teaching for over a year now. I primarily focus on Data Science and related topics.
I hope you find value in my classes and learn a lot from them.


1. Welcome: You probably have a lot of devices lying around: a smartphone, a tablet, or a computer. Your smartphone has a virtual assistant. Your computer has predictive text. Smart devices like these are all around us, so it is essential to know how they work. Hi, my name is Aditya, and I welcome you to deep learning and AI. In this class, we will learn how to build our very own neural network. You do not need an advanced math or calculus background, because we will be using TensorFlow, which does all of that for you. With a neural network, you can build cool things like an image classifier such as Google Lens, write poems with it, or do any other task you throw at it. To take this class, you will need a basic Python background, and a basic machine learning background won't hurt. But don't worry, I'll explain most of it as we go. We will be using Python and TensorFlow, TensorFlow being a library with lots of useful machine learning and deep learning functions that we can use. We will be coding in Google Colab, a free service where we can execute machine learning code without worrying about computational speed or time. The content of this class will be: defining a neural network, building a basic neural network model, and building an image classifier for cats and dogs. Then we will use the famous Fashion MNIST dataset to build a clothing item classifier. And we will do all of this in less than 100 lines of code. I promise you that in no time you will be making neat stuff with neural networks. I'll see you in the next lesson.

2. Introduction: We are used to creating applications by breaking requirements down into composable problems that we can then code. If you are writing a game, you write rules: if the player touches a power-up, they get stronger; if they are hit by a fireball, they lose a life. So rules and data go in and answers come out. In machine learning, we put answers and data in instead. What we do is give our program a bunch of examples together with our desired answers. 
And we let the computer figure out the rules. Consider this example I saw in a video. If I am doing activity recognition and the only data I have is speed, I can define walking as a particular speed, like this. If I also have to handle running, I can do it like this, and I can do biking as well. But when I have to write code for any other sport, I cannot do it with speed data alone, so my program breaks. What we can do instead is express the problem by providing lots and lots of data and labeling it: this data is walking, this is running, this is biking, and this is golfing. By providing this data to our program, we let the machine infer the rules. Let's take an example. I would like you to solve this equation. What do you think is the relationship between x and y? Take a minute. I hope you got this answer: the relationship is y = 2x. Congratulations, you have just done the basics of machine learning in your head. A machine learning algorithm figures out the patterns in the data and what distinguishes each kind of example. We will learn how to build a basic neural network first. Then, by the end of this class, you will build a computer vision model that can identify different things. So now we will go on and build our first neural network and see how everything works.

3. Neural Network: Before we start, I would like to familiarize you with certain aspects of a neural network. A neural network is made up of layers: an input layer, hidden layers, and an output layer. The input layer is where input is given to the network. If we are making an image classifier, we input the pixel values of the images. The hidden layers are where all the heavy lifting happens. Each hidden layer has weights and biases associated with it. The values of the weights and biases are what allow a neural network to learn different things. 
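To make the idea of learning a rule from examples concrete, here is a minimal sketch in plain Python, without TensorFlow. It is not the code used in the class: the input values, the learning rate, and the number of epochs are all assumptions chosen for illustration. A single "neuron" with one weight and one bias is nudged step by step until it reproduces y = 2x.

```python
# A single "neuron": prediction = weight * x + bias.
# Training nudges weight and bias to shrink the squared error.

xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]   # illustrative inputs (assumed)
ys = [2.0 * x for x in xs]             # answers that follow y = 2x


def train(xs, ys, learning_rate=0.05, epochs=500):
    """Fit weight and bias with plain gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = (weight * x + bias) - y
            grad_w += 2.0 * error * x / n   # slope of the error w.r.t. weight
            grad_b += 2.0 * error / n       # slope of the error w.r.t. bias
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


weight, bias = train(xs, ys)
print(f"learned rule: y = {weight:.3f} * x + {bias:.3f}")
```

The program is never told the rule. It only sees data and answers, and the weight ends up very close to 2 while the bias ends up very close to 0.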
How this learning takes place is outside the scope of this class, as it requires a mathematics and calculus background. Luckily, all this math is implemented for us in the form of TensorFlow functions, which makes it extremely easy. Having that background will certainly help you build better neural networks, but it is not required for beginners. To get a basic understanding of neural networks, I recommend you go and check out the playground website. This is an excellent tool for understanding how things work in a network. It shows a classification problem where the background color tries to match the respective colored dots. Try out different combinations and test how they affect the result. First, the learning rate is how fast or slow the network will learn. A value that is too high will cause problems reaching a reasonable solution, and a value that is too low will increase the learning time. Next, activation. An activation is applied to each layer in the network except the input layer. The activation basically changes the value range of a particular neuron. For example, ReLU sets negative values to 0 and keeps positive values as they are. If you change it to sigmoid, all values are squashed into the range between 0 and 1, which is why we use sigmoid in binary classification. Regularization helps us overcome overfitting, which we will cover in a later lesson. So I recommend you check out how the different classification problems work and test out different combinations.

4. Google colab: Google Colab is a free cloud service where you can improve your Python programming skills and develop deep learning applications using popular libraries such as TensorFlow and Keras. Unlike coding on your local machine, where you have to download a bunch of libraries, all these libraries come pre-installed, which saves a lot of space on your local machine. 
As machine learning and deep learning are computationally heavy tasks, we can use the GPU provided to train our model faster. The only requirements are an internet connection and a Google account. When you open Colab, you will be greeted with a welcome page. You can read it to learn more, or skip it, and then click to open a notebook. This notebook will be saved on your Google Drive. You can save it to your local machine by going here and saving it as a Python notebook or as a Python file. To change to dark mode, go to Settings. Under Settings you can find the theme, which you can change to dark mode. To execute code, click on a cell to write the code, then press Shift+Enter or hit the play button. This executes our code, and the output is displayed below the cell. I want to share more information about Colab, which I will do in a later lesson. This was a basic introduction to Google Colab, which will be sufficient for our use case. To get in-depth knowledge about Colab, check the TensorFlow YouTube page for more information.

5. Understanding an Image: Before we make our first image classifier, we need to understand our data. This is a very important step when building any type of model in machine learning or deep learning. I have provided the link for the dataset, so you can download it for yourself and see the images. The dataset has two subfolders, training and test data. In each of them there are folders with cat and dog images. These are color images. A color image has three channels, R, G, and B, that is, red, green, and blue. Each pixel in these channels has a value between 0 and 255, 0 being a dark pixel and 255 being a bright pixel. When we convert an image to an array, we get a three-dimensional array, with each element being a pixel value. This is the input to our neural network model. 
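As a small illustration of that array structure, here is a sketch in plain Python. The 2x2 image and its pixel values are made up for this example and do not come from the class dataset. In practice a library would load the image for you, but the layout is the same: rows, then pixels, then one value per channel.

```python
# A tiny 2x2 colour image as nested lists: rows -> pixels -> [R, G, B].
# The pixel values are made up for illustration.
image = [
    [[0, 0, 0], [255, 255, 255]],      # black pixel, white pixel
    [[255, 0, 0], [0, 0, 255]],        # red pixel, blue pixel
]

height = len(image)
width = len(image[0])
channels = len(image[0][0])
print("shape:", (height, width, channels))

# Every element is one channel value between 0 (dark) and 255 (bright).
flat_values = [value for row in image for pixel in row for value in pixel]
print("min:", min(flat_values), "max:", max(flat_values))
```

A 150 by 150 color photo has the same layout, only with 150 rows of 150 pixels each.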
As these values range from 0 to 255, it becomes difficult for a model to learn, because the value range is very large. So we normalize the values and convert them into a range between 0 and 1. We can do this by dividing every single element by 255. This is an overview of our data. Next we will learn how we can feed this information to a neural network so that it learns to tell cats from dogs.

6. Image Data Generator: When you learn to build your own models, most of the datasets you use will not be divided or labeled for you; you will have to do it yourself. So in this lesson, we will take a look at some APIs that make that easier for you, in particular the image data generator in TensorFlow. One feature of the image data generator is that you can point it at a folder, and the subfolders of that folder will automatically generate labels for you. The ImageDataGenerator class is available in keras.preprocessing.image. You can then create an image generator like this. I am going to pass rescale to it to normalize the data, as we discussed in the previous lesson. You can then call the flow_from_directory method on it to load images from that directory and its subdirectories. It is a common mistake to point the generator at a subfolder; it will fail in that circumstance. You should always point it at the directory that contains the subfolders. The names of the subfolders will be the labels for the images contained within them, so make sure you are pointing at the correct directory. Now, images might come in all shapes and sizes. The input data all has to be the same size, so the images need to be resized to make them consistent. The nice thing is that we can ask for the images to be resized without actually changing them: the resizing is done as they are loaded, so you do not need to pre-process thousands of images on your file system. 
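The generator setup described here might look roughly like the sketch below. It is not the exact code from the class: the helper name `make_generator`, the 150 by 150 target size, the batch size of 20, and the `cats`/`dogs` folder names are assumptions for illustration. `ImageDataGenerator` and `flow_from_directory` are real Keras APIs, although recent TensorFlow releases mark `ImageDataGenerator` as deprecated in favour of `tf.keras.utils.image_dataset_from_directory`.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator


def make_generator(directory, image_size=(150, 150), batch_size=20):
    """Yield batches of rescaled, resized images labelled by subfolder name.

    `directory` must be the parent folder that CONTAINS one subfolder per
    class (for example cats/ and dogs/), not one of the subfolders.
    """
    datagen = ImageDataGenerator(rescale=1.0 / 255)  # 0..255 -> 0..1
    return datagen.flow_from_directory(
        directory,
        target_size=image_size,   # images are resized as they are loaded
        batch_size=batch_size,    # assumed value, worth experimenting with
        class_mode="binary",      # two classes: cats and dogs
    )
```

Calling `make_generator` once on the training folder and once on the validation folder gives the two generators a model can be trained and validated on.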
The advantage of doing this at runtime is that you can experiment with different sizes without impacting your source data. The images will be loaded for you for training and validation. The data will be divided into batches, which is more efficient than going through the images one by one. Now, there is a whole science to calculating batch sizes that is beyond the scope of this class, but you can experiment with different batch sizes and see the impact on performance by changing this parameter. Finally, there is the class mode. This is a binary classifier, that is, it picks between two different things, cats and dogs, so we specify that here. Other options, in particular for more than two things, will be explored later. The validation generator should be exactly the same, except of course that it points at a different directory, the one containing the subdirectories with the test or validation images. This is what we will use to feed our images to the network.

7. Coding:CatvDog: In this lesson, we will build an image classifier that can distinguish between a cat and a dog. This won't be the final model; we will build a better model in later lessons. I will go through the code and explain it. The first few blocks just import the libraries. The Google Drive libraries are required for getting our dataset. To use the dataset, upload the zip file to Google Drive. After uploading, use the same account to start a new Colab notebook. After starting the notebook, go to Runtime, then change the runtime settings, where you can find an option for hardware acceleration. Under that option, select GPU and press Save. Using a GPU greatly increases our computational speed. The third block will ask you to authenticate yourself and will provide you with a link. Click that link and choose the account that holds the dataset. After accepting the permissions, you will be given a key. Copy and paste that key into the box below. 
This will successfully mount the drive. I have seen many different ways of doing this, but I have found this method to work best. If you have any questions, please feel free to share them with the class. The next block is for getting our dataset from our Drive. Our file name is Dataset, so we write dataset here. If you are using any other file, use its name. Next we unzip the file into its folder and subfolders. The next block is the image data generator, which I talked about in the previous lesson. As you can see, we have 1,000 images in our training data and 2,000 in our validation data. Next we create our model. The Flatten layer reshapes our 3D array into a vector that has only one dimension, so we can use it as input. In the last program, we did not use Flatten. Why do you think that is? The next lines create our hidden layers. As this is a more complex problem, we need more neurons and more layers. I have used just two layers here; I recommend you try more layers and more neurons and observe the results. We compile the model next, this time using the Adam optimizer, which is a powerful optimizer. I recommend you check the TensorFlow documentation for details about it. When we fit our model, we add one more parameter, the validation data. This checks the current model against our validation data at every epoch. When we execute, our model starts to fit. This takes quite some time because of the number of images, but we are using the GPU, so the speed is increased. As you can see, we have a low loss and an accuracy of about 60%. But as you can see, we do not perform as well on the test data. That is because of overfitting. There are many ways to reduce it, like regularization or reducing the number of neurons, which I already discussed in the previous lesson. 
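The model described in this lesson could be sketched as follows. This is an approximation under stated assumptions, not the class notebook: the helper name `build_dense_model`, the 150 by 150 input size, and the hidden layer sizes of 128 and 64 are made up for illustration. The overall structure (Flatten, two hidden layers, a single sigmoid output, the Adam optimizer, and binary cross-entropy loss) follows what the lessons describe.

```python
import tensorflow as tf


def build_dense_model(image_size=(150, 150)):
    """Baseline cat-vs-dog classifier with no convolutions."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(*image_size, 3)),          # RGB input
        tf.keras.layers.Flatten(),                       # 3D image -> 1D vector
        tf.keras.layers.Dense(128, activation="relu"),   # hidden layer (size assumed)
        tf.keras.layers.Dense(64, activation="relu"),    # hidden layer (size assumed)
        tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of one class
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_dense_model()
model.summary()
```

Training would then be a call such as `model.fit(train_generator, epochs=10, validation_data=validation_generator)`, where both generator names and the epoch count are placeholders. Comparing the training accuracy with the validation accuracy printed after each epoch is how the overfitting mentioned above shows up.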
So I recommend you go and check out the different ways to get rid of overfitting. Our overall model is not that good. Machine learning and deep learning are iterative processes, and it takes time to improve a model. But don't worry: we will build a better model in the coming lessons, so you do not have to go through that process.

8. Convolutions and Max polling: If you went through the images in the dataset, you would have noticed that many images have wasted negative space, or space where the features are not present. There is a way we can condense these images down to the essential features that distinguish a cat from a dog. We use convolutions to make that happen. A convolution involves having a filter and passing that filter over the image to change the underlying image. The process works a little bit like this. For each pixel, take its value and look at the values of its neighbors. If our filter is a three-by-three filter, we look at the immediate neighbors, so that we have a corresponding three-by-three grid. Then, to get the new value of the pixel, we multiply each neighbor by the corresponding value in the filter. For example, in this case our pixel has a value of 192 and its upper-left neighbor has a value of 0. The upper-left value of the filter is negative one, so we multiply 0 by negative one. Then we do the same for the upper neighbor: its value is 64 and the corresponding filter value is 0, so we multiply those. We repeat this for each neighbor and each corresponding filter value, and the new pixel value is the sum of each neighbor value multiplied by the corresponding filter value. That is a convolution. The idea here is that some convolutions will change the image so that certain features in the image are emphasized. For example, with this filter the vertical lines in the image pop out. With this filter, the horizontal lines pop out. 
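The multiply-and-sum procedure can be written out in a few lines of plain Python. The sketch below is only an illustration: the filter values and the small grayscale image are assumptions, not the ones shown in the lesson. The filter used here is one that responds to vertical edges, so the output is large where the image changes from dark to bright and zero where it is flat.

```python
def convolve(image, kernel):
    """Apply a 3x3 filter to every pixel that has a full set of neighbours.

    `image` is a list of rows of grey values; the result is two pixels
    smaller in each dimension because border pixels are skipped.
    """
    height, width = len(image), len(image[0])
    output = []
    for row in range(1, height - 1):
        new_row = []
        for col in range(1, width - 1):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    total += image[row + dr][col + dc] * kernel[dr + 1][dc + 1]
            new_row.append(total)
        output.append(new_row)
    return output


# Illustrative filter that responds to vertical edges (values assumed).
vertical_edges = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# Made-up 4x6 image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
]

print(convolve(image, vertical_edges))
# -> [[0, 1020, 1020, 0], [0, 1020, 1020, 0]]
```

In a convolutional layer the filter values are not chosen by hand like this. They are learned during training, in the same way as weights and biases.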
Now, that is an elementary introduction to what convolutions do. When combined with pooling, they become really powerful. Pooling is a way to compress an image. A quick and easy way to do this is to go over the image four pixels at a time. Of these four pixels, pick the biggest value and keep just that. For example, you can see here that the 16 pixels on the left are turned into four pixels on the right. By doing this, the features highlighted by the convolutions are kept, while the size of the image is quartered. If you have trouble understanding this, don't worry; we will see it in action in the next lesson and get a better understanding.

9. Coding:CatvDog w/ CNN: Now we are going to apply what we learned in the previous lesson about convolution and pooling to the dog versus cat classifier. Most of the code stays the same. The model definition changes a bit: we add convolution layers and max pooling layers to the model. The Conv2D layer defines a convolutional layer. The 64 is the number of filters we are going to use, and three by three is the shape of the filter. The max pooling layer is the next line of code, and it will halve the image. I apply these a few more times to condense the images. The next block gives us a summary of our model. The first conv layer takes our images and makes them a couple of pixels smaller in each dimension, because the filter cannot be centered on the pixels at the very edge. This happens because of the convolution operation I described in the previous lesson. The reduction in size depends on the filter size. To prevent it, use padding. Padding adds an extra layer of pixels with the value 0 around the perimeter of your image, so when we perform convolutions we retain the edge pixels and the output keeps its size. In this case, I am not applying padding. If you wish to add it, use this syntax and pass it to the respective conv layer. Now we compile and visualize how the layers work. We take three images and pass them through our conv layers. As you can see, the image reduces in size. 
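Both the pooling step and the shrinking image sizes can be checked with a little arithmetic. The sketch below uses plain Python rather than Keras layers. The 4x4 grid values are made up, and the trace assumes a 150-pixel input passing through three pairs of a three-by-three convolution without padding followed by two-by-two max pooling. The exact number of layers in the class notebook is an assumption here.

```python
def max_pool_2x2(image):
    """Keep the largest value from each non-overlapping 2x2 block."""
    pooled = []
    for row in range(0, len(image) - 1, 2):
        pooled_row = []
        for col in range(0, len(image[0]) - 1, 2):
            block = (
                image[row][col], image[row][col + 1],
                image[row + 1][col], image[row + 1][col + 1],
            )
            pooled_row.append(max(block))
        pooled.append(pooled_row)
    return pooled


def conv_output_size(size, filter_size=3, padding="valid"):
    """Side length after a stride-1 convolution."""
    return size if padding == "same" else size - filter_size + 1


# 16 made-up pixel values shrink to 4.
grid = [
    [0, 64, 128, 128],
    [48, 192, 144, 144],
    [142, 226, 168, 0],
    [255, 0, 0, 64],
]
print(max_pool_2x2(grid))
# -> [[192, 144], [255, 168]]

# Trace the side length of a 150-pixel image through conv + pool pairs.
size = 150
sizes = [size]
for _ in range(3):
    size = conv_output_size(size)   # 3x3 filter, no padding
    sizes.append(size)
    size = size // 2                # 2x2 max pooling halves it
    sizes.append(size)
print(sizes)
# -> [150, 148, 74, 72, 36, 34, 17]
```

With `padding="same"` the convolution step would leave the size unchanged, so only the pooling layers would shrink the image.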
From 150, it went down to 34, while maintaining the general shape of a dog or a cat. The bright outline here is in the form of a cat, and this one is a dog. Now we will execute our model. This will take a few minutes. Now that our model has finished learning, we can see that our training and validation accuracy are quite high. This is way better than our previous model, and we did it just by adding a few lines of code. Still, deep learning and machine learning are iterative processes, and getting really high accuracy on unseen data takes a lot of tweaking of the model: adding neurons, removing layers, and so on. I would like you to add padding and see how it affects the results. In the next lesson, we will classify multiple clothing items using convolutions.

10. Fashion Mnist: In this lesson, we will build a model that can perform multi-class classification. We are going to classify different clothing items. We will be using the Fashion MNIST dataset for our task. MNIST is a database, and the name stands for Modified National Institute of Standards and Technology. You may have heard of it because of the popular handwritten digits dataset. TensorFlow has a built-in function to import this dataset, which makes it easy for us to use. The dataset is also pre-labeled and divided into train and test, so we do not have to apply the image data generator. There are a total of 70,000 images of size 28 by 28, and they have only one channel. This dataset has ten different clothing items, so we need a different activation function in the output layer. Previously we were using the sigmoid function, which is great for binary classification; its value ranges from 0 to 1. But now we need to calculate a value for each of the different classes, and sigmoid just won't cut it. So we use softmax. It is an activation function that gives probabilities for all the classes. The value of one unit will be greater than that of all the other units, and that unit will be our predicted class. 
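Here is a minimal sketch of softmax in plain Python, to show what "gives probabilities for all the classes" means in practice. The class names and the raw scores are made up for illustration. In a real model the scores come from the last layer of the network.

```python
import math


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    highest = max(scores)                                # for numerical stability
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


class_names = ["shoe", "bag", "cap"]      # illustrative classes
raw_scores = [2.0, 1.0, 0.1]              # made-up outputs of the last layer

probabilities = softmax(raw_scores)
predicted = class_names[probabilities.index(max(probabilities))]
print([round(p, 3) for p in probabilities], "->", predicted)
```

The unit with the largest raw score always ends up with the largest probability, so picking the highest probability gives the predicted class.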
So for example, let's take a model which has already learned how to classify between a shoe, a bag, and a cap. We feed this network an image of a shoe. After going through the model, the one unit which represents that the image is a shoe will have the highest probability among the three units. Now we will use this to build a new model to classify different fashion items.

11. Coding;Fashion MNIST: Multi-class classification does not differ much from binary classification: a few minor changes here and there, and we will have a new classifier in no time. As I mentioned in the previous lesson, we have a built-in function for retrieving the dataset. This dataset has a total of ten different classes of clothing apparel, 0 being a t-shirt, 1 being pants, and so on. This is a plot of a sneaker from the dataset. This dataset is also labeled and divided, so the only preprocessing we need to do is add the channel dimension, that is, the one channel it has, and normalize the data from 0 to 255 into 0 to 1. After doing that, we build our model. As these images are only 28 by 28, we do not need to add many convolutional or max pooling layers. The model syntax remains the same. The only things changing are in the last layer: we change the number of units from one to ten, because we have ten classes, and we change the activation to softmax. The summary tells us the image is 13 by 13 in size, and the 64 is the number of filters we used in the conv layer. We also change the loss function to sparse categorical cross-entropy to compute the loss over all the classes. We used binary cross-entropy in the previous lesson, which won't work here because we are dealing with multiple classes. 
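Put together, the steps in this lesson might look roughly like the sketch below. It is not the exact notebook from the class: the function names and the 128-unit hidden layer are assumptions. The parts that follow the lesson are the built-in Fashion MNIST loader, the added channel dimension, the scaling to 0 to 1, one Conv2D layer with 64 filters followed by max pooling, the ten-unit softmax output, and the sparse categorical cross-entropy loss.

```python
import tensorflow as tf


def load_fashion_mnist():
    """Download Fashion MNIST, add the channel axis, and scale to 0..1."""
    (train_x, train_y), (test_x, test_y) = tf.keras.datasets.fashion_mnist.load_data()
    train_x = train_x.reshape(-1, 28, 28, 1) / 255.0
    test_x = test_x.reshape(-1, 28, 28, 1) / 255.0
    return (train_x, train_y), (test_x, test_y)


def build_fashion_model():
    """Small CNN with a 10-way softmax output."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),  # 28x28 -> 26x26
        tf.keras.layers.MaxPooling2D(2, 2),                     # 26x26 -> 13x13
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),          # size assumed
        tf.keras.layers.Dense(10, activation="softmax"),        # one unit per class
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_fashion_model()
model.summary()
```

To train it, call `load_fashion_mnist()` and pass the training arrays to `model.fit`, for example with five epochs (an assumed value). Calling `model.evaluate` on the test arrays then reports the loss and accuracy on images the model has not seen.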
If you want to know more about the loss functions and optimizers, I recommend reading the TensorFlow documentation. This model can be improved slightly, and I leave it up to you to decide how. A hint: it involves adding a layer to the model. We got a really good accuracy of about 0.9 on the training set. Now let's check its performance on the test set as well. This line of code evaluates the test images with the learned model. It can be visualized as a forward pass, where the model makes a guess and does not learn. We also got a great accuracy of 0.9. In deep learning, it is often very difficult to improve test set accuracy, and it can be very time consuming. So for our model, an accuracy of 0.9 is quite good. If you want to improve upon it, go ahead; I have already given you a hint. In the next block, we will test some photos. I downloaded a few images off the internet to test. The first image is just an image from the dataset. It is a pullover, and the model should classify it correctly. The second image is a model wearing a pullover, which is not part of any dataset. It identifies that one as well. But as soon as I give it the third image, which is a little off-center and has a lot of noise in general, it fails. So, for example, if you are building an app for classifying these kinds of photos, you can expect photos like these to be given to the model. So when you train your model, these are the kind of photos you need to use. Centered and clear images, like those in the MNIST dataset, are not what you will get in the real world. Users will take off-center, blurry pictures, and the model will have a hard time identifying them. So always use a dataset similar to your application. I suggest you implement this code yourself. In the next lesson, we will discuss the class project.

12. Class project: For this class project, I would like you to build a multi-class classifier using the MNIST handwritten digits dataset. 
It can be imported using the mnist function like this. You can also use any dataset you want. You can find plenty of amazing datasets on Kaggle. Kaggle has a database of thousands of datasets, ranging from CSV files to images. If you have watched Silicon Valley on HBO, you can build Jian-Yang's SeeFood application, which basically classifies between a hot dog and not a hot dog. You can go the classic way of identifying hot dog or not hot dog using binary classification, or you can build a useful app that identifies different foods. Both of the datasets are available on Kaggle, and I have linked to them in the description. Whichever option you choose, download the dataset from Kaggle, import it into your Google Colab as shown in the previous lessons, create your model, and share the code of your model with the rest of the class in the form of images or a link to the Google Colab of your project.

13. That's it!: Congratulations on finishing this class. I hope you had fun learning and doing all the programs. A quick summary: in this class, we learned the basic components of a neural network, building a basic neural network using TensorFlow, a cat versus dog classifier, CNNs and max pooling, a cat versus dog classifier using a CNN with max pooling, and multi-class classification. And finally, we used the MNIST dataset. I hope this helps you get the hang of deep learning, and that you can use the concepts you learned in your applications. I hope you enjoyed taking my class, and I request you to share it with interested friends. Finally, I'd like to thank you for taking my class, and I hope to see you in my next classes.