Train and deploy deep learning models | Nour Islam Mokhtari | Skillshare


Train and deploy deep learning models

Nour Islam Mokhtari, Deep Learning Engineer



Lessons in This Class

62 Lessons (8h 16m)
    • 1. Promo video

      1:19
    • 2. Introduction and course content

      3:33
    • 3. Operating System, Python IDE and Docker

      4:21
    • 4. Setting up Google Cloud Platform

      2:36
    • 5. Setting up a VS code folder and creating a virtual environment using Virtualenv

      6:09
    • 6. Python packages we will be using and how to install them

      2:04
    • 7. Testing your installation and setup

      1:51
    • 8. How do machine learning or deep learning projects usually work?

      2:56
    • 9. What is our end goal?

      4:42
    • 10. Downloading the dataset

      7:29
    • 11. Data exploration: splitting data into category folders

      14:07
    • 12. Data exploration: visualizing random samples from the dataset

      14:17
    • 13. Data exploration: getting insights about widths and heights of images

      8:06
    • 14. What to consider when building a neural network for our task?

      5:26
    • 15. Building the neural network architecture using Keras and Tensorflow

      13:04
    • 16. Creating data pipelines using generators

      15:16
    • 17. Putting everything together inside a train function

      15:59
    • 18. Improving and cleaning the code for robustness and automation

      10:41
    • 19. Launching training locally on a subset of our data

      4:30
    • 20. Adding evaluation at the end of training

      14:26
    • 21. Summary

      2:11
    • 22. Our different setups for reading data during the training

      4:21
    • 23. What are buckets and how to create them

      8:37
    • 24. Uploading our data to the bucket

      3:55
    • 25. Creating a credentials json file to allow access to our bucket

      5:50
    • 26. Problem with our credentials file and how to fix it

      16:32
    • 27. Adding code for downloading data from the bucket

      17:34
    • 28. Verifying that our training pipeline works after the new modifications

      5:53
    • 29. What is Docker and how to use it for our project? (optional)

      3:33
    • 30. Small modifications to our files

      2:21
    • 31. Building a Docker image using Dockerfiles

      9:18
    • 32. Running a Docker container using our Docker image

      9:51
    • 33. Adding arguments to our training application using Argparse

      16:12
    • 34. Necessary steps to use Docker with GPUs

      5:28
    • 35. Building our docker image with GPU support

      9:07
    • 36. Summary

      2:01
    • 37. What is cloud computing and what is AI Platform? (optional)

      5:48
    • 38. What other APIs do we need?

      9:15
    • 39. Pushing our image to Google Container Registry

      9:47
    • 40. Setting up things for our training job

      7:32
    • 41. Launching a training job on AI Platform and checking the logs

      5:54
    • 42. What is hyperparameter tuning?

      5:42
    • 43. Configuring hyperparameter tuning

      10:48
    • 44. Building a new docker image with the new setup

      1:28
    • 45. Launching a training job with the new setup

      8:27
    • 46. Saving our trained model (but there is a problem)

      8:52
    • 47. Adding a function to upload trained models to a Google bucket

      4:30
    • 48. Zipping and uploading trained models to Google Storage

      13:26
    • 49. Running the final training job

      11:00
    • 50. Summary

      5:34
    • 51. What is Cloud Run and what is Flask? (optional)

      2:12
    • 52. Creating the skeleton of our Flask web app

      11:17
    • 53. Adding a helper function to accept only certain images

      5:59
    • 54. Creating a view function to show our main web page

      14:00
    • 55. Quick test to verify that everything is working properly

      4:46
    • 56. Finishing the main web page

      6:12
    • 57. Adding a web page for viewing the uploaded image

      11:04
    • 58. Finishing the web app and testing our code locally

      22:01
    • 59. Using gunicorn to serve the web app instead of Flask server

      4:51
    • 60. Dockerizing our code

      12:59
    • 61. Deploying our web app to Cloud Run

      11:14
    • 62. Summary

      5:56

43 Students

-- Projects

About This Class

This course takes you through the steps a machine learning engineer would follow to train and deploy a deep learning model. We will start by defining an end goal, then download a dataset that helps us achieve it. We will build a convolutional neural network using TensorFlow with Keras and train it on Google AI Platform. After saving the best trained model, we will deploy it as a web app using Flask and Google Cloud Run. Throughout the course, we will use Docker to containerize our code.

Meet Your Teacher

Nour Islam Mokhtari

Deep Learning Engineer


Hello!

My name is Nour-Islam Mokhtari and I am a machine learning engineer with a focus on computer vision applications. I have 3 years of experience developing and maintaining deep learning pipelines, and I have worked on several artificial intelligence projects, mostly focused on applying deep learning research to real-world industry problems. My goal on Skillshare is to help my students acquire real-world, industry-focused experience. I aim to build courses that make your learning experience smooth and focused on the practical side of things!




Transcripts

1. Promo video: Hello and welcome to my course on how to train and deploy deep learning models. My name is Nour Islam Mokhtari, and I will be your instructor for this course. I am a computer vision and machine learning engineer with three years of experience working on several deep learning projects, taking them from prototype to production. In this course we will work through the full lifecycle of a deep learning project. We will start from a real-world problem, frame it as a machine learning task, look for a dataset that can help us address that task, and build a convolutional neural network that uses that dataset to solve our problem. We will then leverage the power of Google Cloud Platform to train and deploy our deep learning model. Throughout the course we will use several different technologies, including Keras, TensorFlow, Google AI Platform, Google Cloud Run, Docker, Flask and others. I hope you are excited to join this course, and I will see you in class. 2. Introduction and course content: Hello and welcome to my course on how to train and deploy deep learning models. This course is split into multiple sections, and in each section we introduce new technologies and techniques that help us reach our goal of training and deploying deep learning models. In the first section I will show you the software setup we will use throughout the course: how to use Visual Studio Code, how to install Docker, how to set up a Google Cloud account where you can get $300 of free cloud credits, and how to install all the necessary Python packages. In section two we will build our deep learning model. We will start by discussing how machine learning projects usually work, then download a dataset and do some data exploration on it. After that we will build a convolutional neural network using Keras from TensorFlow and create data pipelines using generators. In the third section we will introduce Google Cloud Storage. We will define what buckets are and how to create them, create a credentials JSON file that lets us access our bucket from anywhere, and then see how to download data from the Google bucket using Python. In the next section we will dockerize our code. We will first define what Docker is and how to use it for our project, and then see how to write Dockerfiles and build Docker images with GPU support for training. In the section after that we will train our deep learning model on AI Platform. We will start by defining what cloud computing is and what AI Platform is, then push the Docker image we built in the previous section to Google Container Registry, launch a training job on AI Platform, and perform automatic hyperparameter tuning. Finally, we will save and upload our trained model to Google Storage using Python. In the sixth section we will deploy our trained model as a web app using Flask and Cloud Run. We will start by defining what Cloud Run and Flask are, and then build a simple web app using Flask and HTML.
Then we will dockerize our web app using Docker, and finally we will deploy the web app using Cloud Run. I hope to see you in class. Bye! 3. Operating System, Python IDE and Docker: Hello again and welcome to this section, where I will show you the operating system and the software setup we will use throughout the course. For the operating system I am currently using Ubuntu 20.04, the latest version as of July 2020. You can use other operating systems; some of the installation procedures will simply be a little different for you. Beyond that, we will use a few tools. For this course I will use Visual Studio Code, an IDE that supports multiple languages; here we will mostly write Python, plus a little HTML, and we will use this IDE to create and manage our project. We will also use Docker, software that lets us containerize our application. If you don't yet know what containers and containerizing mean, don't worry: we will explain all of this later. For now we will just install it because we will need it later. To install Docker, go to docs.docker.com/engine/install and, in my case, add /ubuntu, since I am on Ubuntu. For other operating systems the procedure is a little different, but you can easily find the instructions with a quick search. On Ubuntu there are several installation methods, but we will use the recommended one, which is to install from Docker's repository using a terminal. If you don't know how to open a terminal on Ubuntu, click Activities and type "terminal". Then go through the documented commands one by one; when you finish, you will have a stable version of Docker installed on your machine. Follow the steps until you reach "Install Docker Engine". There, you only need the first step, which installs the latest stable version; the later steps are for installing specific versions of Docker, so you can stop after the first one. To verify that the installation went smoothly, run the verification command from the documentation (you may be asked for your password). For me it finishes quickly because I have already run it before; for you it may take a few seconds, since Docker first downloads the image from the Docker repository and then runs it. If you see the line "Hello from Docker!", your installation is correct. 4. Setting up Google Cloud Platform: Last but not least, we will be using Google Cloud Platform.
To use it, you don't need to install anything on your machine. Just search for "Google Cloud Platform"; it should be the first result. You will need a Gmail account to sign in. I already have one that I created for this course, so when I click "Go to console" I see my latest project, some APIs and other resources. For you, it will ask you to set up a few things and add a billing account, which means adding a credit card. Don't worry: if it's your first time using Google Cloud Platform, you get around $300 of free cloud credit, which is more than enough to go through this course and everything we will do in it. Even though you add a payment method, you will not be charged, not even when the $300 is used up, unless you explicitly give your approval. The credit is valid for one year, so for one year you can use all sorts of services on Google Cloud Platform. If you use up the $300 before the end of the year, you will not be charged, but you will no longer be able to use the services unless you allow the platform to charge your credit card. With that, you should be all set up, so let's start doing some coding. 5. Setting up a VS code folder and creating a virtual environment using Virtualenv: To install all the necessary Python packages, we first need to set up a folder in Visual Studio Code; everything we need will go into this folder. On my machine I will create a new folder called "code" inside my courses directory and open it in Visual Studio Code. As you can see, it is an empty folder. Let's open a terminal via View > Terminal; we are now inside the folder we just created. Let's check a few things first. If I type python, I get Python 2.7. Press Ctrl+D to exit the Python console, then type python3: now I get version 3.8.2. This means we have two different Python distributions installed on the machine, Python 2 and Python 3. We could proceed with one of these directly, but that is not recommended when you are building a product or working on a project that has its own dependencies. The recommended way is to use what are called virtual environments.
So virtual environments are, you can think of them as this, as these isolated blocks where you can have a Python distribution that goes that's installed within these virtual environments, but it's isolated from your operating system. In order to do this, we need a certain package to install and install it now by running the command PIP3. So in order to make sure we are using Python three, so PIP3 install virtual. And so as you can see, I already have it installed. So for you it will be installed. And now that we have this virtual and package, we can use this package to create a virtual environment. And how do we do this? We just type this command virtual end. And then we give our ritual environment a name. I will be using. Vn has a name. This is the convention that is used in PyCharm, so I will be using it here as well. So my virtual environment will be called BM. Let me now created. And now as you can see on the left, we have this new folder that was created and it contains lots of files. What these files are ensuring is that we have now a Python distribution that's isolated from our operating system, but it's still not activated. In order to activate this distribution, what we need to do, let's first clear our terminal where we need to do is to write the following command, source. And then we access the folder here, Ben, and then activate. As you can see now, we have the name of the virtual environment at the beginning of the line, which we didn't have before. What this means is that we are now inside a virtual environment that has its own Python distribution installed within it. And the great thing is that it's not, it's not linked to our operating system directly. So when we install Python packages, they will not affect our operating system. Everything will be isolated, which is really good when you, when you're working on, on these kind of projects for machine learning or for other types of projects where you want your visual environment to be separated. So now let's, let's try typing the command python. As you can see without writing Python three, we get Python 3.8 on to version, which we didn't explicitly say. We didn't explicitly asked ritual AMP to install Python three, but by default, it installs it mostly now, there's no most of the day virtual environment kind of packages don't use Python to. By default, you actually need to tell it if you want to use Python to. But for us we will be using Python, Python three. So let's go with this, with this distribution, which is a stable version. Ok. Let's, let us now exit this Python console by clicking control D. Now we're still inside our ritual environment. And let's start installing all the necessary Python packages inside this environment. 6. Python packages we will be using and how to install them: So let's start installing the Python packages that we will need to weld discourse. So in order to make it easy for you, what I have done is that I extracted my packages or the names of the packages that are used inside the virtual environment. And they're all in a file called requirements.txt. So one thing you can do is to just with this file inside your project here or inside your folder. And I will be attaching this file to this lecture so that you can, you can get it yourself and you can put it inside your project and install all the necessary packages. So here as you can see, these are all the packages that I have used. Two, throughout the hour we'll be using throughout the course. 
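For readers following along, here is a small sanity-check script in the spirit of lectures 5 to 7 above. It is my own hedged sketch, not code from the course: the helper name check_environment is invented, and the idea is simply to confirm that the virtual environment is active and that TensorFlow and Keras import correctly before installing and using the rest of the packages.

```python
# Quick sanity check (my own sketch, not a course file): confirm the virtual
# environment is active and that the key packages import correctly.
import sys


def check_environment() -> None:
    # Inside a venv/virtualenv, sys.prefix differs from the base interpreter prefix.
    in_venv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    print(f"Running inside a virtual environment: {in_venv}")
    print(f"Python executable: {sys.executable}")

    try:
        import tensorflow as tf          # the course pins TensorFlow 2.2
        from tensorflow import keras     # Keras API used throughout the course
        print(f"TensorFlow version: {tf.__version__}")
        print(f"Keras version: {getattr(keras, '__version__', 'n/a')}")
    except ImportError as err:
        print(f"A required package is missing: {err}")


if __name__ == "__main__":
    check_environment()
```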
So in order to install all these packages, what you need to do is just first of all, make sure that you are inside the virtual environment. As you remember, to activate the environment, you just need to write source VM been activates and once you click, you should be inside the the environment. So now I am going to install these packages by typing the following command. So PIP, install and Jante decal we'll add here is minus R, which means that store from a requirements are that txt file. And now we put the name of the file here and we click it should start installing all the necessary packages. So let's wait for them a little bit while installing. This could take some time, so you should be little patient. Especially some of the, some of the packages are have large sizes such as TensorFlow. So I'll be back one day when the installation has finished. 7. Testing your installation and setup: Now that we have all our Python packages installed, them, test them and see if they are installed correctly. Well, we're not going to be testing all of them, but some of them just to verify whether the installation from the requirements file going smoothly or not. So let's enter a Python console in here. And let's import for example, TensorFlow, which is one of the modules that we will be using a lot during this course. Let's import, for example, from ten TensorFlow. Let's import Aras, which is the main API we'll be using. As you can see, it was it was imported correctly. Let's, for example, brains de version of TensorFlow. As you can see, we have installed divergent 2.2, which is the exact version that we have in our requirements.txt file. So I am installing this version specifically because as of July 2020, it's the last and stable version from TensorFlow. You could be looking at other new versions if you are watching this course in the future. So mainly, I don't think that there will be much differences between your distribution and mine, as long as it's not Version three, for example. So, so now that we have everything installed correctly, let's start creating a new file in our project, and let's start coding and creating our first neural network. 8. How do machine learning or deep learning projects usually work?: So before we dive into coding and creating our neural network, let's start by thinking about how do ML projects usually work? So usually in ML projects, a typical pipeline would look something like this. To you first define a task that you want to accomplish. And then you get all collect the data set that you need for that specific task. So when I say Get, it basically means you go gather or you go search for that data set. Maybe someone else has already collected it and has put it somewhere. And you can just use that specific data set. If you can't get it, then you need to collect it yourself. So if it's a data set that needs to contain images than maybe you would need to go around and take pictures with your phone or with your camera. And then create a certain data set that you can use for that specific task. And then what you would need to create your machine learning model. And here of course, you might have many different options depending on the type of your data set, how large your data set is, so on and so forth. After that, you would train your machine learning model. So in this specific step here, maybe you will choose to train it on the cloud or on-premises. Maybe you have a really good machine that has enough power to train your deep learning model. 
And after the training is done and you are happy with the results of your train machine trained deep learning model. You then deploy it. And you have also here many options where you can deploy it on the edge. For example, if it's a mobile application may be, you will deploy the model with your mobile application. So everything will be embedded in the phone. Or maybe you choose to deploy on the cloud, which is the thing that we will be doing in discourse. And after you deploy your model, you can test or you should test your deployed model by making inference online and looking and analyzing whether your model is working properly or not. If at some point your model starts decreasing inaccuracy or not working properly, what you would do is that you go to the step where you get or collect new data for your machine learning model. And the cycle starts again. 9. What is our end goal?: So what is our end goal anyway? What are we trying to accomplish here? So basically, any ML project or AI project, or the project, depending on what you are trying to do. The task is something that you always start with. The tasks really comes from the business aspect of things. So usually you start from a business problem and that business problem is translated into an engineering problem. And within that engineering problem, you start proposing solutions that are maybe based on machine learning. So maybe you are working for a certain company. And one day your manager comes to you and tells you that your customers or the customers of your company would like to know more about the food that they are ordering in restaurants. So maybe they want to know more about the calories that daft food includes. Or maybe they would like to know if the food is healthy or not. So there are many options like this, and this is the business aspect and this is the business problem. So now it's your job to translate that business tasks into an engineering tasks and solve it with the necessary tools. So after the task is defined, your manager might come to you and tells you that maybe another team in the company has already gathered and collected a certain data set. This data set is basically a group of images of different types of food taken using a mobile phones. And there are, he tells you that there are 11 different types of food. And he would like you to use this data set in order to address the task at hand. So for you, you start with this specific data set and you start thinking about the task at hand. And you start thinking about how can you, can you use this specific datasets to address that business case or business tasks. So now that you have the task at hand and you have the datasets, you start thinking about what should you focus on? What should you start building using that data set? And knowing that specific business problem. Maybe the first thing that you'll think about would be to focus on recognizing the main ingredient in the dish. Start thinking that maybe by, by recognizing the main ingredient in the dish, you can extrapolate on that and maybe get an idea about whether that is, for example, healthy or not. So we know that the main ingredient, ingredient in a dish will have a major impact on whether that dish is healthy or not. So for you, you start thinking that maybe the minimum viable products or the minimum thing that you can build is to create a web app that lets you upload images and get the main ingredient in the dish. You are. Do you think about this specific solution, this specific first solution? 
Let's call it like that. Because to you, it seems like if you are going to address that specific business issue, then this is the minimum thing you should be able to build. And after you build this, maybe you can go look into how to improve that and how to maybe get more insights from the images. But the first task that you want to focus on is just to recognize the main ingredient in the dish. So you speak to your manager or your manager is happy with this first idea and this minimum viable product that you want to build. And he gives you the sign to go ahead and now you are about to continue along the ML Pipeline that I showed you before in choosing a machine learning model and train in it and then deploying it. 10. Downloading the dataset: And now in order for you to continue along that ML pipeline, you need to get to the data set that we will be using to create our image classifier or our model that can recognize the main ingredient in a dish. So to download the data set, you need to go to this link that I will be attaching to this lecture. If it's your first time accessing this website, Kaggle website, then you will need to create an account with them using your Gmail account or other counts. I think they also have a possibility to login with LinkedIn. Once you have an account with them, you can download the data set by clicking on the download button here. When you click, it should start downloading immediately. The dataset's size is not one gigabyte. In fact, it's 2.2 gigabytes. And it might take some time to download depending on the speed of your internet. And after your data set has been downloaded, you should extract it. And once it's, you can start exploring how the data set looks. So here I noticed that they have a duplicate of the datasets. So if you go here, you have the same folders in here. So for me, I will just delete folders here. And then what I will be left with is the folder that contains the same folders that we had before. And here, as you can see for the training, we have a lot of images. Let's see how many images we have here. So it's, we have around 98 or 9,866 images for training. For evaluation. This check again here. So for evaluation we have around 3,347 images. And for validation. I think yeah, we have something very similar to the evaluation data set, which is around 3,400 images. And now that we have our images split into these three different folders. Actually when we downloaded the data set, it was like this. Usually you will not have this. You will have to split the data set yourself. But here we have the data set already split. So the data set or the data that exists in the training and validation folders we will use during training. And the one or the data that exists in the evaluation folder. Sometimes it's also called testing folder. The data that exists in this folder we will use after the end of the training so that we can evaluate or test our model to see whether it will work. Okay or not on data that had that the model has never seen during training. This is why we have this. Of course there is, there's a lot of differences in, in, in conventions. Some, some machine learning engineers will split the data into 80%, 10, 10% percent. So 80% for training, 10% for validation, 10% for evaluation. Some of them will do 70%, 15, 15% percent. And some of them will do like this, 50%, 25, 25% percent. So it really depends on how each engineer will, will, will, will choose to do this. But all of these options are possible. 
So we will now worry too much about this now because we already have the data set or the split for us, so we will use it like this. But now one of the things that I don't like in how the data set is organized. It's the fact that all the images of all different types of food are in the same folder. So here, as, as you can see, all the images that starts with 0 in their names, these belong to one type of food. So for example, for the images that start with 0, here, they are bred based foods, so broad-based dishes. If we go, for example, to do another class, a second class. The second class, all the image name starts with one. And these are dairy based products. So you can have milk, cheese, butter, and all these types of dairy products. And the same thing goes for the other classes, as you can see, for each type of food, or for each class, or for each category, the images start with the same numbers and then you have all. You have the, for example, the image will start with two. Then you have from 0 until whatever number of images that belong to that category exist. So for me, I don't like when the data set is organized like this, I like to have, for example, each I would like to have different folders where each folder will contain only the images that belong to the same category. So for example, here, all the images that start with 0 and belong to the bread based dishes or products. I want them to be in the same folder. Let's now split that data into different folders. But instead of doing this manually, let's code this. Okay, so first let's, Let's create a new file. And let's call it, for example, data handler that pi. And in this file we will add some help, helping functions that will help us to one, splits the data into those different folders, as I have mentioned before. And to help us explore the data set a little bit, maybe look at some images and, and, yeah, and maybe look at the size of the images. So let's do this in the next lecture. 11. Data exploration : splitting data into category folders: And now let's start splitting the data into different folders, as we have mentioned before. And maybe do some basic exploration such as visualizing the images, may be looking at these sizes or the mean size, the median size of the images. And yeah, let's just start by importing some of the Python packages that we will need. So we will use all of these packages. And now let's create the function, or let's start by creating a list that contains all the names, the food categories that we will be using. So thus creates a list that's called who'd classes. And as I had mentioned before, we have 11 types of food. So we have, for example, bread. We also have dairy products. We have dessert, and so on, so forth. Let me just copy the rest of the names of the classes. So these are the names of the classes. We have 11 different types of food. And of course, each index of the name of that category corresponds to the index that the names of the images start with. So as we have seen before, we have all the images that correspond to dishes or types of bread based food. We have put bread in the first index because those images starts with 0. So the 0 index in the list corresponds to the 0 where the names of those images starts with. And I have respected this kind of rule for all of these. So for example, the fourth class that starts with three will have the same index here as the egg name. So the same thing goes for all the other types of food here. 
Let's start coding now, a function that will help us to splits data into different folders. So let's call it split data into class folders, for example. And this function will give it to our data, and we will also give it a class ID. The class id basically represents either 012 until 11, okay, until ten, sorry. So here what we will do is first, we will read all the names of the images in all inside the folder path to data. So in order to do that, let's use the glob package. And let's give it the path to the data. And also we will, we only want to read the files that end with JPEG here. And now what we do is to go through each path in the images paths and we will read the base name of the path. So the base name is basically the name of the image without the full path. Okay? So for example, here in this path here, as you can see, this whole path, let's say there's a file that exists inside this code folder. For example, this data handler that pie, the base name that I want is just data had No.5. I don't want all the directory here. So this is y. This is what the base name will represent. So in order to do that, let's use the function base name and give it our path. And now, in order to verify that we are in one category. And another one, what we will do is we will verify if base name, which is of course a string. So we can, we can use the method starts with, so if the base name starts with the class ID plus a, an underscore here. So if the base name starts with this, here, we will create a path to save the new file or the file when we move it. So here we will do a join. We will use the join function, which will help us join different paths. And we will join the paths to data with the name of the class. So this is where we will use our list here and we will give it our class ID. So now this is a path where we have a folder inside our path to data. Of course this will be clear when we run the function. So don't worry too much if you don't understand it right now, when we run the function, you will see what this will do. So now what we would like to do is to verify whether this path exists or not. So if this pad does not exist, so let's use this directory here. Or this function is dear to verify whether the path exists. If it doesn't exist, then let's just create it. Okay. So we want to make sure that we don't have any errors. If the pad does not exist, then we will create it. And after we created, what we should do is we're going to use the SSH util package in order to move the file from its current Pat to new paths. And this should be, this function will go through all the images, are all the pions inside this path here. And then it will read the first letters of that fat. And based on those first letters, we will put them in a new folder that corresponds to one of these categories here. So in order to run this Python script, let's, let's create this main function or main kind of manner of programming. So basically, if you don't know what this just means that when you add this, then any code you put here will be run only. You run the whole script. So if I do python data handled up pi and I run this command, then everything that exists here will run. But if I import this folder or this file into another file, then in that case, nothing that exists here will be wrong. So that's solely does this specific line. So now, well, we'd like to do is to maybe read the bats to our different folders of training and evaluation and validation. So let me just copy these fats. 
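For reference, here is a hedged sketch of the splitting helper being narrated here. The function and variable names (split_data_into_class_folders, food_classes, the boolean "switcher") follow the narration, but the exact code, the ".jpg" extension, the placeholder paths and most of the category names beyond the ones mentioned above are my assumptions, not the instructor's files.

```python
# Hedged reconstruction of the data-splitting helper described in lecture 11.
import glob
import os
import shutil

# 11 food categories; names beyond bread/dairy/dessert/egg are assumptions.
food_classes = ["bread", "dairy_products", "dessert", "egg", "fried_food",
                "meat", "noodles_pasta", "rice", "seafood", "soup",
                "vegetables_fruits"]


def split_data_into_class_folders(path_to_data: str, class_id: int) -> None:
    """Move every image whose name starts with '<class_id>_' into a folder
    named after that class, e.g. '3_42.jpg' -> '<path_to_data>/egg/3_42.jpg'."""
    images_paths = glob.glob(os.path.join(path_to_data, "*.jpg"))  # extension is an assumption
    for path in images_paths:
        basename = os.path.basename(path)
        if basename.startswith(f"{class_id}_"):
            class_dir = os.path.join(path_to_data, food_classes[class_id])
            if not os.path.isdir(class_dir):
                os.makedirs(class_dir)
            shutil.move(path, os.path.join(class_dir, basename))


if __name__ == "__main__":
    SPLIT_DATA_SWITCH = True                      # the "switcher" pattern from the lecture
    path_to_train_data = "path/to/training"       # placeholder path
    if SPLIT_DATA_SWITCH:
        for class_id in range(len(food_classes)):
            split_data_into_class_folders(path_to_train_data, class_id)
```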
We will do the same to our validation data. And of course, the only thing that will change the name of the folder. And finally, the evaluation data. We will paste this as well, which will be the evaluation folder. And now what we would do is to run our function. And before we run the function, because I know that we will be adding other helpful, helpful functions later. I usually like to do this, which is of course just my way of doing things. You can choose not to, not to do it. What I like to do is to create some sort of switchers, which is just a fancy name for a, for a boolean. First I'm gonna give it the value if true, because I do want to split the data now. Now why do I do this? It's just that if I know that I will be running or I will be creating a lot of code here. But I don't want to run all that code every time. What I do is create these switchers and then verifying. They are on or if they're here, if they are false or true. And based on that, I learned the code. So if split data switch is true, then here I will call my function. But of course my function, one thing bad to the data and the class ID. So if I give it 0, only copy the data or the images that starts with 0 into a new folder called the bread. So if I give it to one, it will only copy the images that starts with one and create a new folder called dairy products and copy them there or move them there. Sorry. So I want to do this for all classes. So what I will be doing here as just looping through 11 classes. And how we'll be calling my function, which is split data into split into class folders. And I will give it the path to. Let's start with the training folder example. And I will give it as an index. So it will do this for all the different folders that we have. Now before we run our script, let's just look out, look at the data that we have. So as you can see here in our folder for training, we have the data mixed up. All the images of all the different classes are in this folder called training. So let's run our script and see what will, what will happen. Okay? So Python data handled it up by having an error here. Let's run the script again. Let's go back to our folder. And as you can see, now, instead of having those images, all of them in one folder, we have different folders, and each folder should contain only images of that specific categories. So you can see that all the images you start with 0. The same thing for the second class. All of them will start with one, so on and so forth. So this is a much better way, in my opinion to see the data, to understand that, Okay, this belongs to this class and the others belong to another class. And instead of having them, for example, like this, where everything is mixed up. So let's do this for the other two folders, evaluation and validation. So in order to do that, let's just copy our loop here. Fix the indentation. And here of course I don't want the train, but I want the validation path. And also here, the evaluation data. Of course I still have this switch on, so the code will run. Let's do this. Now. We should have all the images in all the folders splits into the correct categories here. The same thing for evaluation. So I think that this is a much better way to look at data and to understand the different categories. And let's start visualizing the data and maybe looking at the size of the images in the next lecture. 12. 
Data exploration : visualizing random samples from the dataset: And now let's create a function that will help us to visualize some of the images in a random way. So this is something that a lot of ML engineers do, which is, if you want, let's say you have a team that has annotated, acquired the data, annotated it, and then you have it in the manner that we have now. So we have folders. Each folder contains images from the same category. What you would do is ideally, you would do some small verification and, and just tried to see if the beta is okay or not. And this doesn't mean that you should go to all the images and look at all of them. But maybe just doing some quick random looks. Maybe sometimes just by doing this, you can spot some mistakes. You can spot something that needs to be, needs to be changed, for example. So this is what this function that we will build next, we'll do. So. So I wanted to create a function that we'll look at the images and then randomly choose samples from these images and then show them on the screen. Okay, so let's call it visualize some images. And what we want is a path to our data. And then this creates an empty list that will store our, our paths to images and also the labels. So the labels are basically these names here. But what we want is for each image path, we want to create the name or the label of that specific. Ok. So here, this will be also a list here. And for now, let's not use glob lists. Use another tool from Python which will be very helpful in this specific case, where we want to read the pads of images that exists inside folders and folders are by themselves inside another folder. So here, for these values here, we will be using the function walk from the OS package. And we will give it the path, our data. And then for each file. And this f, we will verify FDA file ends with the JPEG. So basically an image, or we will do is we will append to it a full path. So we will join. Who will join our relative field here with the name of the file. And also we want to create or we want to append the label that corresponds to that specific image path. And here the only thing we will do is just get the base name just as in the same way that we have done before. And of course, I don't want the base name of the file, but rather of the directory. So what's this specific line will do is, for example, if you give it a path like this one here, and you ask it to get the base name. It will give you this last name here or the last folder. And this is exactly what we want because we will get in our directory, we will get the full path to the folder, for example, but the label that we need is only bred. So in order to get that, we will just use the base name function and then create a figure by using the matlab lived up pyplot package. And then here we will do the plotting. So as I said, I want to look at samples of images and plot them on the screen. So I would like to see around 16 images, let's say, or exactly 16 images. If you want to see more density, then you can do that as well. But for me, I think 16 random images will give me some idea about the data. And if I run the code several times, I will get different images with their different labels every time, which in my opinion, a good way to get some visualization of data and some basic ie verification, that's Scott, it like that. So for me what I will do is we'll create our go in a range of 16 images. Here, I'm gonna create a chosen index, which will be a random index. 
And I would like to get an index between 0 and the length of the pattern minus one. Because this function here it will also take, it can take the value that we give it in the second parameter. It's actually included. So we don't want our list to try to access an element of this index. And that's why we will do minus1 here to make sure that we never go out of range of the list. So here, let's choose an image using this random index that we have just generated. This label using this same chosen. And what I would like to do is to create a subplot. Oh, subplots from the, from all the images. So each subplot will basically mean one image, but I want to show 16 images. So we will have 16 subplots. And in order to do this, we can do, for example, axes are x equals big figure that we have created. And we add a subplots. So the subplot, we'll get a number of rows and columns. So you get four rows for column because we want 616 images. And then we will give it the index of the image. Of course, the subplot does not take 0 has possible number for the image, so we will start from one. And then what we would do is add a title or setText, sorry. We will set a text Here. We will give it the chosen labeled as a title for that image. And the reason I'm doing this because I want each image to have the name of the label or the name of the category of that image on top of it as a title. And this way, for example, I can verify if maybe some images are not in the correct place, may be some image of type bread is category, so this manner will help me verify that. And finally, that show the figure or the image. So in order to show the image, we actually need to read it first. So that's why I will be using the pillow, that open function here. And I will give it my chosen image. And this should be o for putting the images in this specific figure. But here we want to tell the matplotlib library to actually show the images. That's why we need to add the show method at the end here. Okay? And also, as I said before, I use switchers and now you will understand how we use those switchers in a specific way so that we. Activate and deactivate and some sort of way, some parts of the code. So here what I would like to do is to add a switcher. For example, let's call it visualize data switch, or give it a value of true. And now for this switch here, I'm going to give it the value of false, which means that this part of the code will not foreign, of course. And now visualize data switch equals true. I will verify here. If the switch is on. If it's on, what I will be doing is calling that function that I have just created. And then I will give it the path to the train data for example. So think everything is okay. Let's, let's run this code here and see what we get. So as you can see, we have 16 images. Each image will have the name of the category on top of it as a title. But as you can see here, we have some of the some of the images. Titles do not look well because all the images are close to each other. So let us just change that in our function. So in order to do that, one, or the function that we should use is called tides lamp. Here we give it some spacing or some padding between images. We have the images of different types of food. And on top of each image we will have the name of the category here. So just by doing this, by running the code several times, we will get different images and you can get an idea about how how the data is categorized, whether it's correctly categorized or not. 
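Below is a hedged reconstruction of the visualization helper just described: walk the class folders, pick 16 random images, and plot them in a 4x4 grid with the folder name as the title. The overall structure follows the narration, but the exact code and the ".jpg" extension are assumptions, not the instructor's file.

```python
# Hedged sketch of the random-sample visualization helper from lecture 12.
import os
import random

import matplotlib.pyplot as plt
from PIL import Image


def visualize_some_images(path_to_data: str) -> None:
    images_paths, labels = [], []
    for root, _, files in os.walk(path_to_data):
        for f in files:
            if f.endswith(".jpg"):                          # extension is an assumption
                images_paths.append(os.path.join(root, f))
                labels.append(os.path.basename(root))       # folder name = category

    fig = plt.figure()
    for i in range(16):                                     # 16 random samples
        chosen_index = random.randint(0, len(images_paths) - 1)
        chosen_image = images_paths[chosen_index]
        chosen_label = labels[chosen_index]
        ax = fig.add_subplot(4, 4, i + 1)                   # 4x4 grid
        ax.set_title(chosen_label)
        ax.axis("off")
        ax.imshow(Image.open(chosen_image))
    plt.tight_layout(pad=1.0)                               # spacing so titles don't overlap
    plt.show()

# Example usage (placeholder path):
# visualize_some_images("path/to/training")
```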
So here, I believe everything looks fine. So this is a desert, although a pizza, but it's a broad-based. So it's correct. Ooh, ooh. This is eg. Lets us run the code again and maybe other types or other images. Seems, everything looks fine. The data is correctly categorized products. Maybe some butter also. So now, now we have verified this. And let's, let's do some other data verification or exploration thing which I usually like to do, which is to verify the size or width and height of the images and get some, some, some value, some insights. So I usually like to see the mean value of all the widths and the median value also of all the widths and heights. And this will help us later when we design our neural network. So let's do this in the next lecture. 13. Data exploration : getting insights about widths and heights of images: Okay, so now let's create our last function that will help us get an idea about the width and the height of the images. Here. Call it get images sizes for example. And also going to give it the PET data. Here. We will do basically the same thing when reading the images. So we will create an empty list here. We don't need the labels now. I don't think we need a counter because we can just look at the size of our list or the length of our lists. So this creates another empty list here. Now we will do the same thing. We will use the walk function to go through all the files and then we can read these images or open them. And then we can get the size of those images. File and a jpeg. So if the file is an image, what I will do here is I will open the image or read the image. So here I1 gets the full pad, that image before I read it. And after I read it, I want to append its size or appended with the widths and heights list. Okay? So in order to get the width, we just need to call the size method only gets the first elements. And we'll do the same here by getting the second. Let's close the image goes. We won't need it after this. Here. Now that we have two lists that contain the widths and heights, we can do some basic computations to maybe yet the mean and the median of the height. So let's start with min-width. Basically the only thing that we need to do is to get the sum of all the widths. And we'll divide it by the length of the list. And we will do the same thing for the height here, divided by the height. So this is to get the mean of the width and the height. But I also like to get the median because the median also reflects other things that the mean does not reflect. Maybe, maybe you have so much difference between the size of the sizes of your images. So with the mean, you may not get an idea about, about that. But if you use the median, then you should get some, some insight about whether there is, there is so much difference between the image size the images sizes. So here I will use the NumPy package to compute the median of the list here. And also I will get the median height. I will use the same function and give it the list of heights. And I want to return all of these values. Mean height, meeting with median. Hi. So this is my last function, which will give me some ideas about the width and height of my images. So in order to do that, let's add a switch. Let's first turn off the switch for visualizing data. We don't want to visualize it now. I just want to see the different sizes and call it, for example, prints inside, which give it a value of true. These we don't hear how verifier heavy my switch is on. 
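As a companion to this walkthrough, here is a hedged sketch of the width/height helper being described (get_images_sizes, returning the mean and median of the widths and heights). It follows the narration, but the exact code, the ".jpg" extension and the placeholder path are my assumptions rather than the course file.

```python
# Hedged sketch of the image-size helper from lecture 13.
import os

import numpy as np
from PIL import Image


def get_images_sizes(path_to_data: str):
    widths, heights = [], []
    for root, _, files in os.walk(path_to_data):
        for f in files:
            if f.endswith(".jpg"):                     # extension is an assumption
                image = Image.open(os.path.join(root, f))
                widths.append(image.size[0])           # PIL size is (width, height)
                heights.append(image.size[1])
                image.close()

    mean_width = sum(widths) / len(widths)
    mean_height = sum(heights) / len(heights)
    median_width = np.median(widths)
    median_height = np.median(heights)
    return mean_width, mean_height, median_width, median_height

# Example usage with the f-string printing described in the lecture (placeholder path):
# mw, mh, med_w, med_h = get_images_sizes("path/to/training")
# print(f"mean width: {mw}, mean height: {mh}, median width: {med_w}, median height: {med_h}")
```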
If it's on our call the images sizes, I will give it the path to, for example, train data. And of course I want to get all of these values and let me just copy them here. I want to print them. So in Python, I think from Python 3.6, you can use these F Strings. I don't know if I mentioned them before or not, but the F Strings basically allow us to say you want to print the mean width, you can do just so you just add the width here. So I think that this is a much cleaner way to look at the data or to bring data. So mean. So just clear the console here. Now, let's run our code and see what we get. It might take few seconds to go through all the images. But at the end we can see that we have a mean with around 525, the mean height of four, 89, median width and median height of 512512. So we see that basically by looking at this, what I can infer is that most images are in the range of 512. So 512. I will use this value, or I will account for this value when I am creating my neural network, especially the input. So now that I know that most images are also having the same height and width, this will allow me to choose the input from my neural network in such a way that I don't distort the image too much. So this is why these insights can be very, very helpful. Okay, so let's go to the next steps. 14. What to consider when building a neural network for our task?: So what should we consider when we are trying to choose one of these machine learning algorithms for our test. So there are three things. The first thing is the size of your data set. How big the data that, how big the data set that you have. The, this is very important for the choice of the algorithm, also the type of data. So as images, you have maybe some rows and columns that represents some, maybe some time series, a data set. And also the size of data points in your data set. By that I'm talking about the number of features that you have in images. This would be the number of pixels in your image. And actually not just the number of pixels. If this RGB images, then you can think of it as the number of the width multiplied by the height multiplied by three because you have three channels. So all of these are important things to consider when you are choosing a machine learning algorithm. For us, of course, who will choose to use a convolutional neural network. And the reason for this is that while considering these three points, we find that convolutional neural networks, or CNN's for short, are the best option. And they have been the best option for the last eight years. We are in 2020 and in 2012, there was a neural network called AlexNet that's got a great score in one of the computer vision competition by using convolutional neural networks for classification. So the task of classification is based. Now almost completely soul. Of course, researchers might, might differ about this point because of some specific cases that I will not mention here. But there are cases that we have seen in research. And research. Researchers think that classification is still not completely solved. There is, there is still a lot of things to be done there. But for practical reasons, convolutional neural networks have solved the classification task. And they have shown that they are the best algorithm to solve this kind of tasks. And when we consider the size of our data set. And we are considering to use convolutional neural networks. Something or, or a technique comes in mind immediately, which is transfer learning. 
So transfer learning is basically taking a neural network that was trained to do some stuff or trained for the same task on a different datasets. And then we use it for our tasks with our data set. So this is an image that I found on the website and see that AI. And I think it's summarizes well what transfer learning does in the case of classification using CNN's as, So we will be using something almost exactly like this. So for transfer learning, what we basically do is that we take parts of our neural network and we keep the parameters as they are in that part. And then we have different options. But for us, the option that we will use is to add a deep, deep learning neural network, or just one layer or few layers, and then adding a different output. For example, let's say our neural natural that we started with was trained on maybe images of dogs, cats, boats, and so on, so forth. Something like one here. So we will be moving all of this and adding another fully-connected layer that we will train. And also we will be adding a different outputs. For us. The output will have 11 categories, which are the types of food that we have. So we will basically do what is shown here in this image. We will take a neural network that was trained on some other datasets for classification. And we will keep a big part of the network without any training. We will leave the parameters as they are. Then we will add a fully connected layer and a div and an output layer with our categories. So I just want to give you a quick overview of what we will do. Because one, I start coding, maybe things will be confusing. So I thought an image will show you a lot more better of what we are trying to achieve. So now let's go to the coding part. 15. Building the neural network architecture using Keras and Tensorflow: And now let's start coding our convolutional neural network. We will create the architecture just by following the steps that I have shown you in the slide where I, where I spoke about the transfer learning technique that is used in many, many machine learning approaches. So in order to do this, let's start by creating a new file. I'll be calling it trainer that by. So this file will include all the code that we will use to train the convolutional neural network. And of course we will be doing this by using the TensorFlow library and specifically the Cara's API, because it will be so much simpler to do this using that API. So let's start by importing some of the main packages or the main files that we will use in our building, the neural network. So I'll be importing from TensorFlow. Not Cara's that be processing here or sorry. I'll be importing from this package of be importing the image data generator file or package. So I'll be using maybe the stochastic, stochastic gradient descent optimizer may be add him as well. Maybe we'll test both of them and see. And let's start here by building the convolutional neural network architecture. So let's do this inside a function. So I'll start by calling the function build model. And for my function, I'll be using the number of classes as a parameter so that we can automatically build the neural network based on our number of classes. And here, let's start by creating our base model. So based model is the base architecture that we will use for transfer learning. Because as I have mentioned before, transfer learning is a technique where we use a network that was already trained on some data set. 
We then take that network, remove a layer or two, or however much we want, add our own layers, and train only those new layers on our own dataset. For the base model I will use the Inception V3 architecture, and this is where we specify the weights we want for the base model. The weights are the parameters of the neural network, and here I load the weights learned on the ImageNet dataset. This means the InceptionV3 base we start from was trained on ImageNet; if you don't know this dataset, you can look it up, but it is a very, very large collection of images that this network was trained on. We also set the include_top parameter, which specifies whether we want the classes, that is the output layers, of the original network. In our case we don't: we want to remove the top and replace it with our own layers, so we set it to False. For the input tensor, the input size of the network, I pass an Input layer and give it a shape. You are free to choose the input shape; I will use what I take to be the original size of the network, 224 x 224 x 3, since these are RGB images. This is our base model, and we will build the rest of the network on top of it. First we grab the output of the base model by calling base_model.output. Now that we have that output, we start adding the new custom layers, the ones the base network does not have. Each new layer is called as a function on the previous one. For example, I add a Flatten layer, which flattens whatever comes before it, and pass it the previous layer as a parameter. As you can see, we took the output of the base model, stored it as the head model, passed that head model to the Flatten layer, and stored the result back in the same head_model variable. We do the same for the other layers we want to add. Next I create a Dense layer, which is a fully connected layer, and here you choose the number of neurons. There is no solid convention for picking this number: you could use 1024, 1000, or 700-something. For this specific architecture, many researchers before me, and I myself in industry, have used 512 and it has worked very well, so I will stick with 512 units. Again, I call the layer with the previous layer as its input, and the output is stored back in head_model.
Let's also add a Dropout layer with a parameter of 0.5. You could give it 0.25, 0.3, or 0.7; it really depends. This is one of the hyperparameters you can change: each time you train, you test on your validation set, and if the network is not doing well you can adjust parameters like the dropout rate or the number of neurons to improve your deep learning model's performance. I'll use 0.5 here and again pass it the head model. Finally, I create one more fully connected Dense layer that serves as the output of our network, and I give it the number of classes as a parameter. This means the final layer of our convolutional neural network will have as many neurons as we have classes. It also needs an activation function. An activation function takes some input and maps it to a certain output, and for classification problems where the classes are independent from each other we can use the softmax function, which is widely used for this kind of classification task. Again, the head model is passed in as the input. Finally, we create the final model by telling Keras: here is the input, here is the output, build the network that connects them. Our final model uses base_model.input as its input and head_model as its output, because head_model is the variable where we stored the final tensor of our network. One more thing I want to add here, even though we could add it later, because it keeps things simpler and makes sure I don't forget it: I loop through every layer in base_model.layers and set trainable to False. This means training will not change the parameters of the base model; the base model stays exactly as it is, and only the layers we added on top get trained. This is exactly what I described on the transfer learning slide, so if you are a bit confused about what I'm doing here, go back to that slide and check what I said about transfer learning; we are simply translating that idea into code. And with that, we have created our convolutional neural network in just a few lines of code. I return the model at the end, so whenever this function is given a certain number of classes, it builds a network whose output has that same number of classes. 16. Creating data pipelines using generators: And now let's create a function that will help us build the data pipelines we will use to feed data to our model during training and validation.
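For reference, here is a minimal sketch that consolidates the build_model function we just walked through. It assumes TensorFlow 2.x with the Keras API; the default input shape shown is the 299 x 299 x 3 size discussed in the next lesson (the walkthrough above started with 224 x 224 x 3), and the Dense(512) layer is written exactly as described, without an explicit activation.

```python
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Input, Flatten, Dense, Dropout
from tensorflow.keras.models import Model


def build_model(number_of_classes, input_shape=(299, 299, 3)):
    # Pretrained base: ImageNet weights, without the original output layers.
    base_model = InceptionV3(
        weights="imagenet",
        include_top=False,
        input_tensor=Input(shape=input_shape),
    )

    # New head built on top of the base model's output.
    head_model = base_model.output
    head_model = Flatten()(head_model)
    head_model = Dense(512)(head_model)       # fully connected layer, 512 units
    head_model = Dropout(0.5)(head_model)     # tunable hyperparameter
    head_model = Dense(number_of_classes, activation="softmax")(head_model)

    model = Model(inputs=base_model.input, outputs=head_model)

    # Transfer learning: freeze the base so only the new head gets trained.
    for layer in base_model.layers:
        layer.trainable = False

    return model
```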
At the end of training we will also use this pipeline for evaluation. Let's create a helper function and call it build_data_pipelines. As parameters it takes the batch size, the train data path, the validation data path, and finally the evaluation data path. We start by creating what's called an image data generator. I'll call the variable an augmentor, and I'll explain why in a second; it uses the ImageDataGenerator class we imported from tensorflow.keras.preprocessing.image. What this object does is apply a set of transformations to our images. I call it an augmentor because that's the term machine learning and deep learning engineers use: we give it our original images, it applies some transformations to them, and later we will create a generator that uses this augmentor to produce new images based on the transformations we define here. For the transformations, let's start with the rescale factor. We divide all the pixel values in our images by 255, because the Inception V3 network takes as input images with values between 0 and 1, while our original images have integer values between 0 and 255. Remember from the transfer learning part that Inception V3 was pre-trained on the ImageNet dataset, and the same rescaling was applied there, so this transformation is required, not optional. All the transformations we add next, however, are optional: they generate new images from our existing dataset, which helps during training to avoid what's called overfitting; it acts as regularization. For the rotation range, let's say 25, so images will be rotated by up to 25 degrees. Let's add a zoom range of 0.15, a width shift range of 0.2, meaning the image can be shifted horizontally by up to 20% of its width, and a height shift range of 0.2 as well. Of course you can change these values as you like, and I encourage you to experiment with them yourself and see the effect they have on training. We'll also use a transformation called shear with a value of 0.15; shearing is another transformation that moves the pixels of the image in a certain direction. And I'll add a horizontal flip, which means our images can be flipped from left to right.
This is another transformation: it's like looking at the same thing in a mirror, everything is reversed. We set this parameter to True. Finally, we set the fill mode to nearest. This is the method used to fill in pixels that end up with no value after a transformation; here we fill them using the nearest pixels. There are other fill modes as well, but for our case nearest is fine. That's it for the training augmentor. Now we create an augmentor for the validation and evaluation sets, and it will be different from this one. During training we want to generate new data, but for validation and evaluation we don't want to introduce transformations, because the whole point is to see how the model performs on real data, and real-world data will not have these optional transformations; they exist only to show the model as many different scenarios as possible during training. So for validation and evaluation we create a new augmentor with the same ImageDataGenerator class, and the only transformation we apply is the rescale, because that one is not optional: the network needs values between 0 and 1. Those are our two augmentors. Next we create what are called generators from them. A generator takes our data, applies the transformations defined in the augmentor, and yields new images. Let's start with the train generator. We take the train augmentor and call the function flow_from_directory. It takes the path to our training images as input, applies the augmentor's transformations, and generates new images as output. We give it the train data path, and we also give it a class mode, which tells it what type of classes we have. We use categorical, which means the classes are independent from each other and each class represents something different from the others. There are other classification tasks where classes can overlap: for example, an image might contain rice and also a dairy product like milk, and for that kind of task you predict that the image contains all of those things, so it wouldn't be categorical. For our case, we assume every image contains one main type of food, so the class mode is categorical. We also choose a target size, which is the size of the images after they are transformed, and I use the same size as the input size of the neural network. Here I want to mention that I actually made a mistake earlier when defining the input shape for Inception V3: 224 is not its original input size. I checked, and it is actually 299 x 299.
I wanted to change it here and show you the change: I verified in the documentation that Inception V3's input size is 299 x 299, while the 224 x 224 size is for the VGG16 network. So we will use 299 x 299 as the target size. Next we define the color mode; all our images are RGB images, so the color mode is rgb. For shuffle, we do want the data to be shuffled during training, so we set it to True. Finally, the batch size is the batch_size parameter passed to our function. That is the train generator. Now we define the validation generator and the evaluation generator. Not much changes compared to the train generator; the main difference is the augmentor we use. For the validation generator I use the validation augmentor with flow_from_directory and give it the validation data path, and the rest is basically identical except for one thing: we set shuffle to False, because shuffling doesn't mean much during validation. For training we want as much randomness as possible so the network doesn't see the examples in a monotonic order, but for validation and evaluation we don't need that. The shuffle parameter reshuffles the images every time a new batch is generated, and for the validation and evaluation parts we simply don't need it. The evaluation generator is the same as the validation generator: the same augmentor, flow_from_directory, and the same parameters, except that the path is the evaluation data path. Now that we have our generators, the function returns all three of them: the train generator, the validation generator, and the evaluation generator. This function takes our images, applies transformations to them, and gives us newly augmented images as output, which will help us during training; a consolidated sketch of it appears just below. 17. Putting everything together inside a train function: And now let's put everything together inside a function that actually does the training. The function will take the generators to produce the data, create a model using the function we defined before, put everything together, and run the training on our dataset. Let's call this function train, and give it the path to our data, the batch size, and the number of epochs. Inside it we define everything we need so that the training goes smoothly. The path to data is the path to the folder that contains the training, validation, and evaluation folders; each of those contains one folder per class, and inside those folders are the images. So the path we pass to the function leads to that specific directory, and from it we need to construct the path to the training data.
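For reference, here is a minimal consolidated sketch of the build_data_pipelines function from the previous lesson (TensorFlow 2.x / Keras assumed; the augmentation values are the ones mentioned above and are meant to be tuned).

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator


def build_data_pipelines(batch_size, train_data_path,
                         validation_data_path, evaluation_data_path):
    # Training augmentor: rescaling is required by the pretrained base,
    # everything else is optional augmentation / regularization.
    train_augmentor = ImageDataGenerator(
        rescale=1.0 / 255,
        rotation_range=25,
        zoom_range=0.15,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.15,
        horizontal_flip=True,
        fill_mode="nearest",
    )
    # Validation/evaluation augmentor: rescaling only, no augmentation.
    val_augmentor = ImageDataGenerator(rescale=1.0 / 255)

    common_options = dict(
        class_mode="categorical",
        target_size=(299, 299),   # Inception V3 input size per its documentation
        color_mode="rgb",
        batch_size=batch_size,
    )

    train_generator = train_augmentor.flow_from_directory(
        train_data_path, shuffle=True, **common_options)
    validation_generator = val_augmentor.flow_from_directory(
        validation_data_path, shuffle=False, **common_options)
    evaluation_generator = val_augmentor.flow_from_directory(
        evaluation_data_path, shuffle=False, **common_options)

    return train_generator, validation_generator, evaluation_generator
```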
To construct those paths we just use os.path.join, so let me import the os module at the top with import os. We give it the path to the data and, because this is the training data, join it with the training folder. We do the same for the validation data, joining the path to data with the validation folder, and finally for the evaluation data, joining the path to data with the evaluation folder. Now that we have constructed the paths to those specific folders, we create the generators using the function we defined before. I'll copy the names of the values it returns, paste them here, and call build_data_pipelines with the corresponding arguments: the batch size is the batch_size parameter we receive, the train data path is the path to the training data, the validation data path is the path to the validation data, and the evaluation data path is the path to the evaluation data. Now that we have our generators, the next step is to create the model, the architecture of our neural network, using the function we defined before. That function takes the number of classes as a parameter. For now we will set it ourselves: we know we have 11 classes, so we use 11 here. Later we will automate this, so that if you have a different dataset with different images, say a dataset of animals with only three classes like dogs, cats, and horses, the code automatically detects that there are three classes. For now we hard-code it, because we just want to make sure the parts of the code we wrote earlier are working correctly. After build_model, let's create the optimizer we will use during training. I will use the Adam optimizer. If you remember, I imported two different optimizers; I'll use Adam, but I encourage you to try stochastic gradient descent as well and see whether you get better results or maybe faster convergence. For Adam I give a learning rate of ten to the power of minus five, so a small learning rate. You can play with this as well: change the value and see how it affects your training. Finally, we compile the model. For the loss function I use categorical cross-entropy, because our task is classification and all our classes are independent from each other, so this is the correct loss function to use. If you have the same kind of task with different classes, you can still use the same loss function. But if you have a task where classes can overlap, for example images that could belong to both the rice class and the meat class, then the loss function would change.
For us it is categorical cross-entropy, and for the optimizer we pass the optimizer we defined above. The metrics parameter lets us monitor additional values during training; I will monitor the accuracy. You automatically get the loss values, and whatever you define here is added on top, so we will be monitoring both the loss and the accuracy. Finally, we call the fit_generator function. I should mention that this function may be removed in the future, and apparently you can already use the fit function and pass it generators directly. For the TensorFlow version I am using, 2.2, fit_generator still works, and I prefer it here because it shows more clearly that we are using generators rather than the conventional fit method. Of course, you can check the documentation and use fit with generators instead. I give it the train generator, and also the steps_per_epoch parameter. We need this parameter because generators just keep generating images; they don't know when to stop, and the model would not know when it has gone through the dataset once. steps_per_epoch tells the model when one epoch is finished, an epoch being one full pass through the dataset. The proper way to compute it is to take the total number of training images and divide it by the batch size, keeping only the integer part of the division; the // operator does exactly that, so for example 5 // 2 gives 2. We also pass the validation data, which is the validation generator, and the validation steps, which should be computed the same way, but for now I will just hard-code a value; we will automate this later, and right now I just want to make sure the code for creating the generators and building the model works. So for steps per epoch I will assume around 3,500 training images divided by the batch size. Finally, I pass the epochs parameter, which tells the model how many times to go through the dataset. That's it for now; I'd like to verify this part of the code and see whether it works correctly. Let me run trainer.py... sorry, I forgot: if we run the script like this it won't do anything, because we haven't called any functions. So let's make sure the script actually runs something; the core of the train function so far is sketched below.
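Here is a minimal sketch of the train function as it stands at this point, using the build_model and build_data_pipelines functions sketched earlier. The folder names match the lesson's dataset layout, and the hard-coded image counts are placeholders that get automated in the next lesson.

```python
import os

from tensorflow.keras.optimizers import Adam


def train(path_to_data, batch_size, epochs):
    train_data_path = os.path.join(path_to_data, "training")
    validation_data_path = os.path.join(path_to_data, "validation")
    evaluation_data_path = os.path.join(path_to_data, "evaluation")

    train_generator, validation_generator, evaluation_generator = \
        build_data_pipelines(
            batch_size=batch_size,
            train_data_path=train_data_path,
            validation_data_path=validation_data_path,
            evaluation_data_path=evaluation_data_path,
        )

    model = build_model(number_of_classes=11)   # hard-coded for now

    optimizer = Adam(learning_rate=1e-5)
    model.compile(
        loss="categorical_crossentropy",
        optimizer=optimizer,
        metrics=["accuracy"],
    )

    # fit_generator still works in TF 2.2; fit() also accepts generators.
    model.fit_generator(
        train_generator,
        steps_per_epoch=3500 // batch_size,      # rough training-set size
        validation_data=validation_generator,
        validation_steps=1000 // batch_size,     # placeholder, automated later
        epochs=epochs,
    )
```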
We do the same thing we did in the data_handler file: add a main entry point. We need the path to the data, which I'll just copy from there, and then we can call our train function with the path to data, the batch size, and the epochs. For the batch size I'll use 2, because I'm running this on my own computer and I'm not sure it can handle more images per batch. This is just for testing; later we will automate everything and run the training on Google Cloud Platform, so don't worry about it for now. I do encourage you to test like this whenever you work on these kinds of projects: every time you add something to your code, run it, because sometimes a small piece of code can break the logic, and if you build bigger, more sophisticated functions before you start testing, it becomes much harder to debug and find the errors. So let me run python trainer.py, and we get an import error around the image data generator. Let me check: the ImageDataGenerator is not in the module I imported it from; it lives in the image module. Let's test again, and this is exactly why it's good to test things quickly. The code is running now, and the training has started: you can see the loss and the accuracy, so the code seems to be working. I don't want to wait for the training to finish; I just wanted to make sure everything works. So I'll stop the training, and now we'll go back and see where we can improve and automate things. 18. Improving and cleaning the code for robustness and automation: Now let's start changing some parts of our code to make it more robust and to automate the training process. We start with the number of classes, which we hard-coded to 11. In some cases that's fine and you can leave it, but I always like to automate things so that if I use my code on a different dataset in the future, I have to change as little as possible. One way to get the number of classes automatically is through the generator we already have: the train generator has an attribute called class_indices, which is a dictionary containing the names of the classes and the corresponding labels. Let's store it in a variable, for example classes_dict. Since it is a dictionary, we can take the length of its keys; that gives us the number of classes, because the keys are the names of our folders, bread, dairy products, and so on, and the values are the labels, with the folder names ordered alphabetically.
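A short sketch of the two pieces just discussed: the script entry point added at the bottom of trainer.py, and deriving the number of classes from the train generator instead of hard-coding 11. The local path and the example class names are illustrative.

```python
# Inside train(), after the generators have been created:
classes_dict = train_generator.class_indices   # e.g. {"bread": 0, "dairy_product": 1, ...}
number_of_classes = len(classes_dict.keys())
model = build_model(number_of_classes=number_of_classes)

# At the bottom of trainer.py, so running the script actually does something:
if __name__ == "__main__":
    path_to_data = "/path/to/your/dataset"     # folder holding training/validation/evaluation
    train(path_to_data=path_to_data, batch_size=2, epochs=3)   # small values for a quick local test
```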
That takes care of automating this specific part. Next, we need to automatically get the number of images for training and for validation. Let's create a function for this; I'll call it get_number_of_images_inside_folder. I usually prefer function names that describe what they do, even if they end up long like this one; of course you can name it differently, depending on how you usually do things. The function takes a directory, goes through it, counts all the images, and returns the total number. I start with a counter, total_count, initialized to 0, and then use the walk function from the os package, just like we did before: for dirpath, dirnames, filenames in os.walk(directory). For each filename in filenames, I get the file extension using splitext and use it to determine whether the file is an image; if it is, I increment my counter. So I split the filename, take the extension, and check whether it is an image extension such as .png or .jpeg. I know all of my images have the JPEG extension, but as I said, I like to automate as much as possible, so maybe in the future I'll have new images with a .png extension, and this check will handle that. Whenever the file is an image, total_count is incremented by one, and at the end the function returns total_count. Now, whenever I give this function a directory, it gives me the number of images inside that directory. Next I want to replace the hard-coded values so they are computed automatically for the training folder and the validation folder. I like to give variables and functions good names, so: total_train_images gets the number of images inside the train data path, total_validation_images gets the number of images inside the validation data path, and total_evaluation_images gets the number of images inside the evaluation data path. These totals should correspond to the numbers of images in the training, validation, and evaluation folders respectively; total_train_images is then used for the steps per epoch, and total_validation_images for the validation steps.
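Here is a minimal sketch of the counting helper just described, and of how the three totals replace the hard-coded step values.

```python
import os


def get_number_of_images_inside_folder(directory):
    # Walk the directory tree and count every file with an image extension.
    total_count = 0
    for dirpath, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            _, extension = os.path.splitext(filename)
            if extension.lower() in (".png", ".jpg", ".jpeg"):
                total_count += 1
    return total_count


# Inside train(), used for steps_per_epoch and validation_steps:
# total_train_images = get_number_of_images_inside_folder(train_data_path)
# total_validation_images = get_number_of_images_inside_folder(validation_data_path)
# total_evaluation_images = get_number_of_images_inside_folder(evaluation_data_path)
```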
For now, I want to test the code and see whether my function works correctly. Let's run the script and see what it gives us: the training is starting, which means the function at least isn't raising an error, but just to make sure, let me print these values and verify them. So I print total_train_images, stop the training, and go back to check. The generators already report how many images they found, and as we can see our function gives the same numbers: the first value matches the training folder, and the same goes for validation and evaluation. So the function is correctly counting the images in those folders, and we now have, in my opinion, much cleaner code and a more robust, automated way to run the training. If in the future I remove some images from my folders or add new ones, the code will still run correctly and take those changes into account. 19. Launching training locally on a subset of our data: Another thing I encourage you to do is to run the training on a subset of your data and let it run for two or more epochs, just to verify that everything works. Sometimes errors only appear at the end of an epoch, when moving to the next one, and sometimes your loss doesn't go down but diverges, meaning it goes up. In these cases you really want to run the code on a subset of your data, because our dataset is large and even two epochs would take a long time. So what I did is create a folder called dummy, copy the three folders into it, and then delete most of the images, leaving only a few; as you can see, one folder has just 44 images, and the same goes for the other folders. I want to run my code on this subset and check that the training goes smoothly over two or three epochs. The only change needed is the path to the dataset, which now points to the subset; I'll leave the batch size the same and use three epochs, just for testing, and run the code to see whether it gets through the different epochs without errors. This is also something I highly encourage, because you don't want to write and change code and then test on the whole dataset every time; that takes a long time and wastes a lot of it, especially when there are errors, and you want to find and fix them as quickly as possible, so the tests themselves should be as quick as possible. As you can see, I'm only training on 207 images now, the loss values are going down, the accuracy is going up, and let's see whether it moves on to the next epoch without problems. There it goes: the next epoch has started, and it also ran the validation on our validation dataset.
Seeing this pattern, the loss going down and the accuracy going up, tells me that my pipeline is correct and that the training is doing what it's supposed to do. By making this small adjustment and testing on a subset of the data, we can verify things quickly: make changes, test, make changes, test, which is essentially the job of a machine learning engineer, because even once the code works you keep testing, changing, and testing again, and that process takes a very long time if you test on the whole dataset every time. Now that I have verified that the training works correctly, let's see what comes next. 20. Adding evaluation at the end of training: Now that we know the training works and we have verified it, let's do what you usually do after training is done: an evaluation phase, also called a testing phase; both names mean the same thing. We run the trained model on the evaluation dataset and compute some metrics to judge whether the model was trained correctly, whether it has learned to distinguish between the different categories of food in our case. For this we will use some metrics from the scikit-learn library; I have imported these two functions, classification_report and confusion_matrix. classification_report takes the true labels and the predicted labels as input and gives us metrics such as the precision, the recall, and so on. confusion_matrix gives us a matrix where we can see how the correct predictions for one class compare with the wrong predictions of that class spread across the other classes; this will become much clearer once we actually print the confusion matrix for the evaluation phase. So I import these two functions and then go to the end of our train function, where we will do the prediction part. Let's first print a line telling us we have reached the evaluation part. A common convention is to put the word info inside brackets so it stands out in the logging, so I print an [INFO] line saying that we are in the evaluation phase. Then we do the predictions: I run my model's predict on the generator we defined for evaluation, the evaluation generator. In previous versions of Keras you had to pass a parameter called steps here, but with the version we are using, TensorFlow 2.2, you don't; it is computed automatically. Once we have the predictions, we want to get the indexes of the predicted classes, so we store them in a variable called predictions_indexes.
To get them I use the argmax function from NumPy, which gives us an array containing, for each example, the class that was predicted. It goes through every row of the predictions, and each row holds 11 probabilities, one per class; argmax returns the index of the highest probability, which is the index of the predicted class. So I pass it my predictions with axis set to 1. Then I store my classification report in a variable by calling the classification_report function. As the true values I give it evaluation_generator.classes: every image has its own label, its own category of food, and this classes attribute contains the correct index for each of those images. For example, if an image is of type bread, its value will be 0, because 0 corresponds to the bread class; so this is a vector of all the correct labels. I also pass my predictions_indexes, because the report needs both of them to be computed, and finally the names of the classes through the parameter called target_names. To get those names I do the same thing as before: I take the generator's class_indices and use its keys, since they are the names of our classes. Let me format the code so it's easy to read. Now we have the classification report, which contains metrics about the evaluation phase, and we will print it at the end to see what it gives us. Besides the classification report, I would also like to print the confusion matrix, because it helps a lot to compare the predictions in one class against the other classes. So I create a variable called my_confusion_matrix and call the function we imported in the same manner: the true values, evaluation_generator.classes, just like before, and the predictions, predictions_indexes. Now we have metrics from the evaluation that runs after the training is done, plus the confusion matrix, which will tell us a lot about how the model performed. At the end I print both, again using the [INFO] brackets convention: the classification report on its own line for easier reading, and the confusion matrix in the same way. So at the end of training we start the evaluation phase, compute the predictions, and take the indexes of the highest probabilities; those are our predicted classes, while the generator's classes attribute contains the correct classes, the ground truth. You might hear that expression as well; it means the correct labels obtained by annotating the data ourselves.
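Here is a minimal consolidated sketch of this evaluation step as it sits at the end of the train function (scikit-learn and NumPy assumed installed; model and evaluation_generator are the objects created earlier in train).

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# ... at the end of train(), after model.fit_generator(...) has returned:
print("[INFO] evaluation phase...")

predictions = model.predict(evaluation_generator)
predictions_indexes = np.argmax(predictions, axis=1)

my_classification_report = classification_report(
    evaluation_generator.classes,          # ground-truth label indexes
    predictions_indexes,                   # predicted label indexes
    target_names=list(evaluation_generator.class_indices.keys()),
)
my_confusion_matrix = confusion_matrix(
    evaluation_generator.classes,
    predictions_indexes,
)

print("[INFO] classification report:")
print(my_classification_report)
print("[INFO] confusion matrix:")
print(my_confusion_matrix)
```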
In our case, because we put each image in its correct folder, we know which class each image belongs to, so the folder structure is our real ground truth, and the other vector is the prediction, the estimation made by our trained model. Let me just verify that I have imported NumPy, since I'm using np.argmax here; apart from that, everything should be fine. Let's keep the epochs at one, because we just want to see whether the code works. The training starts, or at least the TensorFlow import works, and then the training runs correctly. Once the evaluation phase has finished, let's see what we get. The classification report gives us the precision, the recall, the F1 score, and the support, all values you can use to evaluate your model after it finishes training. We are mostly interested in the precision column: for each class we can see how precise the trained model is. In other situations you might need the recall, the F1 score, or the support; I wanted to show you this function so you have more options if you ever need to evaluate your model on different metrics. As you can see, we get the precision for each class, the overall accuracy, and also the macro average and the weighted average for the evaluation phase. As for the confusion matrix, this is what I described before: imagine the predicted classes along one side and the true labels along the other, with the classes, bread, dairy products, eggs, and so on, in both directions. The first row, for example, means that 22 bread images were actually predicted as bread, while two were predicted as the third class, one as the fourth class, and ten as the fifth class. The confusion matrix gives us a lot of information: it tells us where our model makes mistakes and which classes it confuses them with, and we can read this off for each of our classes. This is why I like to analyze trained models using the confusion matrix and not just the precision, and I hope you now see the benefit of all these metrics and understand why we might need them. 21. Summary: What we have been able to do so far is organize our dataset, create pipelines to read that data and pass it to the model during training, and, at the end of training, run an evaluation using the evaluation data. The thing that is missing is that we are still reading data from our local disk. That is not the end goal, because we will be doing the training on the cloud, on Google Cloud Platform, so our dataset needs to be somewhere we can access from Google Cloud Platform. With the data only on our local disk that would be impossible, and in any case it's not recommended; that's simply not how you run training on the cloud. You don't keep your data on your local disk and run the training on the cloud.
The solution is to put the data somewhere we can access from anywhere, and that's where Google Cloud Storage and Google buckets come in. We will use this service from Google, and I'll show you how in the next videos. We will put our data on Google Cloud Storage, inside buckets that we create. Then we will still run the training locally, but instead of reading the data directly from the disk, we will look at the buckets we created, download the data from them to the local disk, and then run the training. That's what changes, and that's what I'll show you in the next videos. 22. Our different setups for reading data during the training: Welcome to this new section of the course on training and deploying deep learning models on Google Cloud Platform. As I mentioned at the end of the last section, what we will do now is put our data on Google Cloud Storage inside buckets. Let me give you an overview of what we have done so far and what we will do from now on. So far we have been working with this setup: the data is on our local disk and the training happens on our local machine, so all we do is read the data from the local disk and pass it to the model during training. When we move to Google Cloud Platform, we need to put our data in a place that is easily accessible, and for that we will use Cloud Storage from Google. So this is the setup we will use from now on: the data lives in our Google bucket, and at first the training still runs on our local machine, but to run it we first have to download the data and then read it from the local machine. That adds a couple of steps: first we put the data on Google Cloud Storage, and then in our code we make sure we download the data from the cloud to the local machine before training starts. The end goal, what we want to have by the end of the course, is a third setup: the data stays in the Google bucket, we still use Cloud Storage and still download and read the data, but instead of doing this on our local machine, we run the training on Google Cloud AI Platform. The AI Platform gives us access to a machine on Google Cloud, and on that machine we want to read the data from the Google bucket, download it to that cloud machine, and then read it there. That is the end goal, but for now we will work with the second setup. I usually like to divide a process into sub-processes and validate each one before going for the end goal. Between the first setup and the last setup, two main things change, and I like to separate them by introducing an intermediate setup: the data is on Google Cloud Storage, but training still happens on the local machine. If we do this, we can validate that we are correctly reading data from the Google Cloud bucket.
But once we finish and validate this setup, we will move on to our end goal, which is training on Google Cloud Platform with the data also in Google Cloud buckets. I hope you now understand what we are trying to do and the setup we want to put in place, and I'll see you in the next video. 23. What are buckets and how to create them: So what is Google Cloud Storage anyway? You can think of it as something like your local file system, your local disk where you keep your files, videos, and music, except that it lives on the cloud. There are in fact some fundamental differences between your local file system and Google Cloud Storage in how the two systems are built at a low level, but we won't go into them here; at a high level you can think of them as the same. Let's start by creating a Google Cloud bucket. To do this, go to your console, or if you don't remember the address, just search for Google Cloud Console and open the first result; if it shows you the landing page, click through to the console as I showed you before. Once we are on the console, I'd like to start by creating a new project. As you can see, my dashboard still shows some elements from my last project on Google Cloud Platform, which I named DL on AI Platform tests; it lists some resources I created and services I used while doing tests there. If this is your first time using Google Cloud Platform you might not have any projects here, but since we are starting from scratch, I will create a new project. Click the project selector at the top to get the list of your projects; even if you have none, you might see a default project that was created for you. You can start from that default project if you prefer, but I recommend creating a new one so you learn how to do it from scratch. So I create a new project and give it a name; instead of the default name I'll call it train deep learning models, though you could also call it train a classification model or whatever you prefer. I leave the organization as it is, since I have no organization to link this project to, and create the project. It might take a little time to create and set up, so you can wait; for me it took only a few seconds. Once the project exists, make sure you are actually on it: the old project may still be selected, so click on your new project and check that its name is shown at the top. With the project created, go to the search bar and search for, say, bucket; among the results, you can just click on Create bucket.
Now we are on the page where we give our bucket a name and choose a region and other options. For the bucket name I'll use food_data_bucket; I like ending my bucket names with the word bucket so I can tell at a glance that what I'm accessing is a bucket, but you are free to just call it food data or whatever name you like. Click Continue, and now we choose where to store the data. There are three options: a single region, dual-region, and multi-region. I usually use multi-region, but for this specific project I'll use a single region. I'm currently in France, so I'll look for a European region; if you are in the US or North America you can pick one of those options instead, just remember which one you chose for later. I'll pick a western European region, Frankfurt, and click Continue. Next we choose a default storage class: Standard, Nearline, Coldline, or Archive, each with a short description of what it's for. We'll go with Standard because, as described, it's best for short-term storage and frequently accessed data, and we will be accessing our data frequently during training, so I prefer this option; click Continue. Now we specify the type of access control for the bucket. There are two options. The first is fine-grained, where you can grant access to specific objects in the bucket; this is useful if the same bucket holds some data you want everyone to access and other data that only specific people may access. We don't fall into that category; we just want uniform access to our bucket, so I choose the uniform option and click Continue. In the advanced settings you have the ability to use customer-managed encryption keys, but I'll keep the Google-managed keys since they require no configuration, which is what I want for this project. That's it; click Create, and our bucket has been created. You can now upload files or folders, create folders, and generally do whatever you like inside this bucket. In the next video I'll show you how to upload data, and then how to download and read it in our code. 24. Uploading our data to the bucket: Now let's upload our data to the bucket we just created. On the page showing the bucket's contents, we simply click Upload folder. My data is here, and I'd like to upload these three folders. Unfortunately I can't upload all of them at the same time, so I'll do it folder by folder. This might take some time, and it also depends on your internet connection and upload speed, so just let it run; once all the uploads are finished, I'll be back to show you what the bucket looks like. The upload has finished.
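As a side note, the same bucket creation and upload can also be done programmatically with the google-cloud-storage Python client rather than through the console. This is only a sketch: it assumes you already have credentials available locally (for example application-default credentials, or the service-account JSON file created in the next lesson), and the bucket name, region, and folder names are the ones used in the course.

```python
import os

from google.cloud import storage

client = storage.Client()  # picks up whatever credentials are configured locally

# Equivalent of the bucket-creation steps from the previous lesson.
bucket = client.bucket("food_data_bucket")
bucket.storage_class = "STANDARD"
client.create_bucket(bucket, location="europe-west3")   # Frankfurt


def upload_folder(local_folder, bucket):
    # Walks a local folder (given as a relative path such as "training")
    # and uploads every file, keeping the folder structure as blob names.
    for dirpath, _, filenames in os.walk(local_folder):
        for filename in filenames:
            local_path = os.path.join(dirpath, filename)
            blob_name = local_path.replace(os.sep, "/")
            bucket.blob(blob_name).upload_from_filename(local_path)


for folder in ("training", "validation", "evaluation"):
    upload_folder(folder, bucket)
```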
Now we can see the three folders in the bucket, the same ones I have on my local disk. Each of them contains the class folders, and inside those you can see all the uploaded images. In fact, I have created another bucket as well; let me show it, it might take a few seconds to load. It's called dummy data bucket, and it is basically a clone of the dummy folder on my local disk: inside there is a folder called dummy, and inside that the same class folders, except that each contains only a few images instead of many. I created this dummy bucket for the same reason I created the dummy local folder: to let me test as quickly as possible. I don't want to test against my full dataset, which is large, every time I change my code; I want to iterate quickly. So I'll do all the testing and iteration with this small subset, and only once the whole pipeline is in place and I know everything works will I switch to the bucket containing the real, full dataset we downloaded at the beginning. I wanted to show you this so you can use the same technique in your own pipelines to accelerate your testing and iteration: always work with a small subset of your data, and once you are sure everything works, switch to the real dataset. 25. Creating a credentials json file to allow access to our bucket: As I mentioned before, we started in the situation where the data was on our local disk and the training also happened on the local machine, so we were just reading data from the local disk. Now we have moved to the setup where the data is in the Google bucket while training still happens on the local machine, which means we first need to download the data and then start reading it from the local disk. To do this we need something very specific: credentials that allow us to access the bucket. We cannot reach the bucket from anywhere without them; think of them as a way of telling Google Cloud Platform that we actually have the right to see what's inside this bucket, to download from it, to upload to it, and to modify it. One easy way to get this, and one that will be very useful in our situation, is to use what are called service accounts. You can think of a service account as an ID plus a secret key that allows us to access these buckets. So let's go to Google Cloud Platform and create a service account so we can access our buckets from anywhere. I'm back in the console, and you can see the two buckets we created. As I said, we need a way to access them, and for that we will use service accounts, so let's look for them: if you type service accounts in the search bar, you get this result.
If you click it, it takes us to IAM & Admin; IAM stands for Identity and Access Management. Inside, we go to Service Accounts, and on this page we can create a service account using this button. We give the service account a name; I usually keep the same name as the project, so something like "train deep learning models" service account. Feel free to choose a different name; I will use this one and click Create. Now we need to grant permissions. You can actually continue without granting any, but I like to give it the Owner role, which means it can perform any action on the project, including uploading and downloading files. Click Continue; I don't want to add anything in the last step, so I just click Done. We have now created a service account, but we don't yet have a key for it. Before creating one, let me explain what we are trying to do: the service account is what allows us to access our bucket from anywhere, and when we create a key for it, we get a JSON file containing an ID and a secret key that together grant that access. So let me click Create key, choose JSON, and click Create. As you can see, it has been downloaded to my local machine. If we look at the file, it has a type of service account, a project id, a private key, a client id, and so on: all the information that allows us to access our buckets from anywhere. You should not share this file with other people; otherwise they will be able to access the resources and data in your buckets. I am showing mine here, but I will be deleting everything by the end of this course, and this bucket will not exist anymore. In any case, you now have an idea of what we need in order to access the buckets, and we will use this file in our code in the next videos.

26. Problem with our credentials file and how to fix it: We now have the JSON file that allows us to access our bucket on Google Cloud Platform, and we will use it to download the data to our local directory. To use the credentials file in our code, let's first add a folder to our project and call it, for example, credentials. Inside this credentials folder I will put the JSON file we downloaded, so I copy it from my Downloads folder and add it there; I should now see it in the project. With the credentials file in my local directory, I can access it from my code and use it to reach the buckets. What I will do is add some functions for downloading the data, and maybe later for uploading. I have chosen to put them in the data_handler.py file instead of trainer.py, because I want to keep trainer.py for training code only: anything directly related to training goes there, and the data handling parts go in data_handler.py. Let me import the packages I will need for the download code, but first, here is roughly what the key file we just downloaded looks like.
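The downloaded key is a plain JSON document. The exact contents depend on your project, but it typically has fields along these lines, with the values shortened here; never commit this file to version control or share it with anyone.

    {
      "type": "service_account",
      "project_id": "your-project-id",
      "private_key_id": "...",
      "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
      "client_email": "your-service-account@your-project-id.iam.gserviceaccount.com",
      "client_id": "...",
      "token_uri": "https://oauth2.googleapis.com/token"
    }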
Now for the imports: I start with google.cloud, and the storage package should already be installed in your virtual environment if you used the requirements.txt file I gave you earlier; you can see google-cloud-storage listed in requirements.txt, so it should already be in your local virtual environment. If you didn't use the requirements.txt file, you may not have it, and you will need to pip install it yourself. Back in data_handler.py, we have imported storage, and we will use it first simply to list the files that exist in the bucket. This is a good first test that our credentials file is correct and that our bucket is accessible from our code. I start by defining a function called list_blobs. A blob is basically an object, that is, a file that exists in the bucket; they are called blobs in Google Cloud Platform naming conventions, which is why I use that name as well. The function takes the bucket name. Inside it, we create a client, which I call storage_client, using the storage module we imported: from its Client class we call from_service_account_json and give it the path to our credentials. I pass it a variable called PATH_TO_CREDENTIALS, which does not exist yet. We could also pass the path as a parameter of the function, but I prefer to keep it as a global variable in my code so I can access it from anywhere, so I define that variable and set it to the path of the JSON file inside our credentials folder, using the file's name. That path is then used when the storage client is created. Then, using the storage client, we call list_blobs, which lists all the objects inside the bucket, giving it the bucket name, and finally we return the blobs. So what this function does is look at our bucket in Google Cloud Storage; it knows which bucket thanks to the bucket name, but to reach it we first need to create a client, which you can think of almost as another user that has credentials to access the bucket on our Google Cloud Platform project. We then use that client to list all the blobs. Let's test this function on our bucket. I'll change the code here: I turn off the switch for printing insights by setting it to False and add another switch for listing blobs, set to True; I'll leave the other parts of the code aside for now, since we will need them again later. Then, if the list blobs switch is on, I print the output of the list_blobs function, giving it my bucket name, which, if I remember correctly, is the food data bucket. Before running it, here is a sketch of where this code ends up.
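This is only a sketch of the listing code in data_handler.py as I have described it so far, including printing each blob's name, which I refine in a moment; the credentials file name and the bucket name are the ones from my setup, so adapt them to yours.

    from google.cloud import storage

    # Global path to the service account key we downloaded.
    PATH_TO_CREDENTIALS = "credentials/service_account.json"

    def list_blobs(bucket_name):
        """Return all objects (blobs) stored in the given bucket."""
        storage_client = storage.Client.from_service_account_json(PATH_TO_CREDENTIALS)
        blobs = storage_client.list_blobs(bucket_name)
        return blobs

    if __name__ == "__main__":
        list_blobs_switch = True
        if list_blobs_switch:
            for blob in list_blobs("dummy-food-data-bucket"):
                print(blob.name)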
Just to make sure we are using the correct bucket name, let me check the Cloud Storage page of the project. We can see our two buckets here, and for now I will use the dummy food data bucket, so let me copy its name and use it in the code. Now let's run it and see what we get. If everything is fine, it should go through all the files inside the bucket and list them. In fact, the only thing we are printing is the return value of the list_blobs function, and that just prints as an object. What I really want to print are the names of the files inside my bucket, so I store the result in a variable called blobs, loop over each blob in blobs, and print blob.name. That is what I want to see in the console, and that is the goal of the function: to list the names of the files and folders inside our bucket. Now let's clear the console and run the code; we will get an error, and then I'll show you how to fix it. As you can see, we actually have an access problem: it tells us that we do not have access to the Cloud Storage bucket, that we do not have the right credentials. We created our service account with the Owner role, but that is not enough to be able to upload and download data from the bucket; we need to add another role. So we are going to create another service account and give it two roles: the Owner role, and also a role that allows the service account to access the bucket and upload and download data. Let's go back to the Service Accounts window and create a new service account; let's call it train dl models (the first one was called train deep learning models), and this is the one we will actually use in the project. Click Create. The permissions step is where we grant the roles that allow the service account to access the bucket and manipulate the data inside it. First I add the Owner role; this is a general permission that lets us do a lot on the project, but it still does not allow us to manipulate objects inside our buckets. For that we add another role: search for Cloud Storage, and the role I want is Storage Admin. With this role we will be able to do all kinds of data manipulation, downloading from and uploading to the bucket, without any problem. Click Continue, then Done. Now we have the new service account, and we do the same thing as before to create a JSON file containing the credentials needed to access the buckets and Google Cloud Platform: click Create key, choose JSON as usual, and close the dialog. Now let's find the downloaded credentials file in the Downloads folder.
This latest file is the one we just downloaded, so I will move it into the credentials folder. As you can see, I already have two files there, because I ran a quick test before recording this video. The one we had before, from the service account without the Storage Admin role, I will remove; it was just for testing purposes. The new JSON credentials file is this one, so let me copy its name and go back to PATH_TO_CREDENTIALS. The first path pointed to the key without the Storage Admin role, and this new one points to the key that has it; I had the old one because of my earlier test, so let's comment it out and use the new credentials file. It contains the credentials needed to access the bucket itself, not just Google Cloud Platform in general. With this new path to the credentials, let's clear the console and run our code again. As you can see, we now get a list of all the objects inside the bucket; I am going through the dummy bucket here, so we see all the files, all the images we have. That is the goal of the function, and now we know we can actually access the bucket and download data, which is what we want. For now I am only listing the file names, but what we will do next is actually download them using the same methodology: with the client created from the service account, the storage client, we can do all sorts of things. Let's see that in the next video.

27. Adding code for downloading data from the bucket: Let's now create the function that will allow us to download data from the bucket to our local directory. I will call it download_data_to_local_directory, and it takes two arguments: the bucket name, so we access the right bucket, and a local directory name, which is where we will store the data on our local disk. We start the same way as before: we create a client with storage.Client.from_service_account_json, giving it the path to the credentials, and then, also as before, we get the blobs from the storage client's list_blobs, giving it the bucket name. That is the part that handles how we access the data in the bucket. Next, let's verify that the local directory exists and create it if it doesn't: if not os.path.isdir(local_directory), then we create that local directory. With that handled, we can go through each blob in our blobs. What we want to do is build a path where these objects (and by objects I mean the folders and files inside the bucket) will be stored locally. I call this joined_path, and I join two paths: the local directory and blob.name. As I showed you before, the blob name looks something like the path of this image here, so locally we will have the local directory, and inside it another folder called dummy.
Inside that there will be a folder called validation, inside that a folder called vegetable, and inside that folder the image itself, and it will be the same for all the objects. Now that we have defined joined_path, we need to check whether it is the path of an image or of a folder. In the example we just looked at, it is a path to an image; if it were a path to a folder, we would only have the folder names and nothing after the final slash. To check this, we can test whether the base name of joined_path is empty. If it is empty, it means we only have the folder part, so the blob we are looking at is a folder and not an image. In that case, we check whether the directory already exists, and if not, we create it with os.makedirs(joined_path). Why check whether it exists first? This matters for the second, third, and later runs: the first time, we create the folder, but if you run the same code multiple times and the folders have already been created, there is no need to create them again. It's a small optimization: we don't want to recreate folders that already exist. The second case is when the object we are looking at is an image, that is, when the base name is not empty. Here I check whether the file already exists, again so we don't repeat work: if we downloaded the data on a first run and then, say, make some changes to the code and run it again, we don't want to download everything again, because that would just take more time and the data is already on the local disk. So we check whether the image is already on the local disk, and if it is not, we download the blob with its download_to_filename method, which is a method we can call on these blob objects, passing joined_path as the file name. That should be it for our function. If we now call it, it should download the data to the local directory we pass as the second argument. Let me change this part of the code: I no longer want to list the blobs, so I set that switch to False, and I create a new switch, download_data_switch. If this switch is True, I call download_data_to_local_directory; the first parameter is the bucket name, which in our case is the dummy food data bucket, and for the local directory I personally prefer to put the data right here, in a folder called data, so I use that as the path. I want all the objects to end up inside that data folder at this location. I will not create the folder manually, because the function already checks whether the path exists.
If it doesn't exist, the function will create it. So, if our function works properly, when we run the code we should see a new folder called data appear here, and inside it all the folders and images. Let me clear the console, run data_handler.py, and see what we get. Although the data folder is created correctly, we get an error saying "no such file or directory", even though we are trying to create the directories. The problem arises in this part of the code. When the name of an object in the bucket ends with a slash, meaning it is just a path to a folder with no image in it, we handle that case and create the folder. The problem comes when we try to download an image: as we saw before, the image sits inside a folder, which is inside another folder, which is inside yet another folder. If we try to download the image straight to that file name, the folders that precede the image in the path may not exist yet, and we have not added any code to create them. To fix this, what we need to check is not joined_path itself but os.path.dirname(joined_path), which gives us the path of all the folders that come before the image; in our case that would be dummy, then validation, then bread, for example. We check whether that directory exists, and if it doesn't, we create it with os.makedirs, using the same approach as before. So what this part of the code does is create the folders that live inside the data folder whenever they are missing: we check whether the file exists, and if it doesn't, we check whether the folders it should live in exist themselves, create them if needed, and only then download the data to the file name. That should fix the error, so let's clear the console and run the code again. Unfortunately I didn't add any print statements to show the download progress, but we can simply look at the data folder: as you can see, it keeps downloading data and placing it in here, for example data/dummy/evaluation, with all the folders inside the evaluation folder and all the images inside those folders. This process takes some time, but the good thing about it is that it only takes time at the beginning: once the data has been downloaded to the local machine, the training runs smoothly, because we no longer have to worry about connection problems; everything happens on the same machine. I'll come back when the download is complete. The download has just finished; it took around three to four minutes, and remember this is only a subset of our dataset. If we were downloading the full dataset it would have taken much longer, which is exactly why it's good to have a subset that lets us iterate quickly.
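For reference, after this fix the whole download function in data_handler.py looks roughly like the sketch below. It is a sketch of what I described rather than a copy of my exact file, and the credentials path and bucket name are specific to my setup.

    import os
    from google.cloud import storage

    # Service account key that has the Storage Admin role.
    PATH_TO_CREDENTIALS = "credentials/service_account.json"

    def download_data_to_local_directory(bucket_name, local_directory):
        """Download every object in the bucket into local_directory, mirroring its folder structure."""
        storage_client = storage.Client.from_service_account_json(PATH_TO_CREDENTIALS)
        blobs = storage_client.list_blobs(bucket_name)

        if not os.path.isdir(local_directory):
            os.makedirs(local_directory)

        for blob in blobs:
            joined_path = os.path.join(local_directory, blob.name)
            if os.path.basename(joined_path) == "":
                # The blob name ends with a slash, so it represents a folder.
                if not os.path.isdir(joined_path):
                    os.makedirs(joined_path)
            else:
                # The blob is a file; skip it if it was already downloaded.
                if not os.path.isfile(joined_path):
                    # Make sure all parent folders exist before downloading.
                    if not os.path.isdir(os.path.dirname(joined_path)):
                        os.makedirs(os.path.dirname(joined_path))
                    blob.download_to_filename(joined_path)

    if __name__ == "__main__":
        download_data_switch = True
        if download_data_switch:
            download_data_to_local_directory("dummy-food-data-bucket", "data")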
So what we have now is a folder called data, and inside it the same structure as the bucket: a folder called dummy, and inside that folder an evaluation folder, a training folder, and a validation folder. Inside each of these three folders we have the 11 folders representing the different food categories, and inside those the downloaded images. Let's open one image to verify: as you can see, it was downloaded correctly. So the data-handling part of the pipeline is now right: we download the data to the local directory, and once it's downloaded we can use the same methodology as before, because during training we will be reading data from the local machine. This is one way of doing things. You might be asking yourself why we don't simply leave the data in the Google bucket and read from the bucket during training, which is also a widely used method. In the end it's really a matter of preference, and also of how big your dataset is. I personally prefer this approach, especially for projects I want to finish quickly, and that is the point of this course: to let you quickly build a model, train it, and deploy it. By making sure the data is downloaded to the machine's local directory first and only then starting the training, I avoid many mistakes: I can confirm that the part of the pipeline that handles the data works properly, so if something goes wrong during training I can be fairly sure it really is in the training part, maybe the architecture of the network or something similar, and not in the data reading. When the data stays in a Google Cloud bucket and we read straight from it, we might stumble on problems, and we would also have to do more work to create the right generators, because we would need to download a small amount of data at a time, build a generator that takes that data, applies the data augmentation techniques, and feeds it to the model. That adds work, and we definitely don't want that in this course, where the goal is to go as quickly as possible from having data to creating a model, training it, and deploying it.

28. Verifying that our training pipeline after the new modifications: What we can do now is download the data to a local directory, with everything placed inside that directory. Let's go back to the trainer and see what we need to modify to make this pipeline complete. What we have worked on so far is downloading the data from a bucket and putting it in a local folder; in the trainer, we now need to read data from that local folder. In the place where we set the path to the training data, the only thing we need to change is the path itself: we can use, for example, data/dummy, since inside it we have the same structure as before, with the evaluation, training, and validation folders. We also need to make sure we actually download the data, so let's call the same download function here and make sure to import the correct function into the trainer; the change amounts to roughly the sketch below.
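This is only a sketch of the trainer-side change; the constant names and the print messages are simply how I chose to write it, and the paths match the naming used in this project.

    from data_handler import download_data_to_local_directory

    BUCKET_NAME = "dummy-food-data-bucket"   # the dummy bucket while we iterate
    LOCAL_DATA_DIRECTORY = "data"

    print("Downloading data from the bucket...")
    download_data_to_local_directory(BUCKET_NAME, LOCAL_DATA_DIRECTORY)
    print("Download finished")

    # From here on, the training code simply points at the downloaded copy,
    # e.g. "data/dummy", which contains the evaluation, training and
    # validation folders exactly as before.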
Concretely, in trainer.py I write from data_handler import download_data_to_local_directory, so I can use the function in my training code. I also print a couple of messages so that I know when the download of the data has started and when it has finished, something like "download finished". I am hard-coding the path here because I am the one who controls the naming of the folders inside my bucket and on the local machine: when I write data/dummy, I know it will match the structure of the downloaded data, because my local folder is called data, the data in my bucket sits under dummy, and the other folders are inside it. We could also print that the training has started, but we can just leave it, since we will see that in the console anyway. Let me save this and run trainer.py. The download step should not take long now, because the data is already on the local machine; it will just verify that the data is on the local disk, then read from the new data path and run the training. It is now going through the different folders and checking that the images exist in our local directory. As you can see, it has finished verifying whether the images are already on the local disk, it has computed the number of images, and the training has started; everything is working as it should. Again, just to emphasize the point: having a subset of the data and testing on that subset is always a good idea, because you want to accelerate your testing and the modifications to your code, and you don't want to spend a lot of time every time you make a small change. So we have now validated our training pipeline: we go from downloading the data, to creating the generators for the data pipeline, to reading that data and passing it to our deep learning model, which trains correctly; at the end we use the evaluation generator to evaluate on the evaluation dataset, and we compute the classification report and the confusion matrix, as we can see here. Our pipeline is working properly, so let's see what we should do next.

29. What is docker and how to use it for our project? (optional): In this section of the course we will explore Docker and use it to dockerize, or containerize, our application. Docker is basically a tool that lets us do a few things. The first is to group our application and its dependencies inside a virtual setup. The second is to isolate this setup from our system: the application and its dependencies live inside a virtual setup that is isolated from your machine, so whatever is installed inside it will not affect your machine's system. It also allows us to run one or multiple copies of our application, and those running copies are what we call Docker containers.
When we group the application we have built and its dependencies inside a virtual setup, we are actually building a Docker image. This is an important term in Docker terminology, and it means exactly that: an application and its dependencies grouped together in a virtual setup. And when we run our application, whether one copy or multiple copies, what we are actually running are Docker containers. That is what Docker will do for us and how we will use it. I will show you the details in the upcoming videos, but for now here is the big picture, a high-level overview of how it is used. We write a set of lines inside a file usually called a Dockerfile. Using that file, we build a Docker image by running a certain Docker command in the command line. And once the Docker image is built, we can run one or more containers from it. That is the workflow we will follow: create a Dockerfile with the lines needed to group our application and its dependencies in a virtual setup, build that virtual setup (the Docker image), and once the image is built, run one or multiple copies of it, which are Docker containers.

30. Small modifications to our files: We are now going to create a Docker image that contains our application and all its dependencies. Before we start writing the lines of the Dockerfile, a quick note: I have added these two lines to trainer.py. Right after the imports, I print a message and run a small piece of code that shows the devices the training can run on. When TensorFlow runs inside Docker, in your virtual environment, or anywhere else, this lets you see whether training will happen on the CPU only or also on a GPU, the graphics card. For training we usually want to make use of GPUs to speed things up, so these two lines let us see which devices are available: if we only see the CPU, the training will run on the CPU; if we also see a GPU, the training can run on the GPU. The other change I made is that I commented out the lines for installing TensorFlow in requirements.txt, including TensorBoard, since it ships with TensorFlow. I will explain the reason once we start writing our Dockerfile. These are the only two changes, and you should make them as well to follow along with the rest of the videos; you will understand why once we start creating Docker images.

31. Building a docker image using dockerfiles: Now let's write the Docker code that will allow us to create Docker images. I will create a new file and call it Dockerfile. This naming convention is used with Docker so that the file is easily distinguished from the rest of the files in our project tree.
You can see that VS Code has immediately recognized this as a Dockerfile. Now that the file is created, we can start adding the lines that will let us build our Docker image. Every Dockerfile starts with a FROM line followed by a base image. For us, the base image will be tensorflow/tensorflow, and what comes after the colon is the tag. What this line does is check whether we already have an image called tensorflow/tensorflow:latest locally; if it doesn't exist on our machine, it will be downloaded from the Docker repository. A base image contains everything needed for a specific purpose: since we are using the TensorFlow base image, it already has Python installed, TensorFlow installed, and all of TensorFlow's dependencies. Now you understand why I commented out those lines in the requirements: I don't want to install TensorFlow with pip, I want to use the base image the TensorFlow team has built for us and run my code on top of it. So the way to use a base image is to write FROM followed by the image name, and then you can add more lines that do different things. The next thing we add is WORKDIR, the working directory, which means that whatever we add to this Docker image will live inside that directory. I follow the convention used by many DevOps engineers, which is to use /usr/src/app as the directory for the application; all the files we copy into the Docker image will go inside this working directory. After that, we want to copy everything in our project tree into this working directory, and that is what the COPY . . command does: copy everything from our project directory into the image we are building. A quick point here: if we copy everything from our working directory into the Docker image, we would be copying the __pycache__ folder, the data folder, and the virtual environment setup, and we don't need any of those. So I will add a .dockerignore file, which, as you can see, VS Code also recognizes as a Docker-related file. Inside it we list everything we want Docker to ignore, so that when the COPY command runs it won't take any of it into account. This is very similar to a .gitignore file if you have worked with Git and GitHub or GitLab; the same idea applies here. So I add __pycache__, because we don't want to copy that folder; I also don't copy the data folder, because I actually want to download the data from scratch from the Google bucket onto whichever machine ends up running the Docker image; and I also ignore the virtual environment folder. Now, when the COPY command runs while the image is being built, none of these folders will be included, which is what we want. You might have other things in your own project that you don't want to copy into your Docker image, and this is the way to exclude them.
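In my case the .dockerignore ends up with just three entries; the exact names depend on what you called these folders in your own project.

    __pycache__
    data
    venv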
After we copy everything into our working directory, I add a RUN command that does an apt-get update, because inside the Docker image there is an Ubuntu system running, and we want the apt package index to be up to date so that we can download and install any packages without worrying about the state of apt in this base image. After that, we can install the packages listed in requirements.txt. We copied requirements.txt into the Docker image, which means it sits in our working directory, so to install the requirements we add RUN pip3 install -r requirements.txt; you could use pip instead of pip3, it doesn't really matter here because Python is already installed in this base image, and the -r flag is there because we are installing from a requirements file which, as I said, was copied into the working directory. Finally, we add the ENTRYPOINT, which is basically the command that runs our Python file: here I run python3 followed by the path to my trainer.py file. That is how our script gets run from the Docker image we are going to build. And that would be it for the Dockerfile: as you can see, with only a few lines we should be able to build an image, and that is what we will do next.
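Putting the lines from this lesson together, the Dockerfile looks roughly like this. It is a sketch rather than my exact file; in particular, the tag of the base image and the path to trainer.py in the ENTRYPOINT depend on how your project is laid out.

    FROM tensorflow/tensorflow:latest

    WORKDIR /usr/src/app

    # Copy the project into the image (minus whatever .dockerignore excludes).
    COPY . .

    # Refresh the package index of the Ubuntu system inside the image.
    RUN apt-get update

    # Install the Python dependencies; TensorFlow itself comes with the base image.
    RUN pip3 install -r requirements.txt

    # Command executed when a container is started from this image.
    ENTRYPOINT ["python3", "trainer.py"]

Using the JSON-array (exec) form of ENTRYPOINT is also what will let us append extra arguments to the command when we run a container later on.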
33. Adding arguments to our training application using Argparse: Another thing we can add, and it's recommended that we do, is the ability to pass arguments to our Python files when we run the trainer, and later when we run our containers. Before, we ran the application without giving it any arguments, but suppose your trainer.py accepts the batch size, the number of epochs, or the name of the bucket as arguments: if you want to automate things and keep some flexibility, it is always better to use arguments instead of hard-coding those values, and that is exactly what we will do now. First, I import argparse, the library that lets us add arguments to our file, and I define a parser with argparse.ArgumentParser. With the parser in place, we can start adding arguments using the add_argument method. We first give the argument a name; for now, what I want to pass to trainer.py is the name of the Google bucket and the batch size. So the first argument is the bucket name; I usually like to define the type as well, and I add a help string so that anyone using the script knows exactly what the argument does, in this case the bucket name on Google Cloud Storage. The second argument is the batch size, with type int, and I give it a help string too: the batch size for our deep learning model. We also have the possibility of defining default values for these arguments, and I will do that here: for the default bucket name I copy the name of the dummy bucket, and for the batch size I set a default of 2. Now I need to feed the parsed arguments into my functions as parameters. For that, I define args = parser.parse_args(); our arguments are now stored in that variable, and we can simply use args.bucket_name and args.batch_size wherever we need them. With that, we can run trainer.py and give it parameters at runtime, and if we don't provide them, the default values are used. First, let me activate the virtual environment, because I want to test trainer.py without Docker; once I see it working properly, we will build a new Docker image and run a container with these arguments. So I run python trainer.py, passing the name of the dummy bucket for the bucket name, and for the batch size, just to check that it really uses a different value than the default of 2, I pass a value of 1 (after double-checking which parameter of my train function the batch size actually is). After fixing a small typo in the command, I clear the console and run it again: trainer.py starts, and it should be using the values we passed as parameters. Once we confirm this works correctly, we can build a new Docker image and run containers that take arguments at runtime. This can take some time, so I'll come back once we at least reach the part where the data is verified. We have now gone through the process of verifying whether the data is downloaded, and as you can see we are still using only the CPU; the training is running correctly and everything works as it should. Just to confirm that the parameters passed to trainer.py are being used: the default batch size is 2, but we passed 1, and as you can see the progress is moving one step at a time, so our value is indeed taken into account. After the training is done we should see the metrics and the confusion matrix, and everything looks correct. Now that the script has finished and the metrics are there, everything is working properly, so let's go and build a new image from this new script and then run a container with these parameters.
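For reference, the argument handling at the top of trainer.py ends up looking roughly like this sketch; the default bucket name and the way the parsed values are passed on to the download and train calls are specific to my setup.

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--bucket_name",
        type=str,
        default="dummy-food-data-bucket",
        help="Bucket name on Google Cloud Storage",
    )
    parser.add_argument(
        "--batch_size",
        type=int,
        default=2,
        help="Batch size for the deep learning model",
    )
    args = parser.parse_args()

    # The parsed values are then used instead of hard-coded ones, e.g.:
    # download_data_to_local_directory(args.bucket_name, "data")
    # train(..., batch_size=args.batch_size)

It can then be run as, for example, python trainer.py --bucket_name some-bucket --batch_size 1.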
To do that, let's go back to our terminal window; let me clear it and look for the build command we used before: docker build -f Dockerfile, and for this image I will tag it test1. Let's run the command. As you can see, it rebuilds some layers of the image, because we modified trainer.py, so it has to redo some of the steps; if nothing had changed, it would not rebuild anything from scratch. The first two or three steps went very quickly because we already had the TensorFlow base image downloaded on the local system, the working directory step takes almost no work, and the copy step is mostly just copying files. Since we made changes, it re-ran the step that installs the packages from requirements.txt, which can take some time but shouldn't take too long. At the end we have a new Docker image from which we can run containers, and those containers will accept arguments at runtime. Let me clear the window and run the same kind of command as before: docker run, this time using test1, our new Docker image. Now we can pass arguments to the container, and whatever arguments we pass to the container are passed on to trainer.py; for example, I pass a batch size of 1. When I run this command, it runs the application, trainer.py, with the value 1 for the batch size. As you can see, we are again only using the CPU, and it's going to take some time to download the data. You might be thinking: why didn't we just copy the data into the Docker image so we don't have to download it each time? But I actually want to verify that the downloading part works, because at the end of the day we are going to move this Docker image to the cloud and run containers there, and those containers will download the data from Google Cloud Storage, from our bucket, onto the machine running the training. So I want to make sure this part of the code works properly. I'll come back when the download is complete to see how the training went. As we can see here, the download has finished and the training is now running; the progress is moving by one, so the batch size argument of 1 has been taken into account. You can see that we now have a powerful tool, because we can add as many arguments as we want, and we will add more in the future, for example the learning rate: if we want to change it every time we run a new container, it can become an argument of the function, and then we pass a value for it when we run the container. This is a powerful way to use Docker together with the arguments you define inside your Python files, and it has many advantages; we will see some of them in the next section and in future videos. Because when we can pass arguments to the container at runtime, we can run the container multiple times, or even run multiple containers at the same time.
Each container would then have its own set of arguments, so there are many possibilities that this way of coding opens up. Going back to the terminal, we can see that the training has finished and everything is working correctly. Now that this part works properly, let's see what to do next to improve things even further.

34. Necessary steps to use Docker with GPUs: So far we have only been using the CPU for our training, without making any use of the GPU, and if we leave our code as it is, then when we train on a Google Cloud Platform machine, that machine will also only use the CPU. But for training deep learning models, especially models as big as the one we have created, it is always better to use GPUs during training. To use the GPU for training, we need to make some modifications to different parts of our setup. The first thing you should do is look at the TensorFlow Docker requirements: if you search for something like "TensorFlow GPU Docker image" and click through to the TensorFlow Docker page, it takes you to the part of the page about the TensorFlow Docker requirements. It tells you what is required if you want to use Docker with a GPU to train TensorFlow deep learning models. I had to go through all of these steps, and you have to do the same for your machine. The first requirement is to install Docker, which I believe you have done by now. For GPU support, you then need to install the NVIDIA Docker support; I followed these guidelines to make it work on my machine. I also want to point out one important observation on that page: it says to take note of your Docker version with docker -v. For example, my version is 19.03.12. The page says that versions earlier than 19.03 require nvidia-docker2 and the --runtime=nvidia flag, while on versions 19.03 and later you use the nvidia-container-toolkit package and the --gpus all flag. In my case I am in the second scenario; you need to find out which scenario you are in and follow the steps for it. If you go to the nvidia-docker page, it also has all of these guidelines, and you need to follow them. I don't want to show one specific way of doing this, because, as I have noticed, they make a lot of changes; even between these two sets of versions you can see there are differences. If I gave you exact steps now, they might change next week, which is why I am referring you to the official documentation for TensorFlow with Docker and how to add GPU support: if any changes are made in the future, they will most likely be reflected there, so follow that documentation to make things work on your machine. The steps there are quite clear; I am on Ubuntu 20.04, so I followed the steps for that version; yours might be different, so choose the right steps to follow. As for usage, as they mention there, the right command is, for example, docker run --gpus all followed by the image.
After that we can run our image, and that is what will change for us once we add GPU support. But there is still one thing missing that we need to get right.

35. Building our docker image with GPU support: Our Dockerfile needs to change from its previous state to this one. As you can see, there are a lot of changes, but most of them come from the Google Cloud documentation. One of the example projects they use for training with custom containers is an image classification project, but it uses a different library and a different dataset. I have taken parts of their code and added them to mine in order to make use of the GPU with TensorFlow. This is also a good opportunity to point out that their example shows how to build a deep learning model with PyTorch, so between this course and that example in the documentation you have two examples of training on Google Cloud Platform with different frameworks. I took many parts of their code and, of course, made the necessary changes where needed: for example, I am using the TensorFlow latest GPU image as the base image, whereas before we were using the plain latest image, which only uses the CPU. The other parts I have taken from the documentation; don't worry too much about them, the comments explain what they do, and they also configure some paths that will be used on Google Cloud when we run the training later. This version of our Dockerfile will allow us to run the training using the GPU. First we will test it on our local machine, and once it works there, we can use this same image to train our model on Google Cloud Platform. So let me go back to my terminal, clear it, and build a new image, which I will call test GPU, using the same build command as before. As you can see, it still makes use of all the layers that were already built for my other Docker images, and then it reaches the step of installing the requirements. Once the requirements are installed, we will have a Docker image that can use the GPU on our local machine. While this is running, I want to mention something: if you don't have a GPU you can use for training on your local machine, don't worry, because at the end of the day the goal is to run the training on the cloud. You can still build this image locally and push it to the cloud, and when you run the training there it will use the GPU; the only difference is that you won't be able to test the image on your local machine. So if you don't have the hardware needed to test this GPU Docker image locally, don't worry too much about it; as I said, that is not the goal of this course. The goal is to show you how to run the training on the cloud by making use of the resources that exist on Google Cloud Platform. Now that the image has built successfully, let's run a container from it. Our image was called trainer test GPU, and again I am using the --gpus all flag to tell Docker that I want to make use of the GPUs on my machine.
I do have an NVIDIA GPU that I can use for training, so I should be able to see the training running on the GPU. Let's run this image now and see what we get. First of all, we can see that it is making use of the GPU: it is listed here as one of the available devices for training, which is a good start. Now it is downloading the data again to the local machine; since we are testing locally, it downloads to our local machine, and once we start training on the cloud it will download the data to the cloud machine that runs the training. Once the download is complete, I will come back and we will see whether the training starts correctly. Now that the data download has finished, it downloads the weights for the Inception V3 model, and once those weights are downloaded, the training should start on the GPU. So with the modifications we added, we should be able to run the training on the GPU; let's see what happens once the download finishes. We can use the gpustat tool here to watch our GPU usage: as long as the utilization is low, it means we are not really using the GPU, and if the training runs on the GPU, this value will go much, much higher. Now that the weights have downloaded, let's wait and see if the training starts. As we can see, the training has started, and you can already notice that it is faster than what we had before; if we run gpustat, we can see the GPU running at a very high percentage, which means the training is in fact running on our GPU, and you can see how much faster it is compared to training on the CPU alone. Now that the training has finished, the metrics and the confusion matrix are printed, and we can see that we have a Docker image that can run the training on the GPU. Again, I want to repeat: if you don't have a GPU that can run the training on your local machine, don't worry. As long as you have this Dockerfile and you build the image, you don't actually need to run a container from it locally, because we will be pushing the image to a remote container registry. A container registry basically stores the Docker images you build, and this one will be on Google Cloud Platform; from there we can run any container, and if the container supports the GPU, the training will use the GPU on Google Cloud Platform.

36. Summary: In this section of the course, we explored how to use Docker to containerize our application: how to group everything our application needs into one package, which we call a Docker image, and then how to run Docker containers from the image we built. We also explored how to add GPU support to the Docker setup so that the training can make use of the GPU on our machine. I also mentioned that if you don't have an NVIDIA GPU that can handle training deep learning models, you shouldn't worry, because the end goal is to run Docker containers on the cloud; as long as you have a Docker image and have built it successfully, you should be fine. The only thing we will do later is push this specific GPU image to Google Container Registry, which basically stores our Docker images.
From there we can run containers on Google Cloud Platform, and those containers will run the training for us. That is what we did in this section. In the next section we will start from here: we will define a few basic things, then push our image and run a training job on AI Platform, which is a service provided by Google Cloud Platform.

37. What is cloud computing and what is AI Platform? (optional): Hello and welcome to this new section of the course, where we will explore how to leverage the power of cloud computing, specifically Google Cloud Platform, in order to train our deep learning model; we will be using the service called AI Platform for this specific task. If you look it up, the Wikipedia definition you will find is that cloud computing is the on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user. What this means is that you have access to machines with resources available for storage and for computing power. When I think of cloud computing, I think of it as remote Linux machines that have almost the same things you have on your laptop, except that the configuration can be much, much better: if you don't have good GPUs on your machine, for example, these remote machines on the cloud have better resources, and you have access to more storage if you need it, and so on. Another important aspect is that these remote machines come with software that does specific things and that you can use directly: you don't have to install it on your local machine, and you don't have to worry about whether your machine can handle the demands and resources that software needs. Cloud computing is made exactly for this: to give you access to powerful machines with high availability of storage and computing power. For this section we will use Google AI Platform, which is one of the services available on Google Cloud Platform. It is a platform for artificial intelligence tasks: almost anything related to AI that you might need in your projects, you will find in the AI Platform service. Moreover, it offers different approaches to training machine learning models. There is the direct approach where, for example, it has built-in support for TensorFlow: you upload your code, the AI Platform service reads it, and it runs the training as you would on your local machine. And there is the approach we are taking in this course, which is through containers: we build a container that holds all the code and the dependencies that code needs in order to run the training, and then we run that container on AI Platform. Another great thing about the service is that it supports most, if not all, machine learning frameworks: TensorFlow, PyTorch, scikit-learn, and others can all be used with the AI Platform service. The best way to do this, in my opinion, is to use containers, because with containers you isolate your code from the system and you package all the dependencies your code needs into a single Docker image.
That makes it very easy to use new libraries and frameworks, and even to experiment with different versions of the same framework. For example, as you may have heard, TensorFlow 1.x is quite different from TensorFlow 2.x, so you might want to run the same training with both versions and compare the results, or test a development version of some framework. You can do all of this with containers, and since AI Platform supports containers, you can use whatever frameworks and libraries you like on Google Cloud Platform. 38. What other APIs do we need?: In order to use AI Platform to train our deep learning model, we first need to set up a few things in our project. The way I like to do this is by following the guide provided by Google Cloud Platform; here it is called "Getting started with custom containers". I will attach the link to this lecture so you can access it, but if the link doesn't work, just search for something like "training with custom containers AI Platform" and you will reach the same page. I personally prefer to search rather than rely on direct links, because Google changes its pages often; if the URL changes in the future, you will still be able to find the guide. So let me close these other tabs and go back to the guide. It lists a few steps we need to complete before training. The first step is the project selector page; we are already in the correct project. The second step is to make sure that billing is enabled. To check this, open the burger menu and go to Billing; I pinned it so I can access it quickly, but you can also find it in the menu or search for it. According to the documentation, you will be in one of two cases: if billing is not enabled, a pop-up window will appear when you open this page, asking you to enable billing and link a billing account; if billing is enabled, you will simply see the billing page as I do here. As you can see, I have a billing account and my cloud credits, and I still haven't used up the free credit you get when you create a new Google Cloud Platform account; for me it is shown in euros, roughly 160 left of the 270 or so I started with, while in dollars you should see $300, or less if you have already used some resources such as Cloud Storage buckets. Now let's look at the next steps.
The third step is to enable the APIs we will need. As you can see in the guide, these are the AI Platform Training and Prediction API, the Compute Engine API, and the Container Registry API. What I usually do is simply click the button in the guide, which takes you to a page that enables all of these APIs at once. It asks you to choose which project to enable the APIs for; I choose our project and click Continue. It might take a few seconds, maybe a bit more, but at the end all the necessary APIs will be enabled. We need each of them: the Container Registry is where we will push the Docker image we build locally, so that it becomes available to us on Google Cloud Platform and AI Platform can access it from there; the Compute Engine is what is used behind the scenes for the actual computation, that is, running the training and everything else. I'll wait for the enabling to finish. For me it actually took a few minutes, so if it takes longer than you expect, don't worry, just give it some time. Now the APIs are enabled; I click Continue and we are back on our project page. So far we have completed step three, enabling the APIs. The next step is to install and initialize the Cloud SDK. The Google Cloud SDK is a software development kit from Google Cloud; once you download and install it on your local machine, you can interact with Google Cloud services from your terminal. We will use it, for example, to push our Docker image to Container Registry and to launch training jobs on the cloud by running commands in a terminal window. Finally, you need to install Docker. I believe you already have it installed if you followed the earlier setup video; if not, you can do it in this step. One last thing: if you don't want to type sudo every time you run a docker command, you can run the command shown in the guide, and from then on you can use docker without prefixing it with sudo. With these steps done, we should have everything ready, and we can go to our terminal and run some commands there. The guide is very clear and contains all the instructions you need to download and install the Cloud SDK.
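For reference, here is a minimal shell sketch of this same setup done from the terminal instead of the console. The service IDs are my assumption of what the guide's button enables, so treat them as such; the last line is the standard Docker post-install step for running docker without sudo.

    # Enable the APIs needed for AI Platform training with custom containers
    # (assumed service IDs; the "Enable the APIs" button in the guide does the same thing)
    gcloud services enable ml.googleapis.com compute.googleapis.com containerregistry.googleapis.com

    # Standard post-install step so docker can be run without sudo
    # (log out and back in for the group change to take effect)
    sudo usermod -aG docker $USER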
39. Pushing our image to Google Container Registry: I forgot to mention one more step that we need in order to push our image to Google Container Registry, which is step seven in the guide: we need to make sure that gcloud and Docker are set up to work together, and for that we run the command shown there. If you haven't done it yet, you can do it now; it's not too late. Now that these steps are done, we can build the Docker image that we will push to Container Registry and then use for training. You might ask why we don't just reuse the images we built before, including the one with GPU support. The answer is that in order to push an image to Container Registry, it needs to follow a certain naming convention so that Google Cloud knows where to put it. So we will give the image a new name, build it, and then push it. Let me clear the terminal. The steps I will take to create this name follow the guide's section on building and testing your Docker image locally. We will not test it locally again, because we have already built and tested the exact same image, so we know it works; the only thing that changes is the name. Let's copy the commands from the guide. First we get the project ID. Then we set an image repo name; I will use something like tf_food_classification. Next we give it a tag; I will call it food_classification_gpu, since we are using the Dockerfile that builds the GPU-enabled image. Finally, we create the image URI. I will copy the whole thing: the URI starts with gcr.io, which means we are building an image that will later be pushed to a Container Registry hosted in the US. There are different options here: if you go to the Container Registry documentation overview, you will see that images whose names start with gcr.io are currently hosted in the United States, although this might change in the future; if you want your image stored in Europe you can use eu.gcr.io, and in Asia, asia.gcr.io. For me, although I am currently in Europe, I will stick with gcr.io, because this project is for testing purposes and not something I want to run in production in Europe. But you can use any of these prefixes depending on where you want your images stored.
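As a rough sketch, the naming and build/push commands described here look like the following; the repo and tag names are approximations of what is typed in the video, so adjust them to your own project.

    # Name the image so it can be pushed to Google Container Registry
    export PROJECT_ID=$(gcloud config list project --format "value(core.project)")
    export IMAGE_REPO_NAME=tf_food_classification       # approximate name used in the video
    export IMAGE_TAG=food_classification_gpu            # approximate tag used in the video
    export IMAGE_URI=gcr.io/$PROJECT_ID/$IMAGE_REPO_NAME:$IMAGE_TAG

    # One-time setup so Docker can authenticate against gcr.io
    gcloud auth configure-docker

    # Build from the GPU Dockerfile and push to Container Registry
    docker build -f Dockerfile -t $IMAGE_URI ./
    docker push $IMAGE_URI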
Let me go back to the steps, create the image URI, and echo it to check what I get. As you can see, our image name is gcr.io followed by the project ID, train deep learning models, then the repo name tf_food_classification, then the tag. Finally, to build the Docker image we use the build command from the guide, which builds from our Dockerfile and uses this image URI as the image name. As before, the build might take some time, but since I have already built the exact same image under a different name, Docker reuses the cached layers and simply applies the new name. If I run docker image list, I can see it there with the new name and tag, so everything is set up correctly. Let me clear the screen. If you want, you can run the image locally again and test it, but since I know it works from the earlier tests, I will just push it to Google Container Registry. Again, if you haven't run the gcloud and Docker configuration command yet, run it now, because pushing a Docker image to Google Container Registry requires that configuration between Google Cloud Platform and Docker. So I copy the push command and run it. This can take a while, because Docker prepares the different layers of the image and pushes them one by one, or some of them simultaneously, to Container Registry. Once it's finished, I'll show you what we have. Now the push has finished: all the layers have been pushed, the ones that already existed on Google's side were skipped, and there were no errors. Let's go back to the console and look up Container Registry. Here we can see the list of all the images in the registry. We have the image we just pushed, tf_food_classification; it is private, so only you can access it unless you grant access to other people, and if you click it you get details about the Docker image we just pushed. Now everything is set up for training on AI Platform, and we will do that in the next video. 40. Setting up things for our training job: Now let's create a training job on AI Platform. Before we do that, let's first make sure we are logged in with the right email. If you have multiple Gmail accounts and you are using a specific one for this course, you need to verify that it is actually the account in use, because sometimes you won't be logged in with the correct email. So I run gcloud auth login, which opens a page listing your accounts; make sure you pick the right one. I click Allow so gcloud can do what it needs, and now I am logged in. The other thing you need to make sure of is that you are using the right project.
If you have multiple Google Cloud projects, gcloud might not be pointing at the right one. For me, the output already shows that the current project is train deep learning models, which is the right one; if it weren't, I would set it with gcloud config set project followed by the project ID, which I gave the same name as the project, and then you're fine. Now that we have made sure we are using the right email and the right project, let's define some variables that we will need for the training job. Let me clear the screen. The first one is the region, which determines where the training job is sent: it could be in the US or in Europe, depending on where you are and where your project lives. I will set it to us-central1. You can also search for "Google Cloud Platform regions" to see the full list: in America there are regions such as us-west1 and us-central1, and for Europe regions like europe-west1, each linked to a specific location. For simplicity, in this example we will use us-central1; go back and make sure you type the 1 at the end. The second variable is the job name. I will export a job name and call it something like food_classification_container_job, and I will append the current date to it, formatted as year, month, day, hours, minutes, seconds, with no spaces. If I echo the job name, I get food_classification_container_job followed by the date. This lets us easily distinguish between different jobs for the same training process: I can run the same training on the same dataset multiple times and know from the date which one was run when. 41. Launching a training job on AI Platform and checking the logs: To run our training job on AI Platform, I will take the submit command from the end of the same guide we used to set things up, because it lets us use the GPU and the Docker image we built. I will copy it into a text file, remove the parts we don't need, and add the batch size as a parameter; remember, the batch size is an argument of our script. A consolidated sketch of these variables and the submit command is shown below.
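Here is a sketch of those variables and the submit command, based on the custom-containers guide; the --batch_size flag name is an assumption about how the trainer's argparse argument is spelled, and everything after the bare -- separator is passed straight through to the training script.

    export REGION=us-central1
    export JOB_NAME=food_classification_container_job_$(date +%Y%m%d_%H%M%S)

    gcloud ai-platform jobs submit training $JOB_NAME \
      --scale-tier BASIC_GPU \
      --region $REGION \
      --master-image-uri $IMAGE_URI \
      -- \
      --batch_size=8        # assumed argument name; forwarded to trainer.py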
If we give the batch size a value here, it is passed to our trainer as a training parameter. I will choose something like eight; of course you can test other values. For the scale tier, this is one particular GPU setup; there are others, and if you want to know more you can look up the AI Platform scale tiers. For the region we use the one we set up, us-central1, and for the image URI we use the name of the image we built, which is stored in the IMAGE_URI variable; if I echo it, I get the image name. Now let's copy the command, paste it, and run it. Actually, since I already ran a test with this job name, I need to change it: I run the job name export again, which gives me a new name with the current date, and then I run the submit command again. As you can see, it says the job was submitted successfully and is still active. We could use the suggested commands to stream the logs, but I personally prefer to look at the logs on AI Platform. If I go back to the console and open AI Platform, I can see my recent training and prediction jobs, including some earlier tests. The last job says it is still preparing, and if I click View logs I get the logs from the machine that is running the training. From my earlier tests, it can take some time just to provision the machine, because many people are using Google Cloud Platform and Google manages these shared resources in its own way; but once it starts, it goes quickly because the machines are very good. So I will leave it here and come back when the training has finished to look at the logs. The training job has now finished and we can see all the logs: the part where the data was downloaded to the machine, with timestamps showing how long each step took, and at the end the metrics and the confusion matrix. So we now have the full pipeline: the data is downloaded from the bucket onto a machine on Google Cloud, and the training runs on that machine using a GPU. Let's see what we can do next. 42. What is hyperparameters tuning?: What I would like to show you now is hyperparameter tuning. In machine learning algorithms, some parameters are learned and others are tuned. In our neural network, for example, the tuned parameters, the hyperparameters, are things like the learning rate, the number of epochs, and the batch size, while the learned parameters are the weights and biases of the network. As a machine learning engineer, when you develop an algorithm for a task, you usually work along two axes.
The first axis is the machine learning algorithm, and the second is hyperparameter tuning. For the algorithm, suppose you have a task in mind, say classification as in this course: you choose a neural network with a certain architecture, and that is your algorithm. Once you have chosen it, you face the second issue: every time you change hyperparameters such as the learning rate or the number of epochs, the network behaves a little differently, so the trained model can end up much better or much worse. That is why, as a machine learning or deep learning engineer, these two choices are the main things you work on. And when I say algorithm, I don't only mean the choice between deep learning and other methods like random forests or SVMs; the neural network architecture itself is part of the algorithm, so changing the architecture changes the algorithm for your task. That is the first axis, and the second is hyperparameter tuning. On Google Cloud Platform we actually have the ability to run multiple trainings at the same time. Imagine we want to run the training with a batch size of four and see whether changing it to eight gives better results: normally you would run the training with batch size four, then change it to eight, and test any other values the same way. The same goes for the other hyperparameters; for the learning rate you might start at ten to the power of minus five and then move to ten to the power of minus four, and so on. Each change makes the network behave differently and can give you a higher or lower loss, so there is a whole search space where you can change many things and get different results. On Google Cloud Platform there is a tool that runs multiple trainings for us and changes the hyperparameter values automatically each time. Better still, on Google Cloud AI Platform these trainings can run in parallel, so we get results faster than running them one after the other. That is why I want to show you this hyperparameter tuning tool. We will add a few lines of code to our trainer.py file, add a configuration file that lists the parameters we want to tune, and then launch a training job just as we have done so far, except that now many trainings will run at the same time, each with different hyperparameters. 43. Configuring hyperparameters tuning: To configure our script for hyperparameter tuning, I have made some changes, and I'll show you what I have done so far. First of all, I have imported a package called hypertune.
This package is used by Google Cloud Platform for hyperparameter tuning. If you remember, in our Dockerfile we installed a few things taken from the example in the documentation, and one of them is cloudml-hypertune, which is exactly this hypertune package; we install it inside our Docker image so we can use it on Google Cloud Platform. So, as I said, I imported the package. The other thing I did (let me remove this, we don't need it anymore) concerns the learning rate: it used to be a hard-coded value, and I now pass it as a parameter to the train function. In the main part of the script I added it as a command-line argument, and once we parse it we pass it to the train function, which takes that value into account. For this hyperparameter tuning process, every parameter we want to tune has to be exposed as an argument of the script. If you want to add another hyperparameter, for example the number of units in this dense layer (not just 512) or the dropout rate, those are hyperparameters as well, and I encourage you to try this on your own: you would add another parser.add_argument line, perhaps called dropout, and only then can you tune it with the AI Platform tool. Of course, not every argument you pass is a hyperparameter to tune: the bucket name, for example, is an argument of the script, but it has nothing to do with the hyperparameter tuning process. The way we tell AI Platform which parameters to tune is through a config file with a .yaml extension; I simply called it config.yaml, but you can call it whatever you like. I took the example from the documentation and adapted it to the settings in my code. In this config file, under trainingInput and hyperparameters, we first define the goal, which is MINIMIZE; the reason is that our metric is the loss, and we want the loss to go down. If you chose accuracy as the metric instead, you would set the goal to MAXIMIZE, since accuracy is something you want to maximize. Other important fields are maxTrials, the number of experiments to run with different hyperparameter values (let's set it to eight), and maxParallelTrials, the number of trials to run at the same time. At most it can equal maxTrials, but I will set it to four, which means it is almost like running the process twice, with four training processes running in parallel each time.
The next field, enableTrialEarlyStopping, is set to False. The params attribute is where we define the parameters we want to tune. Notice that I use the same names as in my script: batch_size, and for the second parameter, learning_rate, exactly as they are passed to the script. This is very important: if you call the argument lr for short, you have to use lr here as well. For each parameter you give its name (again, exactly as your script expects it), its type (for the batch size it is INTEGER), the minimum and maximum values it can take (I start at 4 as a possible batch size and go up to 32), and the scale type, which here is a linear scale, so it can take values like 4, 6, 10 and so on; there are other scale types you can look up in the documentation. We do the same for the learning rate: the name is learning_rate, the type is DOUBLE because the learning rate is a floating-point value, the minimum is ten to the power of minus five, the maximum is ten to the power of minus four, and again we move between these values on a linear scale. Once these parameters are defined and the script is set up for hyperparameter tuning, we should be ready for AI Platform. One last thing I forgot to mention: the hyperparameter metric here is the loss, and we need to get that value from somewhere. It should not be the training loss, because during training we get many loss values; instead we evaluate the model at the end of training. As you can see, after printing the confusion matrix I print a statement saying that evaluation is starting, and then I run the evaluation generator. This is different from the predict generator: with predict we feed in examples and get back predictions, that is, the probabilities for each example, whereas the evaluate call takes images together with their labels and computes scores from them. The scores contain two values: scores[0] is the loss from the evaluation and scores[1] is the accuracy. This evaluation loss is what I report for hyperparameter tuning: I create a HyperTune object from the hypertune package and call its report_hyperparameter_tuning_metric method, giving it the loss as the tag, the evaluation loss as the metric value, and the number of epochs as the global step. That is the setup we need for hyperparameter tuning.
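Putting the pieces together, a config.yaml along the lines described here would look roughly like this; the field names follow the AI Platform hyperparameter spec and the values are the ones discussed above:

    trainingInput:
      hyperparameters:
        goal: MINIMIZE
        hyperparameterMetricTag: loss
        maxTrials: 8
        maxParallelTrials: 4
        enableTrialEarlyStopping: False
        params:
          - parameterName: batch_size
            type: INTEGER
            minValue: 4
            maxValue: 32
            scaleType: UNIT_LINEAR_SCALE
          - parameterName: learning_rate
            type: DOUBLE
            minValue: 0.00001
            maxValue: 0.0001
            scaleType: UNIT_LINEAR_SCALE

And the end-of-training reporting can be sketched in Python like this; model, eval_generator and epochs stand in for the objects already defined in trainer.py, so treat it as an illustration rather than the course's exact code:

    import hypertune

    def report_final_loss(model, eval_generator, epochs):
        # Evaluate once at the end of training; with metrics=['accuracy'],
        # scores[0] is the loss and scores[1] the accuracy
        # (evaluate_generator in older Keras versions).
        scores = model.evaluate(eval_generator)
        hpt = hypertune.HyperTune()
        hpt.report_hyperparameter_tuning_metric(
            hyperparameter_metric_tag='loss',
            metric_value=scores[0],
            global_step=epochs)

When submitting the job, the tuning spec is attached with the --config flag, for example gcloud ai-platform jobs submit training $JOB_NAME ... --config=config.yaml, which is what we do when we launch the job in the next step.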
Now let's build our Docker image, push it, run a training job, and see what we get. 44. Building a new docker image with the new setup: Let's build the new Docker image with the same command as before: docker build -f Dockerfile -t followed by the image URI. We use the same image URI because we just want to keep updating the image we pushed to Google Container Registry. Most of the layers already exist, so the build doesn't take long; it only copies the code and installs the requirements again. Now that the image has been built successfully, let's push it to Google Container Registry with docker push and the image URI. Again, some layers already exist, so only the new or modified layers are pushed; this shouldn't take long either. 45. Launching a training job with the new setup: Now that our Docker image has been pushed to Google Container Registry, we do the same thing as before: define a new job name with the current date, clear the screen, and call gcloud ai-platform jobs submit training with the job name, the BASIC_GPU scale tier, the same region as before, and our image URI. The new parameter we add is --config, to which we pass the path to our config.yaml file; for me it sits next to the other files in my code, so I just give that path. With this, we submit a training job that has hyperparameter tuning enabled. Let's go back to the dashboard and open AI Platform. As you can see, there is a job running right now, the one I just started; let me click it. The first thing to notice is that we now have a HyperTune trials section that will show all the trials of this training job. Remember, we set maxTrials to eight with four parallel trials, which means two rounds of four processes running at the same time. What I usually like to do is open the logs; that page helps us identify problems and check whether everything went smoothly. Right now trial IDs 1 through 4 are being provisioned and waiting to start, and once they start we will see them running here. At the end, for each combination of batch size and learning rate we will get a loss value, and from that we will know which hyperparameters give the best loss, by which I mean the lowest value of the loss function. In the trial logs we can see everything. Again, it takes some time to start, but once it does it goes fairly quickly because the machines are very good. After a while you can see that four trials are running at the same time.
If we go back to the logs and open the newest entries, the trials may have only just launched and the full training may not have started yet, but you can see the trials running. Here you can see that the hyperparameter tuning service has picked a particular learning rate and batch size for one trial, and different values for another, and is running the training with those values. You can immediately see that if you increase the number of trials and widen the min and max values of your parameters, more combinations get tried and you get more results to choose from. For our case I am only running a few trials, just to show you this powerful tool; as I mentioned before, a machine learning engineer usually has to run these trials by hand, but with this tool we can run multiple trainings at the same time with different parameters, choosing the min and max values and as many parameters as we like. You can imagine how powerful that can be. The logs are still taking some time, so when the trials are done I will come back and show you what we have. I left the training running for a while, and as you can see, seven trials have already finished; the only one still running is trial number two. For the finished trials we can see the trial ID, the loss (sorted here in increasing order), and the training step at which each trial stopped. We don't have any early-stopping parameter, so each trial stops at the last epoch, and as you remember the trainer runs ten epochs. We can also see how long each training took and the hyperparameter values for that trial. The only one left is trial number two, and I won't wait for it; let's just go to the logs. For each trial ID we have its set of metrics and its confusion matrix, everything for that specific training run. So we were able to run multiple trainings with different parameters and compare the results. What is still missing is the part where we save our model. I intentionally left this until the end: now that everything is set up and the training works with hyperparameter tuning, it is time to add model saving, and I will show you how to do that. 46. Saving our trained model (but there is a problem): To save our model, I first want to add some callbacks. Callbacks are functions that Keras calls during training to check things and perform actions, for example saving the model at specific steps. We will also use a callback for early stopping. Let's start by importing the necessary classes: from keras.callbacks I import EarlyStopping and ModelCheckpoint. EarlyStopping is a callback that stops the training if a certain criterion is met; in our case we want to stop when, for example, the validation loss has not gone down in five epochs.
That is, if for five consecutive epochs the validation loss (or the validation accuracy, depending on how we define it) does not improve, we stop. Now let's go down to the fit call: it takes a callbacks argument, a list that I will leave empty for now, and above it I will define my callbacks. The first one is early stopping, using the EarlyStopping class I imported: I choose what to monitor, in this case the validation loss, and a patience of five epochs; you can choose another value, but five seems good enough to me. I add it to the list. The second callback is the checkpoint saver, which I will call ckpt_saver for short, using the ModelCheckpoint class. Its first argument is the path where the model is saved; I will leave that empty for now and come back to it. Then I choose what to monitor: you could monitor the validation loss, but here I will monitor the validation accuracy for saving the checkpoint, and I set the mode to max, because I want to keep the model with the highest validation accuracy; if I were monitoring the validation loss, I would use min instead, to keep the model with the lowest validation loss. Another parameter is save_best_only: if we set it to True, only the best model is kept; a model is saved, and whenever a better one is found it replaces the previous one, so at the end of training we have the best possible model according to the validation accuracy. We also set the save frequency to epoch. Let me check the documentation to verify: with the save frequency set to epoch, the callback saves the model after each epoch, but because save_best_only is True, only the best model is kept in the end. By the way, this is something worth doing from time to time: if you don't remember what an attribute or a class does, go to its source code if you can access it, or to the documentation. Finally, I set verbose to 1 so we see some output while saving, and I add this callback to the list as well. So now I have two callbacks; what's left is the path where the model is saved. What I would like is for every new container run to create its own path: a temporary folder inside the machine running the training. If the path does not exist, that is, if checking whether the model path is a directory comes back false, I create it.
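A minimal sketch of these callbacks, assuming TensorFlow 2-style Keras imports; the temp path, the checkpoint file name, and the exact validation-metric name ('val_accuracy' here, 'val_acc' in older Keras versions) are assumptions, not necessarily the course's exact code:

    import os
    from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

    # temporary folder inside the container where the best model is written
    path_to_saved_model = '/usr/src/app/temp'   # assumed path, matching the Dockerfile WORKDIR
    if not os.path.isdir(path_to_saved_model):
        os.makedirs(path_to_saved_model)

    # stop if the validation loss has not improved for 5 consecutive epochs
    early_stopping = EarlyStopping(monitor='val_loss', patience=5)

    # keep only the model with the highest validation accuracy
    ckpt_saver = ModelCheckpoint(
        os.path.join(path_to_saved_model, 'best_model.h5'),  # hypothetical file name
        monitor='val_accuracy',
        mode='max',
        save_best_only=True,
        save_freq='epoch',
        verbose=1)

    # both callbacks are then passed to the fit call, for example:
    # model.fit(train_generator, validation_data=val_generator,
    #           epochs=epochs, callbacks=[early_stopping, ckpt_saver])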
Here I am making sure that this folder is created; it lives inside the same tree as our project, it is called temp, and the model is saved inside it. But of course, once the container has finished its work, once the training is done, everything inside it is deleted, including the saved model, and we would lose access to it. To keep access to the trained model, we need to move that folder from the machine running the container to somewhere persistent, and such a place could be a bucket that we use only for storing trained models. (And here, sorry, I had forgotten to pass the path to the saved model to the checkpoint callback; let me add it.) Let's see what we do next. 47. Adding function to upload trained models to a google bucket: As I mentioned, every time we run a container it does the training and saves the model on the machine running it; when the training stops and the container has finished all of its work, the process is killed and all the resources that were created are deleted, including the saved model. To overcome this, we need to move those files from the container to a persistent location, such as a Google Cloud Storage bucket, before the container is destroyed. Let's start by writing a function that uploads data to a bucket; earlier we wrote a function that downloads data from a bucket, and now we write its counterpart. I define it here and call it upload_data_to_bucket. It takes three arguments: the bucket name, which is the bucket we upload to; the path to the data, that is, the path on the local machine of the file we want to move to the cloud; and the bucket blob name, which is the name of the object created inside the bucket, so that whatever we upload is saved there under that specific name. First we create a storage client like we did before, using storage.Client.from_service_account_json with the path to the credentials file; let me make sure it is the same path as in the download function, and it is. Then we get the bucket by calling get_bucket on the storage client with the bucket name, which gives us an object referencing the bucket. Then we create a blob from that bucket with bucket.blob, giving it the bucket blob name; this tells the bucket to create a new object with that name. Finally we call the blob's upload_from_filename method and give it the path to the data.
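A sketch of that upload helper using the google-cloud-storage client; the credentials path is a placeholder for the service-account JSON file used earlier in the course:

    from google.cloud import storage

    PATH_TO_CREDENTIALS = 'credentials.json'   # placeholder; same json file as the download function

    def upload_data_to_bucket(bucket_name, path_to_data, bucket_blob_name):
        # authenticate with the service-account credentials
        storage_client = storage.Client.from_service_account_json(PATH_TO_CREDENTIALS)
        # reference the target bucket and create a new object (blob) inside it
        bucket = storage_client.get_bucket(bucket_name)
        blob = bucket.blob(bucket_blob_name)
        # upload the local file into that blob
        blob.upload_from_filename(path_to_data)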
With this function we can take files from the local machine that runs the training and upload them to whatever bucket we want. 48. Zipping and uploading trained models to google storage: Now that the function is written, let's go back to trainer.py and import it right away so we don't forget; upload_data_to_bucket is the name of our function. Apart from that, we will modify the code so that the model is uploaded at the end of training. In the train function we have the path to the model, which points to a temporary folder inside the container, inside the machine running it, and we use it to save the model; we keep only the best model per experiment, best meaning the one with the highest validation accuracy. We run the training by fitting the data, and at the end we compute the confusion matrix; everything we need is there. What we will do is, just before reporting the hyperparameter tuning metrics, create a zip archive of the folder that contains the model, and then move that archive to our bucket. To do this we need to import the shutil package, which lets us zip the folder. First, let's define a name for the zipped folder. We want the name to be unique, because if we run multiple experiments and each one saves a model, we don't want the models to be confused with each other by sharing the same name. To achieve that, we build the name from two values: a timestamp and the loss value we get from the evaluation. Let's start the name with "trained_model", followed by a variable called now that represents the timestamp, converted to a string so it can go into the name. To build the timestamp I need the datetime package, so I import datetime from datetime, and then datetime.now() gives the current time; using its strftime method we control how it is formatted. I will create a date containing the year, the month, the day, the hour, the minutes, and the seconds, in that order, which I think makes it easy to read and to understand which date it is. With this format added to the zip folder name, what's left is to append the loss: I add the loss value to the name in the same way. In this way I am creating a unique name for my zipped folder.
So the name will be trained_model, then the timestamp created here, then the loss we got in the evaluation phase. Using this name, we can zip the folder with the shutil package and its make_archive function. make_archive takes at least three arguments: first the name of the archive, for which we use the zipped folder name; then the format we want, which for us is zip; and then the path to the folder we want to compress. This path can be a little tricky, so let me go back to our Dockerfile to make things clear. As you remember, we defined the working directory in the Dockerfile as /usr/src/app. When the training runs, we build a Docker image and run a container on a machine on Google Cloud; inside that container, this working directory is where our code lives and where everything else ends up, including the downloaded data. So when we save the model, the temporary folder is actually created inside that directory, which means its path is /usr/src/app/temp, and that is the path we must point make_archive at. This code will not be tested on our local machine; it is written specifically for a container running on the cloud, so the path has to be written exactly as it is defined in the Dockerfile. I hope it is clear why we use this path. Once the folder is zipped, we upload it to a bucket with upload_data_to_bucket. For the bucket name, we don't have one yet, and I don't want to save the models in the same bucket that holds the data, so I want a different bucket. Let me call the variable models_bucket_name and pass it to the train function as a parameter, so we don't forget. Further down, I add a new command-line argument with the same name, models_bucket_name; for now I remove the old value and we will set a new bucket name as its default, this being the Cloud Storage bucket for saving trained models. Finally we pass args.models_bucket_name to the train function, so we read the bucket name from the arguments, hand it to train, and use it in the upload_data_to_bucket call. The second argument of that call is the path to the zipped folder, a variable we haven't defined yet: we defined a name for the zipped folder, but not its path; we will define it in a minute, let me just finish the call. Finally, we give the object inside the bucket a name, and here we use the same zipped folder name. A consolidated sketch of this save-and-upload step follows.
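A consolidated sketch of this save-and-upload step, using the upload_data_to_bucket helper sketched earlier; the eval_loss argument stands for scores[0] from the evaluation, models_bucket_name is the new script argument, and the exact name formatting is an assumption:

    import shutil
    from datetime import datetime

    def zip_and_upload_model(models_bucket_name, eval_loss):
        # unique name: trained_model_<timestamp>_<evaluation loss>
        now = datetime.now().strftime('%Y_%m_%d_%H_%M_%S')
        zipped_folder_name = 'trained_model_{}_{:.4f}'.format(now, eval_loss)

        # zip everything inside the temp folder that holds the best checkpoint;
        # with a relative archive name, the .zip lands in the working directory /usr/src/app
        shutil.make_archive(zipped_folder_name, 'zip', '/usr/src/app/temp')

        path_to_zipped_folder = '/usr/src/app/' + zipped_folder_name + '.zip'
        upload_data_to_bucket(models_bucket_name, path_to_zipped_folder, zipped_folder_name)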
To define the path to the zipped folder, we use the /usr/src/app prefix as before, add a slash, then the zipped folder name, then the .zip extension. When make_archive runs, everything inside the temp folder is compressed and we get a zip file with the zipped folder name inside the app folder; so joining this prefix, the zipped folder name, and .zip gives us the full path to the archive, and that is the path we pass to our upload function. The zipped folder name itself is used as the blob name inside the bucket, which is why the same name appears in both places. So now we can take the model that was saved inside the container running the training and upload it to the bucket we choose. 49. Running the final training job: In Google Cloud Storage I have created a new bucket called trained models food classification; you can create it the same way we did for the other two buckets. (There is also a bucket here that was created automatically by Google Cloud Platform; it contains artifacts for the container images.) We will use the new bucket in our code. Back in the code, before anything else, I want to change the early stopping patience from five to ten, still monitoring the validation loss. Then, further down in the arguments, we set this bucket name as the default value for the models bucket argument, so we don't have to type it every time we run a new training job; let me check the exact name again, trained models food classification, and of course this default is what gets passed to the train function. Now we have the whole pipeline: we download the data into the container, read it, train the model, save only the best model per experiment, zip the folder containing it, and upload the archive to our bucket. What I'd like to do now is change a few things and run the training on the big dataset. For the epochs I will go a little higher, say 20. And for the data I no longer want to use the dummy dataset: remember that in the full data bucket the three folders sit directly at the top level, whereas in the dummy bucket there is an extra folder and only inside it do we find the three folders. So I remove the dummy part from the path, and I also remove it from the default value of the data bucket argument, so that the bucket containing the big dataset is used; let me check its name again, the food data bucket. So now we download the data to a local directory using that bucket name, put it in a folder called data, and after the download is finished we set the data path and pass it to the training, which will run with the given batch size.
Of course, with the configuration file the batch size will vary, we will run for 20 epochs, and the learning rate will vary as well through hyperparameter tuning. The models bucket name is the bucket we just created, trained models food classification, so our models will be saved there. Let's also go back to the config file and check things one more time. We have maxTrials of eight with four in parallel; for testing purposes I will reduce these a little, because the job consumes a lot of ML units, as they are called on Google Cloud Platform, and more ML units means higher cost. So I will set maxTrials to six and run at most two trials in parallel. For the batch size, the maximum stays at 32 and I will raise the minimum to eight; the minimum and maximum for the learning rate are fine as they are. Let's save this and go to the terminal, where we do the same thing as before. First I build the image from the Dockerfile; Docker reuses the cached layers and only rebuilds the new ones, basically the code copy and the requirements install, so it shouldn't take long. While it's building, let's go back to trainer.py and verify things one last time: we write the classification report and the confusion matrix, we run the evaluation and get the loss, we create a timestamp with the datetime package, we build the zip folder name from the timestamp and the loss value, we zip the temp folder where the model is saved with the zip extension, and finally we define the path to the zipped folder, upload it to the bucket, and report the hyperparameter metric to Google Cloud AI Platform; the rest should be fine. The build has finished, so let's push the image with docker push and the image URI; this should be fast as well, since most layers already exist and only the rebuilt ones are uploaded. Once the push is done, we create a new job name with a new timestamp, and finally we submit a new job with the same command as before: the job name, the BASIC_GPU scale tier, the same region, the image URI we just built, and the config.yaml file we just modified. Let's run it. Our training job has started; we can go back to the console, open the new job, and open its logs page. This job will take longer than the previous ones: if you tested your code and your Docker image with the small dataset, those runs didn't take long. I actually ran some tests before this.
What I realized is that with the small dataset a run takes around 13 minutes, though that was for only five or ten epochs, I honestly can't remember which. Now we are running on the big dataset for 20 epochs, so this will take a few hours, maybe six or seven, depending on the type of machine Google Cloud Platform allocates for the training job. You should leave it running for a few hours; the trials will start and you will see them here, six trials in total, running at most two in parallel at a time. So let's let the training continue and see what we can do next.

50. Summary: After a few hours, all the trainings have finished, and on average each trial took about two hours. As you remember, we ran six different experiments over this set of hyperparameters, two trainings at a time, so in total it took around six hours to finish. The loss is reported for each trial, and all the models were saved: if we go to the trained models food classification bucket, we find every trained model with the naming we defined, a timestamp followed by the loss. Just from the names we can see which model has the lowest loss, download it, and use it in the deployment part to make predictions at production level. So in this section of the course we started from the dataset and the Docker image we built before, ran several experiments, first with the small dataset for testing purposes and then on the full dataset, and saved all of the resulting models. In the logs, if you select a trial ID and click "show matching entries", you see only the logs of that trial. On average the best accuracy we got was around 82%, so there is still room for improvement, and you are encouraged to run more experiments, change the hyperparameters further, or even add more hyperparameters. For example, in the model we built there is a dense layer with 512 units, and there is also the dropout value; both of those could be turned into hyperparameters. There is a lot you can play with to improve the accuracy, but for the purposes of this course we will stop here, take the best model, and use it in the rest of the course, where we will deploy it using Google Cloud Platform, this time with a different service than the one we used for training.
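As an aside, if you want to follow the suggestion above and expose the dense units and the dropout rate as extra hyperparameters, a minimal sketch could look like this; the argument names and default values are assumptions, and the same names would also have to be declared in the tuning config file.

    import argparse
    import tensorflow as tf

    def parse_args():
        parser = argparse.ArgumentParser()
        parser.add_argument("--dense_units", type=int, default=512)
        parser.add_argument("--dropout_rate", type=float, default=0.5)
        return parser.parse_args()

    def build_head(backbone_output, num_classes, args):
        # Classification head whose size and regularization are tunable.
        x = tf.keras.layers.Dense(args.dense_units, activation="relu")(backbone_output)
        x = tf.keras.layers.Dropout(args.dropout_rate)(x)
        return tf.keras.layers.Dense(num_classes, activation="softmax")(x)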
One last thing I would like to mention before we go to the next section of the course: when I went back and looked at the model we built, I noticed that I hadn't added an activation function to that dense layer. The model still trained correctly and the loss went down, but I always add an activation function; that is the recommended way when you build these layers. So I added a rectified linear unit (ReLU) activation to that layer, trained a new model on AI Platform, and saved it to the same models bucket; that is my last training job here, and it is the model I will be using for the deployment part. If you trained without that activation function and you have one of the models that doesn't include it, you can still go ahead with it, as long as your loss went down and your accuracy went up. I just wanted to mention this, and I will be using this specific model when we do the deployment.

51. What is Cloud Run and what is Flask? (optional): Welcome to this new section of the course, where we will deploy our deep learning model on Google Cloud Platform. We will use two new technologies. The first one is Cloud Run, a service from Google Cloud Platform that lets us deploy containerized applications very simply and quickly. I chose this service because I want to give you a different view of how you can deploy your applications: on AI Platform you can also deploy your model directly, but I want to show you how to deploy it as a small application that you can share with your team or your clients to show them how the model is doing. We will use Docker to containerize the application, and we will build the application itself with a micro web framework called Flask. With Flask we will build a small app with a web page where we can upload images, view them, and see the model's predictions. By the end of this section, and of the course, you will see how your model works in production and how you went from a dataset, to a trained model, to a small web app that serves that model, something you can show to your team or your clients.

52. Creating the skeleton of our Flask web app: To deploy the model we first need code that loads the model, loads the images, and makes predictions. We will do this locally with the Flask library, then containerize it with Docker, push the image to Google Container Registry, and from there use Cloud Run to run the application. Before that, let's set up the environment. So far I have created a new folder called deployment code, opened it in Visual Studio Code, and added a file called predictor_requirements.txt. I will attach this file to this lecture so that you can install all the necessary packages with a single command.
These are all the packages I used to write the code that loads the model for the deployment part. To install them, let's do the same thing as before and create a virtual environment with virtualenv; I'll call it venv. Then we activate it with source venv/bin/activate, and inside the environment we run pip install -r predictor_requirements.txt to install all the requirements at once. Once the packages are installed, we can add a new file; I don't want it inside the virtual environment folder, so I create it at the top level and call it predictor.py. This file will contain all the code needed to load the model and make predictions.

We start with the imports. From flask we import Flask, request, redirect and render_template. We also import a function used to secure filenames when images are uploaded: it comes from werkzeug.utils and is called secure_filename. We also import the os package, TensorFlow, NumPy, and OpenCV. After that, I define a list of all the classes we have. We will use it later to make the predictions human readable, with the same names as before, starting with bread; we can simply copy the list from our training code. The last class I originally called vegetable, even though it also contains fruit, so we can rename it to something like vegetable-fruit. Now that the classes are defined, we create the app. This is how Flask defines an application before you add any web pages to it: we call Flask with the name of our file, predictor, and I also pass a static URL path, which points to the folder where uploaded images will live. I use the same convention most web developers use and call it static, and I create a new folder with that name: whenever a user uploads an image through our application, it will end up in this static folder. We also need to configure the image upload itself, so that uploaded images actually go there.
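Here is a minimal sketch of this first part of predictor.py: the imports, the class list, and the app, upload and model configuration described here and completed in the next part. The exact class names, config keys and folder names are assumptions based on the narration (JPG is added alongside JPEG so plain .jpg files also pass the extension check).

    import os
    import cv2
    import numpy as np
    import tensorflow as tf
    from flask import Flask, request, redirect, render_template
    from werkzeug.utils import secure_filename

    # Human-readable class names, assumed to be in the same order as during training.
    FOOD_CLASSES = ["bread", "dairy product", "dessert", "egg", "fried food",
                    "meat", "noodles-pasta", "rice", "seafood", "soup",
                    "vegetable-fruit"]

    app = Flask(__name__, static_url_path="/static")
    app.config["IMAGE_UPLOADS"] = "static"                      # uploaded images go here
    app.config["ALLOWED_IMAGE_EXTENSIONS"] = ["JPEG", "JPG", "PNG"]

    # Load the trained model once, when the application starts.
    food_prediction_model = tf.keras.models.load_model("deep_learning_model")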
The static URL path defines where the static files live, but we also need to tell the application that uploaded images should be stored in that folder. For that I use app.config and set a variable called IMAGE_UPLOADS to the same folder we just created, static. Another thing I want to configure is the allowed extensions: when images are uploaded we only want to accept certain types, for example JPEG and PNG. So I define a config variable called ALLOWED_IMAGE_EXTENSIONS containing that list, which means we only accept images whose extension appears in it. Finally, we want to load the model. Once the application is defined, I create a variable called food_prediction_model and load our trained deep learning model into it. For the path, I add a new folder called deep_learning_model; we will put the trained model inside it and read it from there. So in this part of the code we create the app with Flask and load the deep learning model from that folder. After that we can start writing the code for the web pages where the images are uploaded and, later, where the predictions are made.

53. Adding a helping function to only accept certain images: I have downloaded the model from the bucket that contains all our trained models, unzipped it, and put everything inside the deep_learning_model folder we created. This means the trained model is now stored in the food_prediction_model variable, and we can use it to make predictions on uploaded images. Next, let's create a helper function that makes sure the filename being uploaded is allowed in our application. I call it allowed_image and it takes a filename. First we check whether the filename contains a dot; if it doesn't, we return False, because a proper name looks like image.jpeg, and a name without a dot means the extension is not well defined, so we won't allow that upload. If that test passes, we extract the extension with filename.rsplit(".", 1): everything after the last dot is the extension, and everything before it is the name of the file.
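For reference, a minimal sketch of this helper could look like the following; the config key matches the assumption made earlier, and the remaining checks are explained just below.

    def allowed_image(filename):
        # Reject names without an extension at all.
        if "." not in filename:
            return False

        # rsplit(".", 1) splits once, from the right, so the last piece is the extension:
        # "file.image.jpeg".rsplit(".", 1) -> ["file.image", "jpeg"]
        ext = filename.rsplit(".", 1)[1]

        # Compare in upper case against the configured whitelist.
        return ext.upper() in app.config["ALLOWED_IMAGE_EXTENSIONS"]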
What the split does is separate the extension from the rest of the name, and the splitting happens only once; that is what the second parameter, 1, means. Even if the name contains multiple dots, the split starts from the end and happens once. If we passed 2 instead, a name like file.image.jpeg would be split twice, but we are only interested in the extension, which is always at the end, so we split once. The result is a list with two parts, and index 1 gives us the part after the last dot, the extension, which we store in a variable called ext. Then we check whether the extension is allowed: we call upper() on it so the comparison is done in capital letters, and if it belongs to app.config["ALLOWED_IMAGE_EXTENSIONS"] we return True, because this is an allowed image; otherwise we return False. With this function we can verify whether an image someone wants to upload will be accepted. After this, we will write the view functions, the functions that control the routes and decide which pages you see in the browser when you visit them.

54. Creating a view function to show our main web page: Now we add that function; it is called a view function, the function that defines which page the user sees when he or she goes to a specific route. In Flask, you define a route with the @ symbol followed by the name of the app and its route decorator, so @app.route, and inside it you define the route itself. For the first route I choose "/", which means that whether the web app runs on localhost or behind your own domain name, say www.myfoodpredictionapp.com, this page is served at the root of that domain: as soon as you visit the domain, you see whatever this function renders. We also define the methods we will use. There are two that matter here: GET and POST. GET is when we visit the route without posting any data to the server; POST is when we add data to the server, for example when we upload images. Since we want the ability to upload data, but also the option to simply visit the page without sending anything, we need both.
That's why I added these two methods. Once this line is defined, we can write the view function itself. I call it upload_image; you can choose any name you like, but I call it that because it describes exactly what this page does: it lets us upload images. Inside the function we write the code that makes sure an image can be uploaded to our server. The first thing to verify is whether the request uses the POST method: uploading an image is a POST request, so checking for POST means checking whether data is actually coming to the server. After that, we verify whether there are files attached to the request, because an image upload should carry files. Inside the request we can then grab the image with request.files["image"]; this will become clearer when we build the web page that does the uploading. This object holds everything attached to the image, including its filename, which we can read with image.filename. So the next check is whether image.filename is empty: if someone tries to upload nothing, or the name of the file is empty, we redirect to the same URL with redirect(request.url), meaning we stay on that page and ask again for a file to be uploaded. If that test passes, we verify whether the image is allowed using the allowed_image function we defined earlier, giving it image.filename. If it is accepted, we first secure the filename: I store the result of secure_filename(image.filename), the function we imported, in a variable called filename. With a secure filename we can save the image to the folder that is supposed to hold all our uploads, which, as you remember, we configured to be the static folder. We call image.save with os.path.join of the image uploads path and the secured filename, so once an image is uploaded, its name is secured and the file is saved inside the static folder.
After this, I redirect to another page, the page where we will do the prediction, because the plan is to have two web pages: one that uploads the images and one that makes the predictions using the model we built. I haven't defined that second page yet, but we can already build the route to it. Here I use an f-string: I assume the page that shows the image will be called showing_image, and I append the filename to the URL, so the URL tells us exactly which image will be shown in the browser. All of this will become much clearer once we create the web page and I show you how it looks. Then comes the else branch: if the filename of the image is not allowed, we stay on the same URL, doing the same thing as before, return redirect(request.url). Finally, there is the part that runs the first time we visit this URL. The POST branch only executes when we are uploading an image, but the first visit is a GET request, because we are just viewing the page, not sending images to it. That means we need a web page that shows what is happening and displays the buttons used to upload an image. For that we use the render_template function, which takes the name of a template; I call it upload_images.html, an HTML file that defines how our web page will look. Flask looks for HTML templates in a folder called templates inside our project directory, so I create a folder with exactly that name, and inside it we define the page upload_images.html, empty for now. To summarize: the first time you visit this URL it is a GET request, so the POST part of the code does not execute and we go straight to the line that renders the upload_images.html template, which Flask finds inside the templates folder. Only inside that page will we define the buttons that let us upload images, and once we click them, the POST part of the code runs, goes through all of the conditions above, and saves the image into the static folder.

55. Quick test to verify that everything is working properly: Let's now add the part of the code that lets us do a quick test and see whether we can at least show a web page. To run a Flask application we need the usual entry point in our code: if __name__ == "__main__", and there we run the app with app.run.
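Putting the pieces of this view function together, a minimal sketch could look like the following; the route name showing_image and the form field name "image" follow the narration, and the run block at the bottom anticipates the host and port settings explained next.

    @app.route("/", methods=["GET", "POST"])
    def upload_image():
        if request.method == "POST":
            if request.files:
                image = request.files["image"]

                # Reject empty submissions and disallowed extensions.
                if image.filename == "":
                    return redirect(request.url)

                if allowed_image(image.filename):
                    filename = secure_filename(image.filename)
                    image.save(os.path.join(app.config["IMAGE_UPLOADS"], filename))
                    # Send the user to the page that shows the image and predicts on it.
                    return redirect(f"/showing_image/{filename}")

                return redirect(request.url)

        # Plain GET: just show the upload form.
        return render_template("upload_images.html")

    if __name__ == "__main__":
        app.run(debug=True, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))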
For now I set debug to True, I use 0.0.0.0 as the host, and for the port I read an environment variable with os.environ.get, using a variable called PORT and defaulting to 8080. This means that when we run the Flask app it uses that host and looks for the PORT variable, which we can define anywhere; even when we containerize the application we can still set the port to 8080 and the application will run on it. To make the test possible, let's also go to the templates folder and add some boilerplate: typing html:5 in the editor generates the HTML5 skeleton, where we set the title of the page to "Uploading images", keep the lines any HTML file needs, and in the body simply write the sentence "Upload your image". What we want to see is that when we run the application and go to this route, a web page appears with that sentence. I also comment out the part that loads the deep learning model for now, since we won't be making any predictions with it yet. Now we can go to the terminal and run python predictor.py; this way we can do a quick test every time we make new changes. The app is running on localhost:8080, so let's open that URL, and as you can see the page shows "Upload your image". The Flask app is working so far: when we go to the root route, the main page of the application, we see everything defined in the template. Now we can go ahead and finish the application, at least the part that uploads images.

56. Finishing the main web page: Let's finish the web page so that we can upload images to the application. I stop the running server, clear the terminal, and go back to the template. First I make the sentence a smaller header, an h4. Then I define a form, the part of the web page that lets us upload images or files to the web app. For the form we don't need the action attribute for now; we set the method to POST, and since we are uploading images we set the enctype attribute to multipart/form-data, which is important for sending files to the server. After that I define a label, for example "Select image", and beneath it the input that actually uploads the image.
It will be an input of type file, because we will upload a file, and I give it the name image and the id image. Giving this input the name image is exactly why, back in the view function, we grab the upload with request.files["image"]: the key matches the name used here. I wrap this inside a div just so the page stays a little cleaner; we won't be adding any CSS, I want to keep it as simple as possible. Finally I add a button of type submit, which sends the data to the server once clicked, and I label it Upload. So we end up with two controls: the first lets us choose an image from the local machine, and the submit button actually uploads that image to the server. Let's test it. Back in the browser, after refreshing, we now have the two buttons, Choose file and Upload; the first comes from the input line and the second from the submit button. Clicking Choose file opens the local file picker; once you choose an image, its name appears next to the button, and clicking Upload sends it to the server. But remember: after saving the image, the view redirects to the /showing_image/<filename> URL, and we haven't defined any view function for that route yet. So when I click Upload I get "Not Found", because Flask tries to access that page and can't find it. Still, if we look inside the static folder, the image we just uploaded is there. Now that this works, let's continue building the rest of the web app, starting with a view function that shows something when we access that URL.

57. Adding a web page for viewing the uploaded image: Let's add the second view function. I collapse the first one, stop the server, and clear the terminal. As you remember, the redirect goes to showing_image followed by the filename, so we want a function that handles that URL. We do the same as before with @app.route, using the same route we built in the previous view function, showing_image. In Flask, if we add angle brackets inside the route, we can put a variable name inside them; let's call it image_name. Then, inside the view function, which I also call showing_image, we can use that same variable passed through the URL.
If we give the function parameter the same name, image_name, then inside the view function we have access to whatever was passed in that part of the URL. Let's finish the route definition by adding the methods, GET and POST, just as before. Inside the function we again have: if request.method equals POST, do some processing. Why would we need a POST method on this page, you might ask? Simply because this is the page where we will run the prediction: we will add a button, and once you click it the server runs the trained model and predicts on the image you just uploaded through the previous page. For now I leave the POST branch empty and just return render_template with a new template, which I call showing_image.html, passing it a value: value=image_name. In Flask you can define variables that you pass to your template and then use inside the template for all sorts of things, and you can pass as many as you like with whatever names you like. Here the first variable I pass is called value and holds the image name taken from the URL, but you could also pass, say, a counter with a value of four, and if some processing changes that value you pass the updated value and use it in the template. For now I only pass the image name. Let's create the new file with the same name, showing_image.html, generate the HTML5 boilerplate, and call this page something like "Running prediction"; as a quick test I also put a header with that text on the page. Next, let's add the part of the page that shows the image and, at the end, a button whose purpose is to tell the server to run a prediction on the uploaded image using our deep learning model. So in this page I again define a form, set the method to POST, and use multipart/form-data as before. Inside it I add a div where I want to see the uploaded image, using the img tag, which is how you display an image on a web page in HTML. With Flask, we can build the image URL inside double curly brackets: inside those brackets we have access to the variables passed to the template.
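Before continuing with the template, here is a minimal sketch of this second view function as it stands at this point, assuming the names used so far; the POST branch gets filled in over the next lessons.

    @app.route("/showing_image/<image_name>", methods=["GET", "POST"])
    def showing_image(image_name):
        if request.method == "POST":
            # The prediction code will go here (added in the next lessons).
            pass

        # On a plain GET we just show the uploaded image and a Predict button.
        return render_template("showing_image.html", value=image_name)

With that skeleton in place, back to the template.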
Inside those brackets we also have access to functions predefined by the Flask web framework. I use one called url_for: I pass it static, because I want to read files from the static folder, and for the filename I give it value. So this function takes two parameters: the first is the name of the folder where the file is looked up, and the second, filename, is the name of the file or image to read from that folder. I hope the logic is clear now: when we click the Upload button on the first page, we are taken to this new page, and the filename is passed in the URL; the second view function receives that image name, passes it as value to the showing_image template, and the template hands it to url_for as the filename, which is how the image appears on the page. Finally, I add an input of type submit whose value, the label shown on the button, is Predict. So the page shows the image and, beneath it, a Predict button; when I click that button, I want a prediction from our deep learning model. Let me save this and do a quick test before going further. I run python predictor.py, the server starts, I go back to the first page, upload a new image, and click Upload. As you can see, the new page shows the text we defined, Running prediction, beneath it the image we just uploaded, and then the Predict button. For now it does nothing, because we haven't defined in our code what happens when we post data to the server. What we need to add next is the code that resizes and prepares the image, passes it through the model, gets the prediction, and displays it.

58. Finishing the web app and testing our code locally: Let's finish the part of the code that makes predictions on an image when we click the Predict button. Right now nothing happens when we click it, but once we finish this part, the prediction should be shown on the web page. I stop the server and clear the terminal. The part that loads the model is still commented out; for now let's complete the POST branch of this view function. The first thing is to get the full path to the image: we use os.path.join with the image uploads path and the image name passed in the URL, which gives us the full path to the image sitting in the static folder. After that we can read the image.
We use the OpenCV library to read it: image = cv2.imread(image_path), so the image is now stored in a variable called image. Next I make a copy of that image, because if we start modifying the original variable we are modifying the original image; instead we copy it and do all our manipulations and image processing on the copy. Then I convert the color space. When OpenCV reads an image, it loads it in BGR format, blue, green, red, so the blue and red channels are flipped. Our network, however, was trained on RGB images, where the red channel comes first. So we convert the copy from BGR to RGB with cv2.cvtColor and store the result back in the image variable, since we no longer need the original now that we have a copy. Also, remember that in the training code we applied transformations to the images before passing them to the network: for training we used many augmentation techniques, which help the neural network generalize well, but for validation and evaluation we applied only one transformation, and it is an important, obligatory one, because the Inception network expects pixel values between 0 and 1. So there are two things we must make sure of at prediction time: the pixel values have to be rescaled to lie between 0 and 1, and the image has to have a size of 229 by 229 by 3. Back in the code, we resize with cv2.resize, giving it 229 by 229, so the image has the correct size for the network. We also change the type of the pixel values to float32, because we are about to rescale: if we kept unsigned 8-bit integers and performed an integer division by 255, the values would collapse to 0 instead of ending up between 0 and 1. So I convert the image with astype, giving it float32 as the type, and only then do the rescaling.
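For reference, the preprocessing steps described here, together with the rescaling covered next, can be collected into one small helper like the sketch below. The helper name and the TARGET_SIZE constant are assumptions; the size must match whatever input size the model was trained with (the lecture uses 229 by 229, while Keras' stock InceptionV3 defaults to 299 by 299).

    TARGET_SIZE = (229, 229)  # must match the training input size

    def preprocess_image(image_path):
        # Read with OpenCV (BGR), work on a copy, convert to RGB.
        image = cv2.imread(image_path)
        image = cv2.cvtColor(image.copy(), cv2.COLOR_BGR2RGB)

        # Resize to the network's input size and rescale pixels to [0, 1].
        image = cv2.resize(image, TARGET_SIZE)
        image = image.astype("float32") / 255.0
        return image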
We divide all the pixel values in the image by 255. Again, without the astype line this step could give us zeros, because at that point the pixels are unsigned 8-bit integers with values between 0 and 255, and an integer division by 255 would not produce the values between 0 and 1 that we need; that is why the cast is important. Finally, we need to add a dimension to the image, because the network expects its inputs in batches: it wants a four-dimensional input. So I create np_image using np.expand_dims, giving it the image and axis 0, which adds the new dimension at position 0. If the image has shape 229 by 229 by 3, it becomes 1 by 229 by 229 by 3. That is the format the network expects: for a single image the leading dimension is 1, and if we wanted to pass several images we could stack them into one tensor with the number of images in that first dimension. Here we pass one image at a time. After that, we get the predictions. For this we need the model, so let's uncomment the line that loads it and use the model here: we pass the image in this batched format to the model and get the predictions back. The predictions are simply probabilities for the image belonging to each of the eleven categories we defined, but what we actually want is the class corresponding to the highest probability; in other words, we want the network to tell us what it thinks the image is: a bread-based food, a dessert, and so on. So we look at the predictions and take the class with the highest probability. I get the predicted class index with np.argmax applied to the predictions, which returns the position of the highest value in the array: if the array is, say, 0.1, 0.5, 0.3, the highest value is 0.5, so the predicted class index is 1. We then take that index into our list of food classes and pick the corresponding class. We also get the probability itself, simply np.max of the predictions (you could equally read it by indexing the predictions array with the predicted class index). Now that we have all of this, let's get the predicted class.
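Using the preprocess_image helper sketched above, the prediction step, including the class lookup described next, boils down to something like this; the variable names are assumptions.

    # Build a batch of one image and run it through the loaded model.
    np_image = np.expand_dims(preprocess_image(image_path), axis=0)  # shape (1, H, W, 3)
    predictions = food_prediction_model.predict(np_image)            # shape (1, 11)

    predicted_class_index = int(np.argmax(predictions))    # index of the highest probability
    probability = float(np.max(predictions))                # the highest probability itself
    predicted_class = FOOD_CLASSES[predicted_class_index]   # human-readable label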
For the predicted class, we index our list of classes with the predicted class index, so this variable will hold one of those class names, and that is what we will show on the page the user sees. For that page we need a new template that displays the prediction result, so in the POST branch we render a template I call prediction_result.html, and I create a new file with that name in the templates folder. We pass it several variables: the image name (so we can show the image), the predicted class, and the probability. When you pass variables to a template you can keep the same name on both sides of the assignment; that is completely fine. So at this point we have the code that resizes and rescales the image, passes it through the network, and gets the predictions, and the model is loaded when the application starts; all that is left is to finish the prediction result page. I generate the HTML5 boilerplate again, call the page Prediction results, and add a header with the same text. Beneath it we want to see the image and the prediction, so I add a div containing an img tag that, just like before, uses url_for with static as the folder and the image name as the filename (I renamed the variable from image_path to image_name so the name reflects exactly what it holds). Finally I print the prediction itself: "Predicted class is", followed by the predicted_class variable we passed in, and "with probability", followed by the probability. That should be it: when we upload an image we can view it, click the Predict button, and get the prediction for that image. Let's do a quick test. We hit a small problem first, an error around the redirect line: I had accidentally removed a line and added something else by mistake in the allowed-image part of the upload view, the part where we take the filename, secure it, save the image to the static folder, and redirect to the new page. That is fixed now, and I have started the web server.
Back in the application, let's refresh the page, choose a file, and upload it. We see the page with the image and the Predict button, and now, when we click Predict, we get a new page that looks almost the same: the Prediction results header, the image, and beneath it the text showing the predicted class, which is the correct class, with a probability of 0.95. Let's go back and try a different image. This one the model does not predict correctly: it predicts egg, but with a very low probability. It does look a little like an egg, so I understand why the model might make that mistake, and it is good that it wasn't too confident. Let's pick something that looks like a dessert and upload it: the predicted class is dessert, with a very high probability, so the model is very confident about this one. So with our trained deep learning model we were able to build a very basic web app that lets us upload images and make predictions on them. What we want to do next is dockerize the application and then deploy it on Google Cloud Platform using Cloud Run.

59. Using gunicorn to serve the web app instead of Flask server: One more thing to mention: when we test with python predictor.py we are using the Flask development server, and in production that is not recommended. If you want to deploy the app, as we will shortly, you should not use the Flask server; Flask even tells you so when you run it. Let me comment out the model loading again so it starts quickly and run it: it prints that this is a development server and should not be used in a production deployment. Since we want to deploy this in production, we need a different web server, and we will use gunicorn. You should already have it installed, because it is included in the requirements file; if I search the installed packages, gunicorn is there. To test locally with gunicorn, the command is gunicorn, then -b followed by the local host and the port we defined, and finally the name of the file that contains the application and the name of the app object, so predictor:app, because the file is called predictor and the app variable is called app. If we run this we get the same behavior as before, except the application is now served by a production server instead of the Flask development server. The app is still running on the same address, so let's uncomment the model-loading part and run the application again.
I clear the terminal and run gunicorn again, and now it starts reading the model. Back in the browser I refresh the page; it takes a little time at the beginning because, as you remember, the model is loaded when the app is instantiated. Once it is running, I choose a file, upload it, and click Predict, and the predicted class comes back as rice. One small thing I'd like to improve in the prediction result page: I wrap the prediction text in an h3 header, save, and go back; the prediction takes a moment, and now the text is much clearer with the level-three header. So we now have a Flask web app served by gunicorn, a production server, and with all of this set up we can dockerize the application and deploy it on Google Cloud Platform.

60. Dockerizing our code: Now let's containerize, or dockerize, the application: we will write a Dockerfile that lets us create a Docker image, and from that image we can run containers that run our application. The first thing is to add a file called Dockerfile, and inside it the lines needed to build an image for our web app. First we use the FROM instruction, and I start from the TensorFlow base image with the latest tag; everything else is defined and installed on top of this base image. Then I define the working directory, the same as before, /usr/src/app. Next I want to copy everything in the current directory into the image's working directory, but a plain COPY would also bring in things like the venv folder and the pycache folder, which we don't want. So we add a .dockerignore file, and inside it we list everything Docker should ignore when building the image: the static folder, some artifacts that get created in Python projects, the .vscode artifacts, the pycache folder with Python's cache, and the .dockerignore file itself, since we don't want to copy that either. Back in the Dockerfile, the next thing I add is the creation of an empty folder where the images will be uploaded: inside the working directory, /usr/src/app, I create a folder called static.
The reason for creating it is that the static folder is listed in .dockerignore, because I don't want to copy my locally uploaded images into the image, but the application still needs a folder called static so that when a user uploads images they have somewhere to go; this mkdir command does exactly that. Next we install the packages needed to run the application: RUN pip install --upgrade pip, and then pip install flask, gunicorn and opencv-python. Finally, the command to execute: in a Dockerfile you can use ENTRYPOINT, as we did before, or CMD; for this example I use CMD. I run the application with the same gunicorn command I used locally, with a few modifications: I bind to the same port I defined in the application, I set the number of workers to one and the number of threads to eight, parameters gunicorn uses to serve the application smoothly and faster, I add a timeout of 0, and then predictor:app, the same target as before. So wherever our application ends up, when Docker runs this command it starts the app. This is a simple Dockerfile and it should be all you need, but since I have already tested it, I know that opencv-python needs a system dependency; without it you get errors. I have already solved that problem: you add RUN apt-get update and then apt-get install -y libgl1-mesa-dev. If you leave this line out, and you can try that on your own, the parts of the code that use OpenCV will crash, because OpenCV needs this dependency. Now that the Dockerfile is ready, we need to build the image, and this time we won't use Docker locally; we will use the Cloud Build service offered by Google. If you go to cloud.google.com/run/docs/quickstarts/build-and-deploy and choose Python, you get a guide for building your Docker image with Google Cloud. Let's open a new terminal window and cd into the project folder, the one that contains the Dockerfile we need for the build. First let's verify that we are using the correct project with gcloud config get-value project; as you can see, we are on the same project we created when we started using Google Cloud Platform. You can create a new project if you like.
Now that we have the Dockerfile, what we need to do is build the Docker image, and for this we will not just use Docker locally; we will use the Cloud Build service offered by Google. If you go to cloud.google.com/run/docs/quickstarts/build-and-deploy and choose Python, it gives you a guide for building your Docker image using Cloud Build. So let's start a new terminal window and cd to our project, to where our code and, of course, the Dockerfile live. First, let's verify that we are using the correct project with gcloud config get-value project; as you can see, we are using the same project we created when we started using Google Cloud Platform. You can create a new project if you like, but I am going to keep using the same project as before.

Next I want to define a PROJECT_ID variable and keep this value; it will make it easier to define the image URI. Let me resize the terminal. The second thing I want to do is define an IMAGE_URI variable, and what I would like to store in it starts with gcr.io. You can choose a different Google Container Registry location to store your Docker images; for example, there is eu.gcr.io, asia.gcr.io, and us.gcr.io, and you can go back to a previous video where I covered this. For us, we are going to continue with gcr.io, then the project ID, and then a name for the image, for example food_predictor. Now we have our image URI. Actually, the project ID was not substituted correctly; I made a mistake and it is missing, so let me fix it and check again. Now we have the full image URI.

Let's clear the terminal and run the build command: gcloud builds submit --tag followed by our image URI. What this does is that Cloud Build looks inside this folder, finds the Dockerfile, builds a Docker image based on it, and pushes that image to Google Container Registry. This will probably take some time, so I will let it finish and come back once everything is done.

61. Deploying our web app to Cloud Run: After some time, the Docker image was built, and we have the whole build pipeline in our terminal here. As you can see, it first created a tarball, zipping the folder and uploading it to Google Cloud Platform, and then it built the Docker image step by step using the instructions we put in our Dockerfile. At the end it successfully built the image and started pushing it to Google Container Registry, so we should now have it there. If we go back to our dashboard and look for Container Registry, we can see the Docker image that was built and pushed; these two other entries are some tests I did on my own, but this is the app we built today. Now that we have it here, we can continue with the command to deploy our Docker image to Cloud Run, so that we can access this application or service from anywhere. Let's go back to our terminal, clear it, and run gcloud run deploy with our image URI, plus --platform managed, which means we want Cloud Run to manage everything for us rather than setting things up ourselves. Let's run this command and see; the full sequence of commands used in this step is sketched below for reference.
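As a recap, and assuming the image name food_predictor and the defaults chosen in this course, the terminal commands for this step look roughly like this:

    # check which project is active
    gcloud config get-value project

    # store the project id and build an image URI under gcr.io
    PROJECT_ID=$(gcloud config get-value project)
    IMAGE_URI=gcr.io/$PROJECT_ID/food_predictor

    # build the image with Cloud Build and push it to Container Registry
    gcloud builds submit --tag $IMAGE_URI

    # deploy the image to Cloud Run (fully managed); this prompts for a
    # service name, a region and whether to allow unauthenticated access
    gcloud run deploy --image $IMAGE_URI --platform managed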
The command gives us a default service name; it called it food-predictor-app, and I am going to accept it. If you want a different service name, you can type it here, but I will just use food-predictor-app. For the region, I am going to continue with the same region, us-central1. I am actually still in Europe, but I am just following the tutorial to show you that you can set up your configuration however you like, so I will enter the number for us-central1 (10 in my list) to make it the default region. Then it asks whether to allow unauthenticated invocations; let's say yes, and that starts the deployment process. This could take some time as well, so I will be back once the process has finished.

It only took a few seconds, less than a minute, to deploy our app, and now our app is served on this URL here. Before we open it, let's go back to Google Cloud Platform and search for Cloud Run. As you can see, our service is running here; let's click on it and see what we get. The URL where this service is running is this one, so let's access it. I want to show you something I ran into when I first tried to deploy this application. In the Cloud Run dashboard, if you go to the logs, you can see what is happening on the machine that is running our service; as you can see, we have all of these logs here. And if we go to our application, it says "Server error: the server encountered an error and could not complete your request." You can find out why the application was not deployed correctly by looking at the logs, where you can see a line saying that the memory limit of 256 MB was exceeded, and suggesting that you increase the memory limit to make the application work properly. I changed this memory limit several times to make sure it works reliably, and the value that worked best for me was two gigabytes of memory.

To change it, go to the "Configuring memory limits" page (the command line tab); you can reach that page just by clicking on the error in the logs. It says you can update the memory allocation of a given service with a single command, and that is exactly our case, because our service is already deployed on Cloud Run, so we just need to update it. If you know in advance how much memory your application will need, then at deployment time, when we ran gcloud run deploy, we could have added a --memory attribute with the size we want. But since our application is already running, we just need to update the service, and for this we use gcloud run services update with the name of the service we want to update, which is food-predictor-app, plus the --memory attribute. Here I am going to give it 2.0 gigabytes; the documentation says to replace the service name and the size with the desired memory, where the size is a fixed or floating point number followed by a unit corresponding to gigabytes, megabytes, or kilobytes. For us that is gigabytes, which is why I put 2.0G. Let's run this command.
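For reference, the update command used here looks roughly like this; the service name, memory size, platform, and region are the ones chosen in this course, so adjust them to your own deployment:

    # increase the memory limit of the already-deployed Cloud Run service
    gcloud run services update food-predictor-app \
        --memory 2G \
        --platform managed \
        --region us-central1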
It asks again for the target platform, so Cloud Run (fully managed), and for the region, and now we are redeploying the service with the new memory size we configured. Let's wait; it might take some time or it might be fast, and just like the first deployment, it was fast. Now we have our app running on this URL, so let's refresh it and see what we get. Maybe let's also go back to the logs: OK, at least now it is starting gunicorn and reading the model. So far we don't have access to the web page yet, but let's wait a little longer; at least we are not getting the memory errors we had before. And now, as you can see, the application is running, so let's test it here. When we click, we can upload an image; let's upload this image, for example. Now it's uploaded, and we have the same thing as before, with the same URL structure: this is the URL where the service is being served, and this is the part of the URL that we defined in our code. We have the image, the text, and the predict button, and if I click on predict, our model should run and tell us which class it thinks this image belongs to. It says the predicted class is seafood with probability 0.99. So as you can see, our model has been correctly deployed on Cloud Run and our web app is working correctly. And the good thing is that you can now share this URL with your team members or with your friends, and they can do the same thing you are doing: they can upload images, test your model, and see for themselves how it behaves.

62. Summary: We have now reached the end of this section of the course, which is the last section. In this section we have seen how to use technologies like Flask, gunicorn, and Google Cloud Run, together with other technologies like Google Cloud Build, to start from a code base like this one, where we define the different web pages that we want our web app to have. We then created a Dockerfile that was used to build a Docker image, pushed that image to Google Container Registry, and used it to deploy our web app on Cloud Run. Now our web app is being served live, and you can share your work with your coworkers, your manager, or whomever you like, and they can test it with any image they want. Let's go back and do another test here; let's check bread, maybe this one, upload the image, and predict. It predicts it correctly, maybe with a lower probability, because we can't be 100% sure whether this is bread; it looks a little different, maybe a little bit like a dessert too. So we can see that our model is behaving quite well. Let's go back and check fried food; upload it and predict. Although it actually is fried food, it was predicted as dessert. But we can understand this: our model is not 100% accurate. Remember, when we trained it, it reached around 82% accuracy, so from time to time it will make mistakes. The good thing is that when it makes mistakes, so far they are understandable, because this image does look like a dessert as well, so the model thinks it is a dessert.
So maybe one thing you could do is also show the second and maybe third predicted classes, the ones with the next highest probabilities, to get a better idea of how the model is looking at these images; in this case, the second-highest probability would probably be that of fried food. A small sketch of how this could be done is included at the end of this lesson. Let's go back and do maybe one last test, with noodles. I choose this image and upload it, and now it is taking some time; it should work just like before. Another thing I want to add while we are waiting: with Google Cloud Run you only pay when your application is running, and by running I mean when invocations are being made to your application. If you are not doing anything and your application is just sitting there, you won't be paying anything, which is a great thing when you don't want to spend a lot of money on one specific service, or a small microservice like the one we are building here. This request is taking some time; OK, maybe let's just stop it, go back to the beginning, and do the test again, because it is taking longer than expected, and let's see what we have in the terminal or the logs. Depending on the machine that is running your application, it can be slow from time to time, and you have many options to change this: you can edit and deploy a new revision with more resources. There is a lot you can do with Cloud Run; for this specific course I left out the details and only showed you the main parts that let you move quickly when you build a deep learning model and then deploy it, so I do encourage you to go to the documentation and see more of what you can do. Here the application is working; let's predict, and you can see it correctly predicted the class: noodles/pasta with probability 0.999. I hope that you now have a better idea of how to deploy your application after you train your deep learning models. And this should be it; this is the last section of the course.
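As mentioned above, a possible improvement is to display the top two or three predicted classes instead of only the best one. Here is a minimal Python sketch of how that could look; the names probs (the probability vector returned by model.predict for one image) and class_names (the labels in the same order as the model outputs) are assumptions, so adapt them to your own code:

    import numpy as np

    def top_k_predictions(probs, class_names, k=3):
        # indices of the k largest probabilities, best first
        top_idx = np.argsort(probs)[::-1][:k]
        return [(class_names[i], float(probs[i])) for i in top_idx]

    # illustrative usage inside the predict route:
    # top3 = top_k_predictions(preds[0], class_names)
    # then render these (class, probability) pairs in the result template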