Computer Vision with Deep Learning and OpenCV: Learn How to Detect Smiles | Yacine Rouizi | Skillshare


Taught by Yacine Rouizi


Lessons in This Class

7 Lessons (55m)
  • 1. Introduction (1:53)
  • 2. Installation (1:33)
  • 3. Loading the Data (17:15)
  • 4. Training the Smile Detector (14:41)
  • 5. Applying Our Smile Detector to Images (12:58)
  • 6. Applying Our Smile Detector to Videos (4:56)
  • 7. Conclusion (1:37)

21 Students · -- Projects

About This Class

In this course, we will be creating an end-to-end application that can detect smiles in images and videos.

For that, we will use deep learning and start by training a convolutional neural network on the SMILES dataset, which contains faces of people smiling and not smiling. Once the network is trained, we will go through the following steps to detect smiles in images and videos:

  1. We will use Haar cascades to detect a face in an image.
  2. We will then extract the face region from the image.
  3. We will then pass the face region to the network for classification.
  4. And finally, we will annotate the image with the label "smiling" or "not smiling", depending on the output of the network (see the sketch after this list).
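
At a high level, the detection stage looks like the following sketch. This is illustrative only: the function name detect_smiles and the detector parameters (1.1, 5) are ours, not from the class, which builds the real version step by step in the lessons below.

    import cv2
    import numpy as np

    def detect_smiles(image, face_detector, model):
        """Annotate every detected face as smiling or not smiling."""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # 1. detect faces with a Haar cascade
        for (x, y, w, h) in face_detector.detectMultiScale(gray, 1.1, 5):
            # 2. extract the face region, resized and scaled for the network
            face = cv2.resize(gray[y:y + h, x:x + w], (32, 32)) / 255.0
            # 3. classify it with the trained network (batch/channel axes added)
            prediction = model.predict(face[np.newaxis, ..., np.newaxis])[0][0]
            # 4. annotate the image with the result
            label = "smiling" if prediction >= 0.5 else "not smiling"
            cv2.putText(image, label, (x, y - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.75, (255, 0, 0), 2)
        return image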

This class was made for intermediate Python programmers who have some familiarity with deep learning and computer vision.

By the end of this course, you'll have a fully functional smile detection application that you can use in your own projects.

Meet Your Teacher

Hi! My name is Yacine Rouizi. I have a Master's degree in the physics of materials and components, and I am a passionate self-taught programmer. I've been programming since 2019, and on my blog I teach programming, machine learning, and computer vision.

My goal is to make learning accessible to everyone and simplify complex topics, such as computer vision and deep learning, by following a hands-on approach.




Transcripts

1. Introduction: Welcome to the class Smile Detection with Deep Learning, where we will be creating an end-to-end application that can detect smiles in images and videos. My name is Yacine and I will be your instructor in this class. I've been programming since 2019, and I am the author of the blog Don't Repeat Yourself (dontrepeatyourself.org), where over 5,000 developers come each month to learn more about Python, machine learning, and computer vision. Image classification is a very common task in computer vision: you have a set of images and you need to classify them into a set of categories. In this course, specifically, we will classify images into two categories: smiling and not smiling. For that, we will use deep learning and start by training a convolutional neural network on the SMILES dataset, which contains faces of people smiling and not smiling. Once the network is trained, we will go through the following steps to detect smiles in images and videos: we will use Haar cascades to detect a face in an image; we will then extract the face region from the image; then we will pass the face region to the network for classification; and finally, we will annotate the image with the label "smiling" or "not smiling", depending on the output of the network. If that sounds exciting, then let's get into the class and start building our smile detection application.

2. Installation: In this first video, we will see how to install the required packages for this project. You can take the requirements.txt file from the source code and install what's in it, or, if you are following along with me, you can create a new file in the project directory, name it requirements.txt, and write in it the libraries needed for this project. So we need NumPy; then we have scikit-learn and OpenCV; we also need Matplotlib; and finally we need TensorFlow. Now we save the file, and from the terminal we run pip install -r requirements.txt. In my case, the packages are already installed, so here we see "Requirement already satisfied" for all these libraries. But in your case you will see the packages being installed, and it might take some time to install all of them.
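
If you are creating the file yourself, the requirements file from the lesson looks roughly like this. The video does not pin versions, so none are pinned here, and opencv-python is the usual pip package name for OpenCV:

    # requirements.txt -- libraries used throughout the class
    numpy
    scikit-learn
    opencv-python
    matplotlib
    tensorflow

Then install everything from the project directory with pip install -r requirements.txt.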

3. Loading the Data: In this video, we are going to load the SMILES dataset. The SMILES dataset contains more than 13,000 grayscale images of 64 by 64 pixels in two classes: smiling and not smiling. For the non-smiling faces we have more than 9,000 images, and for the smiling faces we have 3,690 images. So the dataset is imbalanced, meaning that there is an uneven distribution of the images across the classes. A common technique for dealing with an imbalanced dataset is to apply class weighting; we will see how to do it when we build our model. First we will download the dataset. You can navigate to this URL to download it: we click the download button and download it in zip format. When the file is downloaded, we unzip it and delete the zip file. Our dataset is contained in this SMILEs folder. Here we have two subfolders, positives and negatives. The first one contains images of faces with smiles, as you can see here, and the negatives folder contains images of non-smiling faces. As you can see, the pictures are in grayscale and are only 64 by 64 pixels, so normally we can train our model without a problem; it should not consume a lot of resources. Don't forget to copy the folder to the project directory. Now we are ready to load our dataset from disk, so let's start by importing the necessary packages.

We will first create a new Python file, training.py. We start by importing the train_test_split function from sklearn.model_selection; we will use it to split our dataset into a training set, a validation set, and a testing set. Then we have the img_to_array function, from tensorflow.keras.preprocessing.image, and from the same module the load_img function, which we will use to load an image from disk. Next we have the Sequential model, from tensorflow.keras, and then we need some layers: the Conv2D layer, then MaxPooling2D, then the Flatten layer, and finally the Dense layer. We also need the pyplot module, NumPy, and the os package, so let's import them as well. Now we are ready to load our dataset. Let's create a function called image_paths, which will take the path to the dataset as an argument and return the list of paths to the images in the dataset. We first create a list of valid extensions — valid_formats, containing ".jpg" and ".png" — which we will use to check whether a file is an image. Next, we create an empty list called image_paths, which will contain the paths to the images in the dataset. Then we walk over the root directory tree: for dir_path, dir_names, file_names in os.walk(...). Here dir_path is the path to the current directory in the tree, dir_names are the subdirectories of the current directory, and file_names are the files in the current directory. Let's start by printing dir_path and run our script: the output is all the subdirectories of the SMILEs directory — as you can see, we have negatives and positives, which are here, and then negatives7 and positives7, which are here and here. Now we loop over the file names: for file_name in file_names. Let's also print dir_path and file_name to see what's going on: here, as you can see, we have the directory path and then the file name, which is an image, and so on. So what we can do now is extract the file extension and compare it against our list to check whether the file is an image: extension = os.path.splitext(file_name)[1].lower(). Then, if the extension is in the valid_formats list, we build the full path to the image by joining the directory path with the file name — os.path.join(dir_path, file_name) — and append it to our image_paths list. We are done with this function; we just return image_paths. So now, with this function, we have the full path to each image in the dataset stored in a list.
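
Put together, the helper might look like this — a sketch matching the names used in the lesson; adjust valid_formats if your copy of the dataset uses other extensions:

    import os

    def image_paths(root):
        """Collect the paths of all image files under the dataset root."""
        valid_formats = [".jpg", ".png"]
        paths = []
        for dir_path, dir_names, file_names in os.walk(root):
            for file_name in file_names:
                # keep only files whose extension marks them as an image
                extension = os.path.splitext(file_name)[1].lower()
                if extension in valid_formats:
                    paths.append(os.path.join(dir_path, file_name))
        return paths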

The next step is to use these image paths and load the SMILES dataset. We create a new function, name it load_dataset, and give it two arguments: the first one is the image paths, and the second one is the target size of the images — we define the image size as 32 by 32. Now we create two empty lists: the first one we name data, and it will contain the images; the second one we name labels, and it will contain the label for each image, smiling or not smiling. Next we start by getting the images. We loop over the image paths — for image_path in image_paths — and load each image using the load_img function: the first argument is the path to the image, then we have the color mode, which is grayscale, and the last argument is the target size, for which we use our target size. Next we convert the image to a NumPy array with img_to_array, and finally we append the image to the data list. So here we are done with the images; we can now get the labels. To get the label, we can extract it from the image path. For example, if we print the image paths, you can see the path to each image in our dataset; here is the full path to this particular image. From this path we can extract the label, which here is "positives" — so for this particular image the person is smiling; we have a smiling face in this image. In order to get the label, we write label = image_path.split(os.path.sep)[-3]: we split the path on the directory separator and take the third component from the end — this is the first one from the end, minus one; minus two; and minus three. So here we get a label equal to "positives". Now, if our label is equal to "positives", we encode it as 1; otherwise, we encode it as 0. Finally, we append the label to our labels list. The last step is to simply return the data list and the labels list, making sure to convert them to NumPy arrays: return np.array(data) and np.array(labels). We also need to scale the data to the [0, 1] range by dividing by 255. And here, at the bottom of the script, we can get our data and labels using our two functions.
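
A sketch of this function as described — the example path in the comment reflects the positives/positives7 folder layout shown in the lesson, with an illustrative file name:

    import os
    import numpy as np
    from tensorflow.keras.preprocessing.image import load_img, img_to_array

    def load_dataset(image_paths, target_size=(32, 32)):
        """Load each image in grayscale and derive its label from its path."""
        data, labels = [], []
        for image_path in image_paths:
            image = load_img(image_path, color_mode="grayscale",
                             target_size=target_size)
            data.append(img_to_array(image))
            # the class name is the third path component from the end,
            # e.g. SMILEs/positives/positives7/10007.jpg -> "positives"
            label = image_path.split(os.path.sep)[-3]
            labels.append(1 if label == "positives" else 0)
        # scale pixel values to the [0, 1] range
        return np.array(data) / 255.0, np.array(labels)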

4. Training the Smile Detector: In this part we're going to build our model and start training it. The model we're going to build consists of a stack of two Conv2D-plus-MaxPooling2D blocks, followed by a fully connected block. As you can see from this image, we will have a Conv2D layer, then a MaxPooling2D layer, then the same thing again — Conv2D plus MaxPooling2D layers — and then we will have the fully connected block, which consists of a Flatten layer, then a Dense layer, and another Dense layer which will output the prediction. So let's create a function; we will name it build_model. It takes an argument input_shape, which will be equal to the image size plus one for the channel dimension, and which we won't need to change in this class. We'll use the Sequential model, so we can write model = Sequential(). For the first Conv2D layer, the number of filters will be 32, the kernel size, let's say, 3 by 3, for the activation the ReLU function, the padding "same", and we pass the input shape. Then we add a MaxPooling2D layer, for which we use a pool size of 2 by 2. And then the same thing here — another Conv2D-plus-MaxPooling2D block — where we double the number of filters. Then the Flatten layer, and a Dense layer with 256 neurons. For the output layer: since we are dealing with a binary classification problem, we'll use the sigmoid activation function and a single neuron. Let's also compile our model: model.compile — for the loss function, of course, we take binary cross-entropy; for the optimizer, let's say Adam; and for the metrics we take accuracy. Don't forget to return the model at the end of the function. Next we need to calculate the class weights. As I said before, our dataset is imbalanced, so we need to give more weight to the underrepresented class — the smiling class, in this case — so that the model will pay more attention to it. We can start by counting the number of each label using the function np.unique: we call it with our labels and set return_counts to True, which tells it to return the number of each label. Let's print these out and run our script: here, as you can see, we have for the labels 0 and 1, which are not smiling and smiling, and for the counts we have 9,475 for the non-smiling label and 3,690 for the smiling label. Now we need to compute the class weights. For that we can use a simple operation: the new counts will be equal to the max of the counts divided by the previous counts. Let's print these two variables again: here, as you can see, class 1 — the smiling class — gets a weight of about 2.57, greater than the weight of class 0, which is 1. Let's create a dictionary that maps each class to its weight, using the zip function. Now we need to create a training set, a validation set, and a test set. We will take 20% of the data to create the test set, and from the remaining 80% we will take 20% to create the validation set. Let's start with the test set. We use the train_test_split function: we provide our data and labels, then the test size of 20%, we stratify on the labels, and we set a random state. Then the same thing for the validation set: we just copy and paste, and split the training set again. Now we are ready to train our model. We first build the model by writing model = build_model(), and we will train it for 20 epochs: history = model.fit, with the training data, the validation data as validation_data, the class weights, a batch size of, say, 64, and the number of epochs. Let's also save the model for later; we can simply write model.save and give the saved file a name. Training should not take too long, so we just wait for it to finish... and as you can see we get an accuracy of approximately 90 percent. Let's plot the learning curves to visualize our results. I will just copy the code from here — there's nothing special: we are just plotting the training accuracy and the validation accuracy, and then the training loss and the validation loss. So let's rerun the script. Here we have the training and validation loss, and here we have the training and validation accuracy of approximately 90 percent. What you can notice is that the training and validation accuracies track each other at first and then start to diverge; this is a sign of overfitting. We could improve the accuracy of our model by using data augmentation, but I will not cover it in this class. Finally, let's evaluate our model on the test set before we apply it to some real-world images: we call model.evaluate with the test set, which gives us the loss and the accuracy, and we print them. We have an accuracy of approximately 90 percent.
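
A condensed sketch of the lesson's training script, assuming data and labels come from the two helper functions above; the random_state value and the saved file name are placeholders, not taken from the recording:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    def build_model(input_shape=(32, 32, 1)):
        """Two Conv2D + MaxPooling2D blocks, then a fully connected head."""
        model = Sequential([
            Conv2D(32, (3, 3), padding="same", activation="relu",
                   input_shape=input_shape),
            MaxPooling2D((2, 2)),
            Conv2D(64, (3, 3), padding="same", activation="relu"),
            MaxPooling2D((2, 2)),
            Flatten(),
            Dense(256, activation="relu"),
            Dense(1, activation="sigmoid"),  # one unit: smiling vs. not
        ])
        model.compile(loss="binary_crossentropy", optimizer="adam",
                      metrics=["accuracy"])
        return model

    # "SMILEs" is the dataset folder copied into the project directory
    data, labels = load_dataset(image_paths("SMILEs"))

    # weight the under-represented (smiling) class more heavily
    unique_labels, counts = np.unique(labels, return_counts=True)
    class_weight = dict(zip(unique_labels, counts.max() / counts))

    # 20% held out for testing, then 20% of the remainder for validation
    X_train, X_test, y_train, y_test = train_test_split(
        data, labels, test_size=0.2, stratify=labels, random_state=42)
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.2, stratify=y_train, random_state=42)

    model = build_model()
    model.fit(X_train, y_train, validation_data=(X_val, y_val),
              class_weight=class_weight, batch_size=64, epochs=20)
    model.save("smile_detector.h5")  # placeholder file name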

5. Applying Our Smile Detector to Images: Now that our model is trained, we will apply it to some real-world images. So let's create a new Python file — we will name it smile_detector_image.py — and we can start writing some code. We will import OpenCV; then, from tensorflow.keras.models, the load_model function; we also need NumPy. Next, let's define some variables: we create the width, equal to 800, and let's say the height, equal to 600. We also create the blue color, which is equal to (255, 0, 0). And now we can load our image. Here I have a folder with some images — I will just copy it from here; I have ten images, and we will take the first one to test our model. So we write image = cv2.imread with the path to the first image. We also need to resize it, so we write cv2.resize with the image and the width and height. And finally we convert it to grayscale: gray = cv2.cvtColor with the image and cv2.COLOR_BGR2GRAY. Next we can start detecting faces using the Haar cascade classifier. Here I have the file haarcascade_frontalface_default.xml, so I will just put it here in the project directory, and we can load it using the cv2.CascadeClassifier function: face_detector = cv2.CascadeClassifier, where we provide the name of the file. We also need to load our model, so we write model = load_model with the name of our saved model. Now we can detect faces using the detectMultiScale function: face_rects = face_detector.detectMultiScale. For the first argument we provide our grayscale image; then we have the scale factor, which we set to 1.1; then we have to define the minimum neighbors. These parameters work well in my case, but you can change them if you want. The face_rects variable will contain the face bounding boxes, so the next step is to loop over them and extract the face region: for (x, y, w, h) in face_rects. Here we first draw a rectangle around the face: cv2.rectangle, where we give our image, the top-left corner, which is (x, y), then the opposite corner of the diagonal, which is (x + w, y + h), and then the blue color and 2 for the thickness. Next we extract the face region from the grayscale image. We have the coordinates of the face here, so we simply use slicing: roi = gray[y:y + h, x:x + w]. We also need to resize the face region and scale it to the [0, 1] range before feeding it to our model: roi = cv2.resize with (32, 32) — because this is the size we used to train our model; here, for the image size, we have 32 by 32 — and then we divide by 255 to scale it. We also need to add the image to a batch: here the image shape is 32 by 32, but our model was trained on batches of images, so the new shape needs a batch dimension in front. We write roi = roi[np.newaxis, ...], using np.newaxis to add the new axis and keeping everything else the same. We can print the shape before and after adding the new axis to see what has changed — watch out, np.newaxis is lowercase; with a capital letter you get an AttributeError. As you can see, the shape before adding the new axis is 32 by 32, and after adding the new axis it becomes 1 by 32 by 32. So now the final step is to pass the region of the face to the network for classification: prediction = model.predict(roi), and let's print the prediction. Normally the prediction will be a value between 0 and 1 — here you can see the value of the prediction for this particular image, which is 0.6. So what we do now is set the label: if the prediction is greater than or equal to 0.5, we set the label to "Smiling"; otherwise, we set it to "Not Smiling". Let's print the label to check — for this case, the label should be "Smiling". The last thing we need to do is write the text on the image and display the image: cv2.putText, where we draw the label on the image; for the text coordinates we use the coordinates of the face; for the font we can choose — let's say this one; for the font scale, let's say 0.75; the blue color; and 2 for the thickness. Then we show the image with cv2.imshow and cv2.waitKey. Hmm — the person is not smiling, but our model has detected smiling. Of course, our model is not 100% accurate, so this is a false positive prediction. Let's try with another image, let's say this one. This time our model has successfully predicted the smile. We can try one where the person is not smiling, so we take this one — and in this image the person is not smiling, and our model has successfully detected it. You can test with the other images as well.
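
The complete script, as a sketch. The image, cascade, and model file paths are placeholders; the minNeighbors value and the text offset (y - 10) are not audible in the recording, so they are assumptions; and a channel axis is added alongside the batch axis, since img_to_array produced 32x32x1 training images:

    import cv2
    import numpy as np
    from tensorflow.keras.models import load_model

    WIDTH, HEIGHT = 800, 600
    BLUE = (255, 0, 0)  # OpenCV uses BGR channel order

    image = cv2.imread("images/1.jpg")  # placeholder path
    image = cv2.resize(image, (WIDTH, HEIGHT))
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    face_detector = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
    model = load_model("smile_detector.h5")  # placeholder file name

    face_rects = face_detector.detectMultiScale(gray, scaleFactor=1.1,
                                                minNeighbors=5)

    for (x, y, w, h) in face_rects:
        cv2.rectangle(image, (x, y), (x + w, y + h), BLUE, 2)
        roi = gray[y:y + h, x:x + w]             # face region
        roi = cv2.resize(roi, (32, 32)) / 255.0  # match training size/scale
        roi = roi[np.newaxis, ..., np.newaxis]   # add batch and channel axes
        prediction = model.predict(roi)[0][0]
        label = "Smiling" if prediction >= 0.5 else "Not Smiling"
        cv2.putText(image, label, (x, y - 10),   # draw just above the box
                    cv2.FONT_HERSHEY_SIMPLEX, 0.75, BLUE, 2)

    cv2.imshow("Image", image)
    cv2.waitKey(0)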

6. Applying Our Smile Detector to Videos: Now that you know how to detect smiles in images, let's see how to apply our deep learning smile detector to videos. Let's create a new Python file and name it smile_detector_video.py, and let's import the libraries — the code will not change very much from the image script, so I will just copy the imports and paste them here. Then we have the width and the height, and the blue color. Here we just need to initialize the video capture object: video_capture = cv2.VideoCapture, where we provide the name of our video. Here I have a pre-recorded video of me in which I did some tests — smiling and not smiling — so I will just use it for this part; we put the name of the file, video.mp4. And then we have the same thing as in the image script: we need the face detector and our pre-trained model, and we will detect faces using the detectMultiScale function, so I will just copy these three lines here. Next we read the frames from the video so we can start processing them. We write while True, and inside the loop we grab the next frame from the video with video_capture.read(). Then we convert it to grayscale with cv2.cvtColor and cv2.COLOR_BGR2GRAY. And here we can use the detectMultiScale function and pass it our grayscale frame. Now we can loop over the face bounding boxes. I will just copy the code from the image script, because everything remains the same — we just need to change image to frame everywhere. The way we write the text is the same, and we display our frame, so I'll copy that code from here as well. Then cv2.waitKey: we wait for one millisecond, and if the pressed key is equal to "q", we break out of the loop. Finally, we release the video capture object and destroy the windows. Let's see the final result: we run python3 smile_detector_video.py. Here you can see the algorithm has no problem detecting smiling and not smiling in the video, and the detection is quite stable.
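
A sketch of the video version, under the same assumptions as the image script (placeholder file names, detection parameters as before):

    import cv2
    import numpy as np
    from tensorflow.keras.models import load_model

    BLUE = (255, 0, 0)
    face_detector = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
    model = load_model("smile_detector.h5")      # placeholder file name
    video_capture = cv2.VideoCapture("video.mp4")

    while True:
        grabbed, frame = video_capture.read()
        if not grabbed:                          # end of the video file
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in face_detector.detectMultiScale(gray, 1.1, 5):
            roi = cv2.resize(gray[y:y + h, x:x + w], (32, 32)) / 255.0
            roi = roi[np.newaxis, ..., np.newaxis]
            label = ("Smiling" if model.predict(roi)[0][0] >= 0.5
                     else "Not Smiling")
            cv2.rectangle(frame, (x, y), (x + w, y + h), BLUE, 2)
            cv2.putText(frame, label, (x, y - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.75, BLUE, 2)
        cv2.imshow("Frame", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):    # press "q" to quit
            break

    video_capture.release()
    cv2.destroyAllWindows()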

7. Conclusion: In this class you learned how to train a convolutional neural network to detect smiles in images and videos. The first thing we had to do was load our dataset from disk and prepare it for our network. Next, we created our neural network, and here we calculated the class weights to account for the class imbalance in the dataset and thus force the model to pay more attention to the underrepresented class. We then, in this part, split the dataset into a training set, a validation set, and a testing set, and proceeded to train our network; we saw that we achieved an accuracy of 90 percent on the test set. The final step was to test our model on images and videos. For that, we used a Haar cascade classifier, here in this part, to detect the face in the image; then we extracted the face region from the image and passed it to our model for prediction. And finally, we labeled the image as "smiling" or "not smiling".