Transcripts
1. Introduction: Welcome to the class smile detection with deep learning, where we will be creating an end-to-end application that can detect smiles in images and videos. My name is Yacine and I will be your instructor in this class. I've been programming since 2019, and I am the author of the blog dontrepeatyourself.org, where I help over 5 thousand developers each month learn more about Python, machine learning and computer vision. Image classification is a very common task in computer vision. It is a task where you have a set of images and you need to classify them into a set of categories. In this course, specifically, we will classify images into two categories: smiling and not smiling. For that we will use deep learning and start by training a convolutional neural network on the SMILES dataset, which contains faces of people smiling and not smiling. Once the network is trained, we will go through the following steps to detect smiles in images and videos. We will use Haar cascades to detect a face in an image. We will then extract the face region from the image. Then we will pass the face region to the network for classification. And finally, we will annotate the image with the label smiling or not smiling, depending on the output of the network. If that sounds exciting, then let's get into the class and start building our smile detection application.
2. Installation: In this first video, we will see how to install the required packages for this project. You can take the requirements.txt file from the source code and install what's in it. Or, if you are following along with me, you can create a new file in the project directory. We will name it requirements.txt and insert the libraries needed for this project. So we need NumPy. Then we have scikit-learn and OpenCV. We also need Matplotlib. Then finally we need TensorFlow. Now we save the file, and from the terminal we run pip install -r requirements.txt. In my case, the packages are already installed. So here we can see "Requirement already satisfied" for all these libraries. But in your case, you will see the packages being installed, and it might take some time to install all of them.
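For reference, a requirements.txt along these lines should work; these are just the five libraries named above, left unpinned so pip picks the latest versions:

```
numpy
scikit-learn
opencv-python
matplotlib
tensorflow
```

Note that OpenCV is installed from PyPI under the package name opencv-python, even though you import it as cv2.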
3. Loading the Data: In this video, we are going to load the SMILES dataset. The SMILES dataset contains more than 13 thousand grayscale images of 64 by 64 pixels in two classes: smiling and not smiling. For the non-smiling faces, we have more than 9 thousand images. And for the smiling faces, we have 3,690 images. So the dataset is imbalanced, meaning that there is an uneven distribution of the images. A common technique for dealing with an imbalanced dataset is to apply class weighting. We will see how to do it when we build our model. So we will first
download the dataset. You can navigate to this URL to download it. So we will click on the Code button and we will download it in a zip format. When the file is downloaded, we unzip it and delete the zip file. And our dataset is contained in this SMILEs folder. So here we have two subfolders, positives and negatives. The positives folder contains images of faces with smiles, as you can see here. And the negatives folder contains images of non-smiling faces. So as you can see, the pictures are in grayscale and are 64 by 64 pixels. So normally we can train our model without a problem; it should not consume a lot of resources. Don't forget to copy the folder
to the project directory. Now we are ready to load our dataset from disk. So let's start by importing the necessary packages. We will first create a new Python file, training.py. We will start by importing the train_test_split function from sklearn.model_selection. The train_test_split function we will use to split our dataset into a training set, a validation set, and a testing set. Then we have the img_to_array function, so from tensorflow.keras.preprocessing.image we import img_to_array. And then we have load_img from the same module, which we will use to load an image from disk. Next we have the Sequential model, so from tensorflow.keras.models we import Sequential. And then we need some layers. So from tensorflow.keras.layers we will have Conv2D, then we have the MaxPooling2D layer, then we have the Flatten layer, and finally the Dense layer. We also need the pyplot module, NumPy and the os package. So let's import them as well.
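Put together, the import block might look like this sketch, assuming the module paths named above:

```python
import os

import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.image import img_to_array, load_img
```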
And now we are ready to load our dataset. So let's create a function called image_paths, which will take the path to the dataset as an argument, and it will return the list of paths to the images in the dataset. So we can first create a list of valid extensions. So we will say valid_formats is equal to a list with .jpg, and we will include .png. So basically we will use this list to check if the file extension is an image. And next, let's create an empty list, which will contain the paths to the images in the dataset. Next we will loop over
the root directory tree. We will say: for dir_path, dir_names, filenames in os.walk. And here dir_path is the path to the current directory in the tree, dir_names are the sub-directories of the current directory in the tree, and the filenames are the files in the current directory. So let's start by printing the dir_path; we provide the SMILEs folder and we will see what we get. We will run our script. So the output here is all the sub-directories of the SMILEs directory. As you can see, we have the negatives, positives, which is here, and then negatives7, positives7, which are here and here. Now we will loop over the filenames. So we will write: for filename in filenames. Let's also print the dir_path and the filename to see what's going on. So we rerun our script. So here, as you can see, we have this directory path and then this filename, and then this directory path with this filename, which is an image, and so on. So what we can do now is to extract the file extension, which is either .jpg or .png, and compare it to our valid formats list to check if
the filename is an image. So we will write here: extension is equal to os.path.splitext; we provide the filename and take the second element. Then we will write: if our extension is in the valid_formats list, we will build the full path to our image. So we will say image_path is equal to os.path.join, and here we will join the directory path, which is this one, with the filename of this particular image. The full path to the image will be this one. And we also need to append our image path to our list. We are done with this function; we just return the list of image paths. So now with this
function, we have the full path to each image in the dataset stored in a list. A minimal sketch of the whole function is shown below.
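This sketch assumes the names used above; the local list is called paths so it does not collide with the function name:

```python
def image_paths(dataset_path):
    """Return the list of paths to the images under dataset_path."""
    valid_formats = [".jpg", ".png"]
    paths = []
    # Walk the root directory tree.
    for dir_path, dir_names, filenames in os.walk(dataset_path):
        for filename in filenames:
            # Keep the file only if its extension marks it as an image.
            extension = os.path.splitext(filename)[1].lower()
            if extension in valid_formats:
                paths.append(os.path.join(dir_path, filename))
    return paths
```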
So the next step is to use these image paths and load the SMILES dataset. So we will create a new function. Name it load_dataset. And here we will provide two arguments. The first one is the image paths, and the second one is the target size of the images. We'll define it here: let's say image_size is equal to (32, 32), and we get the paths by calling image_paths. Now we can create two empty lists. The first one, we will name it data, which will contain the images. And then we create the one named labels, which will contain the label for each image, smiling or not smiling. So next we will start by getting the images. So we can loop over
the image paths. We will say: for image_path in paths. Here we will write: image is equal to; we will load our image using the load_img function. Here the first argument is the path to our image. And then we have the color mode, which is grayscale. And then the last argument is the target size, for which we can use our target size. Next, we need to convert our image to a NumPy array; we will simply write: image is equal to img_to_array of image. And finally, we append our image to the data list. So here we are done
with the images. We can now get the labels. Now to get the label, we can extract it from the image path. So for example, here we will print out the image paths and see what we get; we provide the image paths. So here, as you can see, we have the path to each image in our dataset. For example, here is the full path to this particular image. So if we take the full path, we can extract the label from this path, which is here positives. So for this particular image, the person is smiling; we have a smiling face in this image. So in order to get the label, we will write: label is equal to the image path, but we will split it based on the directory separator, which is this separator here, and we will take the third element from the end. So this is the first one, minus one, minus two, and minus three. So here we will get label is equal to positives. Now we will say: if our label is equal to positives, we will encode it as 1. So we will write 1 if label equals positives; otherwise, we will encode it as 0. Finally, we append our
label to our labels list. The last step is to simply return the data list and the labels list. We make sure to convert them to NumPy arrays, so we will say: return np.array of data and np.array of labels. We also need here to scale the data to the [0, 1] range by dividing by 255. And here we can get our data and labels by calling our function. Putting it all together, the function might look like the sketch below.
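A minimal sketch of the function, together with the call that loads everything (the SMILEs folder name is the one used above):

```python
image_size = (32, 32)

def load_dataset(paths, target_size):
    """Load the images and their 0/1 labels from a list of image paths."""
    data, labels = [], []
    for image_path in paths:
        # Load the image in grayscale at the target size.
        image = load_img(image_path, color_mode="grayscale",
                         target_size=target_size)
        data.append(img_to_array(image))
        # The label is the third path component from the end,
        # e.g. SMILEs/positives/positives7/10007.jpg -> "positives".
        label = image_path.split(os.path.sep)[-3]
        labels.append(1 if label == "positives" else 0)
    # Convert to NumPy arrays and scale the pixels to the [0, 1] range.
    return np.array(data) / 255.0, np.array(labels)

data, labels = load_dataset(image_paths("SMILEs"), image_size)
```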
4. Training the Smile Detector: In this part we're going to build our model and start training it. The model we're going to build consists of a stack of two Conv2D plus MaxPooling2D blocks, followed by a fully connected block. So as you can see from this image, we will have this Conv2D layer, then a MaxPooling2D layer. And then the same thing here: Conv2D plus MaxPooling2D layers. And then we will have the fully connected block, which consists of a Flatten layer, then a Dense layer, and another Dense layer which will output the result. So let's create a function. We will name it build_model. And it will take an argument input_shape, which will be equal to the image size plus one for the channel dimension, since our images are grayscale. We'll use the Sequential model, so we can write: model is equal to Sequential. We will have a Conv2D layer; for the filters we will take 32, and for the kernel size, let's say a three-by-three size. For the activation, we use the ReLU activation function, and for the input shape, our input_shape. We then add a MaxPooling2D layer, for which we will use a pool size of two by two. And then the same thing here: a Conv2D plus MaxPooling2D block, where we will double the number of filters. And then the Flatten layer, and a Dense layer with 256 units. And for the activation of the output layer: since we are dealing with a binary classification problem, we'll use the sigmoid activation function and a single unit. So we have one unit with the sigmoid activation. Let's also compile our model. So we will write model.compile. For the loss function, of course, we will take the binary cross-entropy. For the optimizer, let's say the Adam optimizer, and for the metrics, we will take the accuracy. And we must not forget to return the model.

Next we need to calculate the class weights. As I said before, our dataset is imbalanced. So we need to give more weight to the underrepresented class, the smiling class in this case, so that the model will pay more attention to this class. We can start by counting the number of each label using the function numpy.unique. So we will say: the labels and counts are equal to np.unique; here we provide our labels and we set return_counts to True, to return the number of each label. Let's print these out and run our script. Here, as you can see, we have the labels 0 and 1, which are not smiling and smiling. And for the count variable, we have 9,475 for the non-smiling label, and for the smiling label we have 3,690. So now we need to compute the class weights. For that, we can use a simple operation. So we can write here: the new counts will be equal to the max of the counts, which is here, and we divide it by the previous counts. Let's print these two variables again. So here, as you can see, class 1, the smiling class, will have a weight of about 2.56, greater than the weight of class 0. Let's create a dictionary that will map each class to its weight. So we can write: class_weight is equal to; we'll use the zip function to build the dictionary.

Now we need to create a training set, a validation set and a test set. We will take 20% of the data to create the test set. And from the remaining 80%, we will take 20 per cent to create the validation set. So let's start with the test set. We will use the train_test_split function for that. So we write train_test_split; we provide our data and labels, and then for the test size we will take 20 per cent. We also stratify on the labels and set the random state to 42. Same thing for the validation set: we just copy and paste, and split the training data again into a training set and a validation set.

Now we are ready to train our model. So we will first build the model by writing: model is equal to build_model. And we will train the model for 20 epochs, so epochs is equal to 20. The history is equal to model.fit; we provide the training data, the validation data is the validation set, and we need to use the class_weight. For the batch size, say 64, and for the epochs, epochs. Let's also save the model for later. So we can simply write model.save and give it a file name; with the .h5 extension, Keras saves it in the HDF5 format. So we run it. Training should not take too long, so you can wait for training to finish. So as you can see, we achieve an accuracy of approximately 90 per cent.

But let's plot the learning curves to visualize our results. So I will just copy the code from here. There's nothing special here: we are just plotting the training accuracy and the validation accuracy, and then we have the training loss and the validation loss. So let's rerun the script. So here we have the training loss, and here we have the training and validation accuracy of approximately 90 per cent. What you can notice is that the training and validation accuracy stay close until approximately here, then start to diverge. So this is a sign of overfitting. We can improve the accuracy of our model by using data augmentation, but I will not cover it in this class. So let's evaluate our model on the test set, and then we can apply our model to some real-world images. So we write: loss, accuracy is equal to model.evaluate, and we use the test set. Here we print the loss and the accuracy. So we have an accuracy of approximately 90 per cent on the test set. A condensed sketch of this training script is shown below.
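Here is a minimal sketch of the training code described in this part. It builds on the imports and the image_paths/load_dataset helpers from the previous video; the saved file name model.h5 and the random_state value are assumptions where the recording is unclear:

```python
def build_model(input_shape):
    """Two Conv2D + MaxPooling2D blocks followed by a fully connected block."""
    model = Sequential([
        Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation="relu"),  # double the filters
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(256, activation="relu"),
        Dense(1, activation="sigmoid"),         # binary classification
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model

# Class weights: weight of a class = max count / count of that class,
# so the underrepresented smiling class gets a weight of about 2.56.
unique_labels, counts = np.unique(labels, return_counts=True)
class_weight = dict(zip(unique_labels, counts.max() / counts))

# 20% of the data for the test set, then 20% of the rest for validation.
x_train, x_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.2, stratify=labels, random_state=42)
x_train, x_val, y_train, y_val = train_test_split(
    x_train, y_train, test_size=0.2, stratify=y_train, random_state=42)

model = build_model(image_size + (1,))  # one channel for grayscale
history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                    class_weight=class_weight, batch_size=64, epochs=20)
model.save("model.h5")  # hypothetical file name

loss, accuracy = model.evaluate(x_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```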
5. Applying Our Smile Detector to Images: Now that our model is trained, we will apply it to some real-world images. So let's create a new Python file. We will name it smile_detector_image.py. And we can start writing some code. So we will import OpenCV. Then we have, from the Keras models module, the load_model function. We also need NumPy. Next, let's define some variables. So we will create the width, which is equal to 800. Let's say the height is equal to 600. We also create the blue color, so it is equal to (255, 0, 0) in BGR. And now we can load our image. So here I have a folder
with some images. I will just copy it from here. I have ten images; we will take the first one to test our model. So we will write: image is equal to cv2.imread, and we take the first image. We also need to resize it, so we will write cv2.resize with the image, and use the width and height. And finally, we convert it to grayscale: gray is equal to cv2.cvtColor with the image and cv2.COLOR_BGR2GRAY. Next we can start detecting faces using the Haar
cascade classifier. So here I have the haarcascade_frontalface_default.xml file. So I will just put it here, in the project directory, and we can load it using the function cv2.CascadeClassifier. So we will write: face_detector is equal to cv2.CascadeClassifier, and here we provide the name of the file. We also need to load our model. So we will say: model is equal to load_model; here we have the name of our model file. Now we can detect faces using the function detectMultiScale. We write: face_rectangles is equal to face_detector.detectMultiScale. For the first argument, we provide our grayscale image, and then we have the scaleFactor, which we will put as 1.1. Then we have to define the minimum neighbors. These parameters work well in my case, but you can change them if you want. So here the face_rectangles variable will contain the face bounding boxes. A minimal sketch of this setup is shown below.
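This is a sketch of the setup so far; the image path, the model file name (model.h5) and the minNeighbors value are assumptions, since they are not clear from the recording:

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

width, height = 800, 600
blue = (255, 0, 0)  # BGR

image = cv2.imread("images/1.jpg")  # hypothetical path
image = cv2.resize(image, (width, height))
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

face_detector = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
model = load_model("model.h5")  # hypothetical file name

# scaleFactor and minNeighbors control the Haar cascade's sensitivity.
face_rectangles = face_detector.detectMultiScale(gray, scaleFactor=1.1,
                                                 minNeighbors=20)
```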
So the next step is to loop over the face bounding boxes and extract the face region. We will say: for x, y, width, and height in face_rectangles. Here we will first draw a rectangle around the face. So we will write cv2.rectangle; here we provide our image, and we will give the top-left corner coordinates, which are x and y. And then we have the diagonally opposite corner, which would be x plus width and y plus height. And then we have the color blue and 2 for the thickness. And next we extract the face region from the grayscale image. We have the coordinates of the face here, so we will write: roi is equal to gray, and we will simply use slicing. So we will slice from y to y plus height, and from x to x plus width. We also need to resize the face region and scale it to the [0, 1] range before feeding it to our model. So we will write: roi is equal
to cv2.resize. Here we will resize it to 32 by 32, because this is what we used to train our model: here, for the image size, we have 32 by 32. Then we scale it to the [0, 1] range by dividing by 255. We also need to add the image to a batch. Because here the image shape is 32 by 32, but our model was trained on batches of images, the new shape will need to be 1 by 32 by 32. So we will write: roi is equal to roi, and we will use np.newaxis to add this new axis, keeping everything else the same. We can print the shape before and after adding the new axis to see what has changed. Here we have an attribute error: newaxis is written with no capital letter. Here, as you can see, the shape before adding the new axis is 32 by 32, and after adding the new axis it becomes 1 by 32 by 32. So now the final step is to pass the region of the
face to the network for classification. We will write: prediction is equal to model.predict of roi, and let's print the prediction. So normally the prediction will have a value between 0 and 1. Here you can see the value of the prediction for this particular image: the prediction is 0.6. So what we can do right now is to say that our label is equal to smiling if the prediction is greater than or equal to 0.5; otherwise we will set it to not smiling. So we will write: label is equal to Smiling if the prediction is greater than or equal to 0.5, otherwise Not Smiling. Let's print the label to check: for this case, the label should be smiling, and indeed we get smiling. But the last thing
we need to do is to write the text on the image and display the image. So we will write cv2.putText; here we provide the image, and we would like to put the label on the image. For the text coordinates, we will use the coordinates of the face, so we will say x and y. And for the font, we can choose, let's say, this one, and for the font scale, let's say 0.75, then the blue color and 2 for the thickness. Then we display the image with cv2.imshow and wait for a key press with cv2.waitKey. A minimal sketch of this detection loop is shown below.
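This sketch pulls the loop together. The extra trailing axis (the grayscale channel) is an assumption so that the input matches the (32, 32, 1) shape the model was trained on; the font is also a placeholder choice:

```python
for (x, y, w, h) in face_rectangles:
    # Draw a rectangle around the detected face.
    cv2.rectangle(image, (x, y), (x + w, y + h), blue, 2)
    # Extract the face region from the grayscale image.
    roi = gray[y:y + h, x:x + w]
    # Resize to the 32x32 training size and scale to [0, 1].
    roi = cv2.resize(roi, (32, 32)) / 255.0
    # Add the batch axis (and a channel axis to match training).
    roi = roi[np.newaxis, :, :, np.newaxis]
    prediction = model.predict(roi)
    label = "Smiling" if prediction[0][0] >= 0.5 else "Not Smiling"
    cv2.putText(image, label, (x, y), cv2.FONT_HERSHEY_SIMPLEX,
                0.75, blue, 2)

cv2.imshow("Image", image)
cv2.waitKey(0)
```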
In this first image, the person is not smiling, but our model has detected a smile. Of course, our model is not 100% accurate, so this is a false positive prediction. Let's try with another image. Let's say this one. This time, our model has successfully predicted the smile. We can try one where the person is not smiling, so we will take this one; you can test with the other images as well. And that's it with this image: the person is not smiling, and our model has successfully detected it.
6. Applying Our Smile Detector to Videos: Now that you know how to detect smiles in images, let's see how to apply our deep learning smile detector to videos. So let's create a new Python file. Name it smile_detector_video.py. And let's import the libraries. The code will not change very much from the image script, so I will just copy this part and paste it here. Then we have the width and the height, and the blue color. Here, we just need to initialize the video capture object. So we will write: video_capture is equal to cv2.VideoCapture. Here we provide the name of our video. Here I have a pre-recorded video of me in which I did some tests, so I will just use it for this part of the class: I did some tests smiling and not smiling. So here we will put the
name of the video, with the .mp4 extension. And then we have the same thing from the image script: we will need the face detector and our pre-trained model, and we will detect faces using the detectMultiScale function. So I will just copy these three lines here. So here is our face detector. We read the frames from the video capture, so we can start processing our frames. We will say: while True. Then here we're going to get the next frame from the video, so we will write: frame, which we read from the video capture. And then we will convert it to grayscale with cv2.cvtColor and cv2.COLOR_BGR2GRAY. And here we can use the detectMultiScale function, and we pass it the grayscale frame. Now we can loop over the face bounding boxes. So I will just copy the code from here, because everything remains the same; we just need to change image to frame everywhere. We draw the rectangle and write the text the same way, and we display our frame, so we just copy the code from here as well. With cv2.imshow we show the frame, then we wait one millisecond for a key press with cv2.waitKey; if the pressed key is equal to q, we break out of the loop. Finally, we release the video
capture and destroy the windows. Let's see the final result. We will run: python3 smile_detector_video.py. So here you can see the algorithm has no problem detecting smiling and not smiling in the video, and the detection is quite stable. A condensed sketch of this video loop is shown below.
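A condensed sketch of the video script's main loop, reusing the face_detector, model and blue color from the image script; the video file name and the minNeighbors value are assumptions:

```python
video_capture = cv2.VideoCapture("video.mp4")  # hypothetical file name

while True:
    ret, frame = video_capture.read()
    if not ret:  # stop when the video ends
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_detector.detectMultiScale(gray, 1.1, 20):
        cv2.rectangle(frame, (x, y), (x + w, y + h), blue, 2)
        roi = cv2.resize(gray[y:y + h, x:x + w], (32, 32)) / 255.0
        prediction = model.predict(roi[np.newaxis, :, :, np.newaxis])
        label = "Smiling" if prediction[0][0] >= 0.5 else "Not Smiling"
        cv2.putText(frame, label, (x, y), cv2.FONT_HERSHEY_SIMPLEX,
                    0.75, blue, 2)
    cv2.imshow("Video", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break

video_capture.release()
cv2.destroyAllWindows()
```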
7. Conclusion: In this class you learned how to train a convolutional neural network to detect smiles in images and videos. The first thing we had to do was to load our dataset from disk and prepare it for our network. Next, we created our neural network. And here we calculated the class weights to account for the class imbalance in the dataset, and thus force the model to pay more attention to the underrepresented class. We then, here in this part, split the dataset into a training set, a validation set, and a testing set, and then proceeded to train our network. We saw that we achieved an accuracy of 90 per cent on the test set. The final step was to test our model on images and videos. So for that, we used a Haar cascade classifier, here in this part, to detect the face in the image. And then we extracted, in this part here, the face region from the image and passed the face region to our model for prediction. And finally, we labeled the image as smiling or not smiling.