Transcripts
1. Overview: Hi. In this class, we will see how to detect objects in images and videos using deep learning and OpenCV. My name is Yossi and I will be your instructor in this class. I've been programming since 2019, and I am the author of the blog dontrepeatyourself.org, where I help over 5,000 developers each month learn more about Python, machine learning, and computer vision. So what is object detection? Object detection is the process of locating objects with bounding boxes in an image or a video. It is one of the most important tasks in computer vision, and it has many applications in various fields, such as surveillance, people counting, self-driving cars, etc. Now, there is a difference between object detection and image classification. Basically, object detection is the process that locates objects in an image, while image classification is the process that assigns labels to images based on their content. So let's get into the class and start building our project.
2. Installation: The first thing we need to do is to install the required packages for image processing. So we are going to install OpenCV and NumPy. Let's first start with OpenCV: open a new terminal window and run the command pip install opencv-python. In my case, you can see that I have OpenCV already installed: "Requirement already satisfied." But I just want to create a virtual environment, so you can see what you get when you install it. So let's create the virtual environment and activate it. Now, let's pip install OpenCV. Here you can see that OpenCV was successfully installed along with NumPy, so I don't need to install NumPy separately. Here you can see that I have version 4.5.5 of OpenCV, and the version of NumPy is 1.22.1.
3. Object Detection in Images: Now in this video, we will be using the Single Shot Detector (SSD) framework combined with the MobileNet architecture as our deep-learning-based object detector. So the first thing we need to do is to import our libraries: we will say import cv2. And we can also load our image, so we will write here the path for our image. And we can resize it as well: we will say cv2.resize, with 640 for the width and 480 for the height. Now let's get the height and the width from the image. So we will say image.shape[0] for the height, and for the width we will say image.shape[1].
Now we need the weights and the configuration file for our model. So I downloaded these files from the OpenCV documentation; we have two files here, and I will put a link to these files in the text version of this class. So now that we have all the files that we need, we can load our model. So we can write here our network: we will say net = cv2.dnn.readNetFromTensorflow, and as arguments we will provide the weights and the configuration file. So we can write these two variables here: we will say weights is equal to, and we provide the path to our weights file; this is the file that contains the weights for the model. Then we will say model: this one is the architecture of our model.
Now, here we have the coco.names file, which we put here inside the project. So basically, this file contains the classes that we can detect. So we can open the file and store the class labels in a list. We can use a context manager for that, so we will say with open, and here we provide the path to our coco.names file; we can just copy it from here. Here we will say 'r' for read mode. Then we will store the class labels inside a list that we are going to name class_names. So inside the with block we can write f.read().strip(), and we will split based on the new line.
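The parsing step can be sketched like this; a tiny stand-in file is created first so the snippet runs without the real coco.names:

```python
import os
import tempfile

# Create a tiny stand-in for coco.names so the snippet is self-contained.
path = os.path.join(tempfile.gettempdir(), "coco_sample.names")
with open(path, "w") as f:
    f.write("person\nbicycle\ncar\n")

# Same pattern as for the real file: read, strip, split on newlines.
with open(path) as f:
    class_names = f.read().strip().split("\n")

print(class_names)  # ['person', 'bicycle', 'car']
```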
Now we need to preprocess our image, and we have a function that will do all the preprocessing for us. So we can say here blob is equal to cv2.dnn, and here we will use the function blobFromImage. Here we provide our image, and then we have a few parameters that are set by default; these are provided in the documentation. So basically, the first one is the scale factor: we can put 1 divided by 127.5. Then here we have the size for the output image: we can write 320 by 320. And the last argument here is the mean subtraction values: we can write 127.5, and the same thing for the other two channels.
Next, we can set this blob as input for the network and get the output predictions. So we can say here net.setInput, and we will provide our blob. To make the prediction, we will say output is equal to net.forward(). Now here we have our predictions, so let's print out the shape of this variable: we will say output.shape, and run our code. So here, as you can see, we have a shape of (1, 1, 100, 7). So here the 100 is the detections, and here in the 7 we have the bounding boxes, the confidence, and some other information.
So now what we can do is loop over this variable to get the detections. So we will say for detection in output, here we will say [0, 0], and then here we take everything, and here also we take everything. Now here we will get the confidence of the model for the current detection. So we can say probability is equal to the element at index 2 of our detection. Now we can filter out the bad detections: we can write, if our probability, or the confidence of the model, is, let's say, below 0.5, we will continue looping, so we will do nothing. And if not, we will get the bounding box from the detection. So the bounding box is located, as you can see, in the detection from index 3 to 7. Now, this bounding box is given relative to the width and the height of the image. So let's print out our box, and here, as you can see, we have values like 0.3, 0.35, 0.5. So we need to multiply them by the width and the height of the image to get the actual x and y coordinates of the bounding box.
Now, what we can do is use the zip function, and we will write a list comprehension. So here, when we write the zip, we will take the detections from index 3 to 7, and for the second argument here we will provide the width, the height, the width, and the height. And here we will say for a, b in this zip function, we will take the multiplication of these two elements. Here, the first coordinate is the x of the top-left of the bounding box: we multiply it by the width. Then we have the y: we multiply it by the height. And then we have the x at the bottom-right times the width, and then the y coordinate at the bottom-right times the height. So we don't need this print anymore. And we also need to convert our list into a tuple.
Now we can draw the rectangle, so we will say cv2.rectangle. Here we will say image, and for the coordinates of the rectangle we will use the bounding box: for the starting point, we will take the first two elements from the bounding box, then here we will say the last two elements. Then green for the color, and 2 for the thickness. Now let's extract the class ID of the detected object and get the class label. So we can write class_id is equal to, and here the class ID: we can access it at index 1 of the detection. For the class name, we can write here the label. The label, which is the text that we are going to put on the image, is equal to an f-string. For the first element, we will take the class label. So we have our class_names, and our class_names is a list: it starts from 0, while the class IDs start from 1. So we need to subtract one from the class ID: we will write class_id and we'll subtract one. Now we can also display here the probability along with the class label, so we will say probability times 100.
And finally, we can draw our text on the image. So we will say cv2.putText; here, image; here, for the text, we will say the label. For the origin, we can use our box: so box[0] here, and box[1], but here we will add some pixels to make sure that the text is inside the bounding box; let's add 15 pixels. Now for the font, we can use FONT_HERSHEY_SIMPLEX; for the scale, let's say 0.5; green for the color; and 2 for the thickness. I think that we are done, so we can display our image: we will say cv2.imshow, a window name, and our image, and cv2.waitKey(0). Let's run our code. And here the bounding box is not displayed; the issue is, we need to take the last two elements of the bounding box. So when we run this again, now here, as you can see, the person was detected, but here the text is a little weird. We can change the font: we will say here FONT_HERSHEY_SIMPLEX. So here, as you can see, the person was detected correctly, and here we have the confidence, which is 78%.
4. Object Detection in Videos: Now let's move on to the next step, which is to detect objects in a video. So here, first we need to initialize the video capture. We will write video is equal to cv2.VideoCapture, and here I have a video that I'm going to use in this example, so we will write here, let's say, video1.mp4. Now here, everything remains the same, but we need to create a list of random colors to represent each class. So we can write here: first, we will need to import numpy, and here we don't actually need these two packages. Then we will write numpy.random.seed, let's say 42. For the colors, we will say numpy.random.randint; for the low value we'll say 0, and for the high value 255, so it will generate values between these two values. For the size, we will use the length of our class_names, and here we will say 3, to generate the RGB colors.
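The color table can be sketched like this; a short sample list stands in for the full class list from coco.names:

```python
import numpy as np

# Small sample of class names; the real list comes from coco.names.
class_names = ["person", "bicycle", "car"]

np.random.seed(42)  # fixed seed so the colors stay the same between runs

# One random (R, G, B) color per class, with values between 0 and 255.
colors = np.random.randint(0, 255, size=(len(class_names), 3))

print(colors.shape)  # (3, 3)
```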
Next, we can start processing our frames. So we will say while True, and here we will read the frames: we will say ret and frame is equal to video.read(), and we put everything inside the while loop. Here we want to get the height of our frames, so frame.shape[0], and also the width. Then everything remains the same; we will just say frame instead of image. The only thing we want to do is to get the color for the current detection. So we will say color is equal to colors, and here we need to provide the class ID of the detection; we can put the class_id line before the color line, and here we can say class_id. Now we need to convert the color values into integers, so we will say r is equal to int of color, the first element, and the same thing for the b and the g. And here, for the rectangle, we can use our custom color: we will say b, g, and r. And here, in imshow, the window name and frame. And here we need to use 1, for one millisecond, for the waitKey; otherwise, our frame will be frozen. So we can now run our code. So there we go: our car is detected successfully in the video. Now, you can use different videos to see if this will work well, but I hope that you get the idea of object detection.