
Object Detection with Deep Learning and OpenCV

Yacine Rouizi


Lessons in This Class

    • 1. Overview (1:22)
    • 2. Installation (1:40)
    • 3. Object Detection in Images (14:10)
    • 4. Object Detection in Videos (4:41)


Students: 63

Projects: --

About This Class

In this class, we will see how to detect objects in images and videos using deep learning and OpenCV. 

We will be using the Single Shot Detector framework combined with the MobileNet architecture as our deep learning-based object detector.

Meet Your Teacher

Hi! My name is Yacine Rouizi. I have a Master's degree in the physics of materials and components, and I am a passionate self-taught programmer. I've been programming since 2019, and I teach programming, machine learning, and computer vision on my blog.

My goal is to make learning accessible to everyone and simplify complex topics, such as computer vision and deep learning, by following a hands-on approach.


Level: Beginner

Class Ratings

Expectations Met?
  • Exceeded! 0%
  • Yes 0%
  • Somewhat 0%
  • Not really 0%


Transcripts

1. Overview: Hi. In this class, we will see how to detect objects in images and videos using deep learning and OpenCV. My name is Yacine and I will be your instructor in this class. I've been programming since 2019, and I am the author of the blog dontrepeatyourself.org, where I help over 5,000 developers each month to learn more about Python, machine learning, and computer vision. So what is object detection? Object detection is the process of locating objects with bounding boxes in an image or a video. It is one of the most important tasks in computer vision, and it has many applications in various fields, such as surveillance, people counting, self-driving cars, etc. Now, there is a difference between object detection and image classification: object detection is the process that locates objects in an image, while image classification is the process that assigns labels to images based on their content. So let's get into the class and start building our project.

2. Installation: The first thing we need to do is to install the required packages for image processing. We are going to install OpenCV and NumPy. Let's start with OpenCV: open a new terminal window and run the command pip install opencv-python. In my case, you can see that I already have OpenCV installed ("requirement already satisfied"), but I want to create a virtual environment so that you can see what you get when you install it from scratch. So let's create the virtual environment, activate it, and install OpenCV again. Here you can see that OpenCV was successfully installed along with NumPy, so I don't need to install NumPy separately. You can also see that I have version 4.5.5 of OpenCV, and the version of NumPy is 1.22.1.

3. Object Detection in Images: In this video, we will be using the Single Shot Detector (SSD) framework combined with the MobileNet architecture as our deep learning-based object detector. The first thing we need to do is to import our libraries, so we will say import cv2. We can then load our image by writing here the path to our image, and we can resize it as well: we will say cv2.resize, with, let's say, 640 for the width and 480 for the height. Now let's get the height and the width of the image: we will say image.shape[0] for the height and image.shape[1] for the width. Next we need the weights and the configuration file for our model. I downloaded these files from the OpenCV documentation, so we have two files here; I will put a link to them in the text version of this class. Now that we have all the files we need, we can load our model. We will say cv2.dnn.readNetFromTensorflow, and here we will provide the weights and the configuration file. So we can write these two variables here: weights is equal to the path of the file that contains the weights of the model, and model is the file that describes the architecture of our model. We also have the coco.names file, which I put here inside the project; this file contains the classes that we can detect. So we can open the file and store the class labels in a list. We can use a context manager for that: we will say with open, and here we provide the path to our coco.names file. Then we will read the file.
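Below is a minimal sketch of the setup covered so far, assuming the two model files downloaded from the OpenCV documentation; the exact file names (image.jpg, frozen_inference_graph.pb, ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt) are placeholders, not necessarily the ones shown in the video:

    import cv2

    # Load the input image and resize it (file name is a placeholder)
    image = cv2.imread("image.jpg")
    image = cv2.resize(image, (640, 480))
    h = image.shape[0]   # height
    w = image.shape[1]   # width

    # Weights and configuration files downloaded from the OpenCV
    # documentation (names are assumptions)
    weights = "frozen_inference_graph.pb"
    model = "ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt"

    # Load the pre-trained SSD MobileNet network
    net = cv2.dnn.readNetFromTensorflow(weights, model)

    # Read the COCO class labels, one label per line
    with open("coco.names", "r") as f:
        class_names = f.read().strip().split("\n")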
Here we will store the class labels inside a list that we are going to name class_names. So we will say class_names is equal to the result of reading the file: we can write f.read().strip(), and we will split based on the newline character. Now we need to preprocess our image, and we have a function that will do all the preprocessing for us. We can say blob is equal to cv2.dnn.blobFromImage, and here we provide our image. Then we have a few parameters whose values are taken from the documentation. The first one is the scale factor: we can put 1 divided by 127.5. Then we have the size of the output image: we can write 320. And the last argument is the mean subtraction values: we can write 127.5, and the same thing for the other two channels. Next, we can set this blob as input for the network and get the output predictions. We will say net.setInput and provide our blob, and to make the prediction we will say output is equal to net.forward(). Now we have our predictions, so let's print out the shape of this variable and run our code. As you can see, we have a shape of (1, 1, 100, 7): here we have the detections, and in the 7 values of each detection we have the bounding box, the confidence, and some other information. Now we can loop over this variable to get the detections: we will say for detection in output, here we will say 0, 0, and for the last two dimensions we take everything. For each detection, we will get the confidence of the model: we can say probability is equal to the element at index 2 of the detection. Now we can filter out the bad detections: if the probability, or the confidence of the model, is below, let's say, 0.5, we will continue looping and do nothing; if not, we will get the bounding box from the detection. The bounding box is located at indices 3 to 7 of the detection. Now, these bounding box coordinates are given relative to the width and the height of the image. If we print out our box, as you can see, we have values like 0.3, 0.35, 0.5. So we need to multiply them by the width and the height of the image to get the actual x and y coordinates of the bounding box. What we can do is use the zip function and write a list comprehension: in the zip, we will take the detection values from 3 to 7, and as the second argument we will provide the width, the height, the width, and the height. Then we will say for a, b in this zip, take the multiplication of these two elements. The first coordinate is the x of the top-left corner of the bounding box, which we multiply by the width; then the y, which we multiply by the height; then the x of the bottom-right corner times the width; and the y of the bottom-right corner times the height. We don't need the print anymore, and we also need to convert our list into a tuple. Now we can draw the rectangles: we will say cv2.rectangle, passing the image; for the coordinates of the rectangle, we will use the bounding box, taking the first two elements for the starting point and the last two elements for the end point; then green for the color and 2 for the thickness.
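A sketch of the preprocessing and detection loop just described, continuing from the previous snippet (image, w, h, and net as defined there); the 0.5 threshold and the green color come from the lesson:

    # Preprocess: scale factor 1/127.5, output size 320x320, and
    # mean subtraction values of 127.5 per channel
    blob = cv2.dnn.blobFromImage(image, 1.0 / 127.5, (320, 320),
                                 (127.5, 127.5, 127.5))

    # Set the blob as input and run a forward pass
    net.setInput(blob)
    output = net.forward()          # shape: (1, 1, N, 7)

    for detection in output[0, 0, :, :]:
        probability = detection[2]  # confidence of this detection
        if probability < 0.5:       # filter out the bad detections
            continue

        # detection[3:7] holds relative coordinates; multiply them by
        # the image width and height to get pixel coordinates
        box = tuple(int(a * b) for a, b in
                    zip(detection[3:7], [w, h, w, h]))

        # Draw the rectangle from the top-left to the bottom-right corner
        cv2.rectangle(image, box[:2], box[2:], (0, 255, 0), 2)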
Now let's extract the class ID of the detected object and get the class label. We can write class_id is equal to, and we can access the class ID like this from the detection. For the class name, we can write here the label. The label, which is the text we are going to put on the image, is equal to an f-string: for the first element, we will take the class label from our class_names list. Since class_names is a list, it starts from 0, so we need to subtract one from the class ID: we will write class_id minus one. We can also display the probability along with the class label: we will say probability times 100. Finally, we can draw our text on the image: we will say cv2.putText, passing the image and the label for the text. For the origin, we can use our box, so box[0] here and box[1], but we will add some pixels to make sure that the text is inside the bounding box; let's add 15 pixels. For the font, we can pick one of OpenCV's built-in fonts; for the scale, let's say 0.5, then the green color and 2 for the thickness. I think we are done, so we can display our image: we will say cv2.imshow with the image, and cv2.waitKey(0). Let's run our code. Here the bounding box is not displayed; that's because we need to take the last two elements of the bounding box. When we run this again, as you can see, the person is detected, but the text looks a little weird, so we can change the font to FONT_HERSHEY_SIMPLEX. Now, as you can see, the person is detected correctly, and we have the confidence, which is 78%.
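A sketch of this labeling step; the doubly indented lines go inside the detection loop from the previous snippet (detection, box, probability, image, and class_names as defined there), and the two-decimal formatting of the probability is an assumption:

        # Class IDs start at 1, while class_names is 0-indexed
        class_id = int(detection[1])
        label = f"{class_names[class_id - 1]} {probability * 100:.2f}%"

        # Draw the label 15 pixels below the top-left corner of the box
        cv2.putText(image, label, (box[0], box[1] + 15),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    # After the loop, display the result and wait for a key press
    cv2.imshow("image", image)
    cv2.waitKey(0)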
4. Object Detection in Videos: Now let's move on to the next step, which is to detect objects in a video. First, we need to initialize the video capture: we will write video is equal to cv2.VideoCapture, and here I have a video that I'm going to use in this example, so we will write video1.mp4. Almost everything remains the same, but first we can create a list of random colors to represent each class. We will need to import NumPy for that, and we can remove the imports we don't actually need here. We will write np.random.seed(42), and for the colors we will say np.random.randint: for the low value we'll say 0, and for the high value 255, so it will generate values between these two; for the size, we will use the length of our class_names list, and 3, to generate the RGB colors. Next, we can start processing our frames: we will say while True, read a frame from the video capture, and put the rest of the code inside the while loop. Here we want to get the height and the width of the frame, so frame.shape[0] and frame.shape[1]; everything else remains the same, except we work on the frame instead of the image. The only other thing we want to do is to get the color for the current detection. We will say color is equal to colors, and here we need to provide the ID of the detection, so we move the class_id line before this one and index with the class ID. Now we need to convert the color values into integers: we will say r is equal to the integer of the first value, and the same thing for the g and the b. Then we can use our custom color in the drawing functions: we will say b, g, and r. For the display, we pass the window name and the frame, and here we need to use 1, for one millisecond, in waitKey; otherwise, our frame will be frozen. So we can now run our code, and there we go: our car is detected successfully in the video. Now you can use different videos to see if this works well, but I hope that you get the idea of object detection.
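For reference, here is a minimal end-to-end sketch of this video lesson, reusing net and class_names from the image lesson; the video file name comes from the lesson, while the 'q'-to-quit check and the cleanup calls at the end are additions:

    import cv2
    import numpy as np

    video = cv2.VideoCapture("video1.mp4")

    # One random BGR color per class, reproducible thanks to the seed
    np.random.seed(42)
    colors = np.random.randint(0, 255, size=(len(class_names), 3))

    while True:
        ret, frame = video.read()
        if not ret:          # stop when the video ends
            break

        h, w = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(frame, 1.0 / 127.5, (320, 320),
                                     (127.5, 127.5, 127.5))
        net.setInput(blob)
        output = net.forward()

        for detection in output[0, 0, :, :]:
            probability = detection[2]
            if probability < 0.5:
                continue

            box = tuple(int(a * b) for a, b in
                        zip(detection[3:7], [w, h, w, h]))
            class_id = int(detection[1])

            # Convert the NumPy color values to plain Python ints
            b, g, r = (int(c) for c in colors[class_id - 1])
            cv2.rectangle(frame, box[:2], box[2:], (b, g, r), 2)

            label = f"{class_names[class_id - 1]} {probability * 100:.2f}%"
            cv2.putText(frame, label, (box[0], box[1] + 15),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (b, g, r), 2)

        cv2.imshow("frame", frame)
        # waitKey(1) keeps the window responsive between frames
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    video.release()
    cv2.destroyAllWindows()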