Siam Mask Object Tracking and Segmentation in OpenCV Python | Augmented Startups | Skillshare

Siam Mask Object Tracking and Segmentation in OpenCV Python

Augmented Startups, Computer Vision, AI and Robotics

Lessons in This Class

59 Lessons (55m)
    • 1. Siam Mask Promo UD

    • 2. 1 1 Object Tracking Intro

    • 3. 1 2 Single and Multi Object Video Object Tracking

    • 4. 1 3 Object Segmentation

    • 5. 1 4 Siam Mask Object Segmentation Tracking

    • 6. 1 5 Siam Mask Course Overview

    • 7. 2 1 How does Siam Mask Work Intro

    • 8. 2 2 Fully Convolutional Siamese Network

    • 9. 2 3 SiamFC and Siam RPN

    • 10. 2 4 Siam Mask

    • 11. 2 5 Implementation Details

    • 12. 2 6 Siam Mask Performance

    • 13. 2 7 Results of Siam Mask

    • 14. 2 8 Important Link

    • 15. 3 1 Environmental Setup Intro

    • 16. 3 2 What you will Need

    • 17. 3 3 Setup and GitHub Code

    • 18. 3 4 Anaconda Setup

    • 19. 3 5 Setup Python Environment

    • 20. 3 6 Running the Demo

    • 21. 3 8 Key Take away

    • 22. 4 1 Using your Own Dataset Intro

    • 23. 4 2 Siam Mask Execution Commands

    • 24. 4 3 Converting the Dataset into Images

    • 25. 4 4 Running the Demo on your own Dataset

    • 26. 4 5 Activity Test it on your own video

    • 27. 5 1 Training Datasets Overview

    • 28. 5 2 YouTube VOS Dataset

    • 29. 5 3 COCO Dataset

    • 30. 5 4 ImageNet Datasets

    • 31. 5 5 YouTube VOS Training Dataset Process

    • 32. 5 6 Step 1 Using the Correct Directory

    • 33. 5 7 Step 2 Downloading the Raw Image Dataset

    • 34. 5 8 Annotation Metafile Format Review

    • 35. 5 9 Dataset Post Processing

    • 36. 5 10 Step 3 Crop and Generate Data Info

    • 37. 5 11 Convert Raw Data to Summarised Training format

    • 38. 5 12 How to Repeat for Other Datasets

    • 39. 5 13 Activity Try Out your Own Datasets

    • 40. 6 1 Intro to Siam Mask Training

    • 41. 6 2 Why Use Test Data

    • 42. 6 3 Step 0 Downloading Test Data

    • 43. 6 4 Step 1 Download the Pre trained Model

    • 44. 6 5 Step 2 Training Siam Mask Base Model

    • 45. 6 6 Post Training Checkpoints

    • 46. 6 7 Overview of Checkpoint Testing

    • 47. 6 8 Activity Train you own Dataset

    • 48. 7 1 Testing SiamMask Intro

    • 49. 7 2 Various Options for Testing SiamMask

    • 50. 7 3 Option 1 Testing Checkpoints on VOT

    • 51. 7 4 Option 2 Best Model for Hyperparametric Search

    • 52. 7 5 Option 3 Tracking on your Own Dataset

    • 53. 7 6 Siam Mask Custom Model Testing Summary

    • 54. A1 Error Handling jitdebug

    • 55. A2 Error Handling CUDA

    • 56. A3 Error Handling NoneType

    • 57. A4 Error Handling checkpoint e9

    • 58. A5 Error Handling jq command not found

    • 59. A6 Error Handling NAN FPS

About This Class

What Is Siam Mask

In this course you will learn how to implement both real-time object tracking and semi-supervised video object segmentation with a single simple approach. SiamMask improves the offline training procedure of popular fully-convolutional Siamese approaches for object tracking by augmenting the loss with a binary segmentation task.

Once trained, SiamMask solely relies on a single bounding-box initialization and operates online, producing class-agnostic (any class will work) object segmentation masks and rotated bounding boxes at 35 frames per second.

Despite its simplicity, versatility and fast speed, this strategy establishes a new state of the art among real-time trackers on the VOT-2018 dataset, while at the same time demonstrating competitive performance and the best speed for the semi-supervised video object segmentation task on DAVIS-2016 and DAVIS-2017.

Applications of Siam Mask

  • Automatic Data Annotation - Regardless of Class

  • Rotoscoping

  • Robotics

  • Object Detection and targeting

  • Virtual Background without Green Screen

What Will You Learn?

You will learn the fundamentals of Siam Mask and how it can be used for fast online object tracking and segmentation. You will first learn about the origins of Siam Mask, how it was developed, and how it performs on real-world tests. Next, we do a paper review to understand more about the architecture of Siamese networks in computer vision.

Thereafter, we move on to the implementation of Siam Mask by setting up the environment for development so that you can run Siam Mask on your own PC or Laptop. Once that is working, we will show you how to train Siam Mask for your own custom applications.

Once the model is trained, you will need a way to test it before applying it to real-world applications, so the course closes with testing.

Why Should I Take this Course?

You should take this course because Siam Mask is a state-of-the-art model that combines robustness, accuracy and real-time speed, and it can be used in a wide variety of applications.

Meet Your Teacher

Augmented Startups

Computer Vision, AI and Robotics



1. Siam Mask Promo UD: Hey guys, in this tutorial series, we're going to explore what SiamMask is and how you can use it to track mask segmentations of an object, with state-of-the-art performance in terms of robustness, accuracy, and real-time frame rate. Now if you look closely at the demo, you can see how well SiamMask works, even for an object that blends in with the background and deforms significantly in terms of perspective within the video. So to introduce this topic, I have Dr. Humira, who has a PhD in computer vision and a post-doc from the Technical University of Munich, majoring in robotic vision and autonomous systems. Dr. Humira will be your instructor not only for this tutorial series, but also for the comprehensive SiamMask course. This course deals with the implementation, dataset preparation, as well as training and testing for your own applications. So check it out. So yes, over to Dr. Humira with SiamMask.
2. 1 1 Object Tracking Intro: Do you know that one of the fields of computer vision with the most room for growth is video object tracking? Yeah, that's true. Although tasks like object detection and classification have made huge progress, when it comes to object tracking, today's trackers still have a long way to go. So a lot of effort is being directed to object tracking because the applications are numerous. For example, we need object tracking for autonomous driving, surveillance, human-computer interaction, sports analytics, medical analysis, human behavior understanding, personal robots, industrial robots, smartphones. The list is really endless.
3. 1 2 Single and Multi Object Video Object Tracking: When talking about tracking, we may be referring to single-object tracking, where the aim is to track a single object. On the other hand, we can be talking about multi-object tracking, where we are expected to lock on to every single object in the frame, uniquely identify each of them, and track all of them until they leave the frame.
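As a rough illustration of the multi-object case just described, here is a minimal sketch of assigning unique IDs to per-frame detections by greedy box overlap. This is not how SiamMask tracks (it is a single-object tracker initialized from one box); the `GreedyIdTracker` class, the `min_iou` threshold and the box format are all assumptions made for the example.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIdTracker:
    """Hypothetical helper: keeps one ID per object across frames."""

    def __init__(self, min_iou=0.3):
        self.min_iou = min_iou      # assumed threshold for "same object"
        self.tracks = {}            # track id -> box seen in the previous frame
        self._ids = count(1)

    def update(self, detections):
        """Match this frame's boxes to previous tracks; unmatched boxes get new IDs."""
        assigned = {}
        free = dict(self.tracks)    # tracks not yet claimed in this frame
        for box in detections:
            best_id, best = None, self.min_iou
            for track_id, previous in free.items():
                score = iou(box, previous)
                if score >= best:
                    best_id, best = track_id, score
            if best_id is None:
                best_id = next(self._ids)   # a new object entered the frame
            else:
                del free[best_id]
            assigned[best_id] = box
        self.tracks = assigned      # unmatched old tracks are dropped (object left)
        return assigned


tracker = GreedyIdTracker()
frame1 = tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)])
frame2 = tracker.update([(51, 50, 61, 60), (1, 0, 11, 10)])
```

The detector alone would report two boxes per frame; the IDs in `frame1` and `frame2` are what let us say which box in the second frame is the same object as in the first.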
If you're new to the field and are wondering what the difference between object detection and object tracking is, it's pretty straightforward. In object detection, we detect an object in a frame, put a bounding box or a mask around it, and classify the object. That's basically it. That's the end of the job of the detector. It processes each frame independently and identifies numerous objects in that particular frame. Now an object tracker, on the other hand, needs to track a particular object across the entire video. If the detector detects, for example, in this case, seven cars in the frame, the object tracker has to identify the seven separate detections, and it needs to track them across the subsequent frames with the help of a unique ID. For a detector, it's just one object class, car. For a tracker, these are seven different cars.
4. 1 3 Object Segmentation: A different but related problem is object segmentation. Here we semantically segment each class separately and we go for pixel-level labeling. This is needed for applications like video editing and video processing. And it makes the processing computationally expensive and very slow. So object tracking and segmentation, being two very different problems, pose a set of common challenges, like occlusion, especially full occlusion, deformation, varying viewpoints of the cameras, and non-stationary cameras. These are some challenges common to both fields. So when researchers tried to solve these problems, they either treated the two problem areas as independent and tried to solve them independently, or they tried to provide a common solution for both problems. Some of the areas where solutions were presented were mean-shift filters, optical flow, superpixels, and more recently, deep learning and convolutional networks.
5. 1 4 Siam Mask Object Segmentation Tracking: One of the most recent state-of-the-art solutions to both problems is SiamMask, published in May 2019.
SiamMask is a simple multi-task learning approach based on fully convolutional Siamese networks. Once trained, SiamMask solely relies on a single bounding box initialization and operates online, producing class-agnostic object segmentation masks and rotated bounding boxes at 55 frames per second. Yes, that's real-time. And despite its simplicity, versatility, and fast speed, SiamMask has been established as a new state of the art among the real-time trackers on the VOT-2018 dataset, while at the same time demonstrating competitive performance on video object segmentation tasks on DAVIS-2016 and DAVIS-2017.
6. 1 5 Siam Mask Course Overview: In this tutorial, we're going to focus on using SiamMask for video object tracking and semi-supervised segmentation. We'll see how it works, what the underlying math behind it is, and how they achieve such a huge performance gain, that too in real time. And we'll get our hands dirty by training the SiamMask model ourselves and running it on our own datasets and applications. So if you are interested in working on applications like autonomous driving or crowd management, or one of the application areas that we just discussed, then this tutorial might be for you.
7. 2 1 How does Siam Mask Work Intro: So now let's talk about how SiamMask works. To allow online operability and fast speed, the authors adopt the fully convolutional Siamese framework. Moreover, to illustrate that their approach is agnostic to the specific fully convolutional method used as a starting point, they consider the popular SiamFC and SiamRPN as two representative examples. They then adapt them to propose their own solution, SiamMask.
8. 2 2 Fully Convolutional Siamese Network: So here we see the fundamental building block of the tracking system, an offline-trained fully convolutional Siamese network. This compares an exemplar image z against a larger search image x to obtain a dense response map.
z and x are, respectively, a w by h crop centered on the target object and a larger crop centered on the last estimated position of the target. The two inputs are processed by the same CNN, yielding two feature maps that are cross-correlated.
9. 2 3 SiamFC and Siam RPN: This is what we see in this equation, g_θ. Each spatial element of the response map, which we see on the left side of the equation, g_θ, is referred to as the response of a candidate window, RoW. For example, the n-th element of g_θ would encode a similarity between the exemplar z and the n-th candidate window in x. For SiamFC, the goal is for the maximum value of the response map to correspond to the target location in the search area x. However, in SiamMask, the authors replace the simple cross-correlation with depth-wise cross-correlation and produce a multi-channel response map. SiamFC is trained offline on millions of video frames with the logistic loss, which they refer to as L_sim. The performance of SiamFC was improved by relying on a region proposal network, RPN, which allows estimating the target location with a bounding box of variable aspect ratio. SiamRPN outputs box predictions in parallel with classification scores, and the corresponding losses are referred to as L_box and L_score.
10. 2 4 Siam Mask: In SiamMask, the authors point out that besides similarity scores and bounding box coordinates, it is possible for the RoW, that is, the response of a candidate window of a fully convolutional Siamese network, to also encode the information necessary to produce a pixel-wise binary mask. So they predict w by h binary masks, one for each RoW, using a simple two-layer neural network, h_φ. Now, m_n here denotes the predicted mask corresponding to the n-th RoW. And the loss function L_mask for the mask prediction branch is a binary logistic regression loss over all RoWs. And here is what it looks like. On the left, you see the fully convolutional network that generates the response of a candidate window.
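To make the response map from the SiamFC discussion above concrete, here is a minimal single-channel sketch of cross-correlation in plain Python. It works on raw numbers rather than learned CNN features, so it only illustrates the sliding-window arithmetic; the function names and the toy data are made up for this example, and the depth-wise variant used in SiamMask would repeat this per feature channel.

```python
def cross_correlate(search, exemplar):
    """Slide `exemplar` over `search` (2-D lists) and return the dense response map."""
    eh, ew = len(exemplar), len(exemplar[0])
    sh, sw = len(search), len(search[0])
    response = []
    for top in range(sh - eh + 1):
        row = []
        for left in range(sw - ew + 1):
            # One response value per candidate window: sum of element-wise products.
            row.append(sum(
                exemplar[i][j] * search[top + i][left + j]
                for i in range(eh)
                for j in range(ew)
            ))
        response.append(row)
    return response


def argmax_2d(response):
    """Return (row, col) of the largest response value."""
    best = max((v, r, c) for r, row in enumerate(response) for c, v in enumerate(row))
    return best[1], best[2]


exemplar = [[1, 2],
            [3, 4]]
search = [[0, 0, 0, 0],
          [0, 1, 2, 0],
          [0, 3, 4, 0],
          [0, 0, 0, 0]]

response = cross_correlate(search, exemplar)
peak = argmax_2d(response)
```

Here the exemplar pattern sits at row 1, column 1 of the search area, and that is where the response map peaks, which is the behaviour the lesson describes for SiamFC.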
And on the right, we have the neural network for mask prediction. Based on these calculations, the authors presented two variants. One combines the mask with the SiamRPN losses, L_box and L_score. And the other one combines the mask with L_sim from SiamFC. In order to compare against the tracking benchmarks, it is required to have a bounding box as the final representation of the target object. They considered three different strategies to generate a bounding box from a binary mask. First, the axis-aligned bounding rectangle, min-max, which you see in red here. Next, they tried the rotated minimum bounding rectangle, MBR, which you see in green here. And the third one is the optimization strategy used for the automatic bounding box generation proposed in VOT-2016. This is what we see in blue here. The authors showed that the MBR strategy to obtain a rotated bounding box from a binary mask offers a significant advantage over the popular strategy of simply reporting axis-aligned bounding boxes.
11. 2 5 Implementation Details: A few words about the implementation details. For both variants, the authors use a ResNet-50 until the final convolutional layer of the fourth stage as their backbone. They use exemplar and search image patches of 127 by 127 and 255 by 255 pixels, respectively, for training. And during tracking, SiamMask is simply evaluated once per frame without any adaptation.
12. 2 6 Siam Mask Performance: Now let's have a quick look at their performance and results. The authors explain that the method aims at the intersection between the tasks of visual tracking and video object segmentation to achieve high practical convenience. However, in addition to tracking by a bounding box, it also generates the mask and achieves state-of-the-art performance. Here, the two variants of SiamMask are compared against seven recently published state-of-the-art trackers on the VOT-2018 benchmark.
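The simplest of the three box strategies from the SiamMask lesson above, the axis-aligned min-max rectangle, can be sketched in a few lines of plain Python. The mask layout (a 2-D list of 0/1 values) and the function name are assumptions for illustration; for the rotated minimum bounding rectangle one would typically reach for OpenCV's `cv2.minAreaRect`, which is not shown here.

```python
def min_max_box(mask):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around the nonzero pixels.

    `mask` is a 2-D list of 0/1 values; returns None for an empty mask.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return cols[0], rows[0], cols[-1], rows[-1]


# A small slanted blob: the min-max box has to cover its full extent,
# including corners the object itself does not occupy.
mask = [[0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 1, 1, 0, 0],
        [0, 0, 0, 0, 0]]

box = min_max_box(mask)
```

For a slanted or elongated object, the min-max box includes a lot of background, which is the intuition behind the advantage the lesson reports for the rotated MBR strategy.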
The performance measure used is EAO, the expected average overlap, which considers both robustness and accuracy of a tracker. As we see here, SiamMask basically outperforms the previously proposed solutions. And here we see how SiamMask can be considered a strong baseline for online VOS. First, it's almost two orders of magnitude faster than accurate approaches. And second, it is competitive with recent VOS methods that do not employ fine-tuning, which we see here as FT, while being four times more efficient than the fastest ones. Also, it doesn't need a mask for initialization, which we see by the M here. In the graph, there are some more results using mean intersection over union, mIoU, versus speed. And again, SiamMask stands out here. And now let's look at some results.
13. 2 7 Results of Siam Mask: Start by drawing a bounding box, and SiamMask generates the mask and continues tracking. And here is another example. This is interesting because we see it working under very low contrast. As you can see, the color of the object and the background is almost the same, and there are huge deformations. Another example where the viewpoint changes totally and it's still able to track. And something in very low light. It still works. And there are many more such examples. You can check them out at their website.
14. 2 8 Important Link: Finally, we have some important links. If you want to check out more of their results and how they work, check out the blog. They have their paper online, and you can access the code from the Augmented Startups repository, SiamMask, which will be needed in the next session for the experimental setup and our own training. That's it for today. And if you have any questions, feel free to reach out. Until next time.
15. 3 1 Environmental Setup Intro: Hello everyone and welcome back to the SiamMask tutorial, a series in which we are covering object tracking and video segmentation using SiamMask.
In the previous section, we saw what SiamMask is, how it works, what the underlying math behind it is, and how the performance gain was achieved over the previous benchmarks. In this session, we are going to cover how we can set it up on our own machine and play a little bit with it.
16. 3 2 What you will Need: So let's get started. What you need is a GitHub account, because this is where the code resides. In case you don't have one, feel free to create a free account right now. As far as system requirements are concerned, the authors used an Ubuntu machine with Python 3.6 and ran the code on GPUs. For the purpose of this demo or this tutorial, we use macOS, also with Python 3.6, and we used an Intel Core i7. So in case you don't have access to high-end GPUs, it's not a problem. Your code would execute a little bit slower, but it would still run.
17. 3 3 Setup and GitHub Code: So, getting started. It's a fairly simple process. You need to get the code, so we'll clone the repository. We need to have the Python environment set up, and for that we'll use Anaconda. And finally, we'll be good to go, so we'll run the demo. So first we get the code. You can get the code from the Augmented Startups repository, SiamMask. So we come here and we get the link. Once we have the link at hand, all we have to do is go to the terminal and clone the repo. But before we do that, let's have a quick glance at the repository. And from the look of it, it looks like a very well-documented repo. They have a README with very clear-cut instructions on how to set up the environment, how to run the demo, how to train the system, and so on and so forth. So let's try it out. So we come to our terminal and we clone the repo. For that, we use the git clone command and we paste the link that we just copied. Now we have the code. We move to the right directory, SiamMask, and we export the right path. So we set the SiamMask variable to be equal to the present working directory.
18. 3 4 Anaconda Setup: Great, that's done. The next step is to get Anaconda and use it for the installation. If you don't have it already, you can visit the Anaconda website. There, from Products, under Individual Edition, we scroll a little bit to the download section. And here we can download the installer of our choice. For the purposes of this tutorial, I am using the graphical installer. Of course, based on your preferences and your system, you can select the installer that suits you best. Now we have the installer downloaded. We have the README, then the license, which we accept. We agree. Select the destination. The installation that we typically do is the standard installation. Once you are done with the installation, it will ask you to download an editor, which you can feel free to do or not. And once you are done with it, it will provide you a summary of what was done.
19. 3 5 Setup Python Environment: Once we have Anaconda set up on our machine, we can use it to create a virtual environment for ourselves and proceed with the installation. So let's check out the documentation. We can close this. As per the documentation, we use conda to create the environment. So let's just do this. So conda create, specify the name of the environment, and specify which Python version you want to use for this setup. If you already have an environment with this name, feel free to override it. As a new developer, maybe it's important for you to understand that having virtual environments and setting up all your projects in containers is always a good idea, because this way your multiple installations do not conflict with each other. Imagine you have multiple projects running in parallel and each one of them has its own requirements. Then having them all under a global environment will be a real mess to resolve. So it's always better to have separate environments for different projects and work on them. So we have our environment ready, and we simply activate it.
And now you can see that we have moved from the base to the SiamMask environment. And here you can go ahead with the installation. As we can see here, there are a number of requirements for this particular project, including, but not limited to, Cython, torch, and matplotlib. We need OpenCV, and we need torchvision as well. And there are many more packages that need to be installed. And imagine if you had to install each one of them by yourself and take care of the different versions. It becomes really cumbersome to handle. So it's always a good idea to have a package manager that can take all of this load off you and get your environment ready for you. And now we are done with this environment installation. So the final step in this process is to execute bash make.sh. And once that's done, we export the Python path to the present working directory.
20. 3 6 Running the Demo: With this, we are done setting up our system. So let's see if everything has worked out correctly by running a little demo. So the authors have provided some demo programs and they have some data available in the experiments section. So we first move to the right directory and execute some commands there. So as mentioned, they have their experiments under the experiments folder. So we move to that folder. And under that we have a siammask_sharp folder. Next, we need to download the models that they have pre-trained. For the downloading they are using wget. You can use your own favorite downloader. And if you still want to use wget and you don't have it, you can use brew to install it. And in case you don't even have brew, you can download and install it too. So our first model is there. We download one more. The Python path we have exported already. Now we are good to go. Let's execute the demo.py script, which they have provided.
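Before launching the demo, it can help to confirm that the packages named in the setup lesson are actually importable from the active environment. The following is a small sketch using only the standard library; the module list is taken from the packages mentioned above (with `cv2` as the import name of OpenCV), the function name is made up, and the repository's own requirements file remains the authoritative list.

```python
from importlib.util import find_spec

# Import name -> human-readable package name, as mentioned in the lesson.
REQUIRED_MODULES = {
    "Cython": "Cython",
    "torch": "torch",
    "torchvision": "torchvision",
    "matplotlib": "matplotlib",
    "cv2": "OpenCV",
}


def missing_modules(modules=REQUIRED_MODULES):
    """Return the readable names of modules that cannot be found in this environment."""
    return sorted(label for name, label in modules.items() if find_spec(name) is None)


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All listed packages were found.")
```

Running this inside the activated conda environment, rather than the base one, is the point: a package installed globally but not in the environment will show up as missing here.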
And as we can see, they are using the model that we have just downloaded along with a configuration file that they have prepared. Okay, so this opens up our video and we can define an initial bounding box to track and segment. We can see the object tracking via the green bounding box and the segmentation via the red mask. It seems to be doing quite well. As we can notice, there are huge motions and huge deformations in the shape of the person, and it's still tracking them correctly.
21. 3 8 Key Take away: All right, with this, we come to the end of today's session. We learned how to set up SiamMask on our own machine and we played around a little bit with the demo that they provided. Next time we'll see how we can train the model ourselves and use our own applications to try it out. Meanwhile, you can set up your system, and if you have any questions or issues, feel free to get in touch with us. Thank you, and all the best.
22. 4 1 Using your Own Dataset Intro: Hello everyone. Now that we have SiamMask set up on our own machines, I'm sure, just like me, you are also tempted to try it out on your own dataset. So let's explore it further.
23. 4 2 Siam Mask Execution Commands: So here I have a street view of a junction in New York City, and I want to apply SiamMask on this video. To do that, let's have a look at the command that we used last time to run the demo. So it was a Python script called demo.py, and we passed it some parameters: SiamMask DAVIS, which is the model, and the configuration file, config DAVIS. Let's have a look at the demo.py file and see what it tells us. So we open the file. It's two folders up. So here we have it. We notice that the script takes a few arguments, and one of them is the base path. And this one defaults to the tennis dataset. This is the one that we saw last time. So let's have a sneak peek into what this dataset looks like so that we can format ours accordingly.
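The argument handling just described can be sketched with `argparse`. This is an illustration of the pattern, not a copy of the repository's `demo.py`: the flag names `--resume`, `--config` and `--base_path`, the default path and the file names are assumptions based on the lesson's description (a model, a configuration file, and a base path that defaults to the tennis data), and the New York folder name is hypothetical. Check the actual script for the exact spelling.

```python
import argparse


def build_parser():
    """Sketch of a demo command line: model, config, and a folder of input frames."""
    parser = argparse.ArgumentParser(description="SiamMask-style demo arguments (sketch)")
    parser.add_argument("--resume", required=True,
                        help="path to the pre-trained model file")
    parser.add_argument("--config", default="config_davis.json",
                        help="configuration file for the model")
    parser.add_argument("--base_path", default="../../data/tennis",
                        help="folder containing the input image sequence")
    return parser


# Default run: only the model is given, so the tennis sequence is used.
default_args = build_parser().parse_args(["--resume", "SiamMask_DAVIS.pth"])

# Own data: point the base path at a folder of frames (hypothetical folder name).
custom_args = build_parser().parse_args([
    "--resume", "SiamMask_DAVIS.pth",
    "--base_path", "../../data/demo_new_york_street",
])
```

The design point is that the input is just a directory of images, so running on your own footage means changing one argument once the frames are in place.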
The dataset is a collection of images. So let's convert our video into images as well so that we can proceed further.
24. 4 3 Converting the Dataset into Images: You can use the video converter of your own choice. I'm using a freely available tool called Adapter for video conversion. You can download it from macroplant.com/adapter. It's fairly simple to use. You simply select the file or drop it on the tool. Select some optional configuration. You can leave it at the defaults and just hit Convert. Once it performs the conversion, you are ready to go. Now I have my images ready. I've saved them under a folder, demo New York Street, in the same data folder where the tennis files are. So let's execute the command.
25. 4 4 Running the Demo on your own Dataset: So we run the script. And in addition, we pass the location of my files, which I saved in the SiamMask data folder under demo New York Street. As before, let's define the object that we need to track using a bounding box. So let's select this one and hit Enter. It seems to be doing a nice job on our dataset. Wow, that was easy, wasn't it?
26. 4 5 Activity Test it on your own video: I encourage you to try it out on your own applications and play around with your own videos. If you have any questions or any comments, feel free to reach out.
27. 5 1 Training Datasets Overview: Hello everyone. In this session, we are going to talk about the datasets needed for training the SiamMask model. SiamMask depends on four datasets: YouTube-VOS, COCO, and two variants of the ImageNet dataset. So let's have a quick look at each one of these in a little bit more detail. And then we'll move towards downloading and preprocessing them.
28. 5 2 YouTube VOS Dataset: The first one that we have here is YouTube-VOS, which is the first large-scale benchmark that supports multiple video object segmentation tasks. It comes with over 4,000 high-resolution YouTube videos with over 314 minutes duration. It gives us 90-plus semantic categories and over 7,800 unique objects.
And it provides us over 190 thousand high-quality manual annotations. This is really a big value add.
29. 5 3 COCO Dataset: The next one is COCO, the Common Objects in Context dataset. This is a large-scale object detection, segmentation, and captioning dataset. COCO provides us 330 thousand images, out of which over 200 thousand are labeled. It gives us 1.5 million object instances, 80 object categories, and 91 stuff categories. COCO gives us five captions per image and 250 thousand people with keypoints.
30. 5 4 ImageNet Datasets: And then we have two variants of the ImageNet Large Scale Visual Recognition Challenge dataset. This challenge evaluates algorithms for object localization and detection and image classification from both images and videos at large scale. So these are the two variants that we are using: object detection, and object detection from video.
31. 5 5 YouTube VOS Training Dataset Process: So now let's move towards downloading our first dataset, YouTube-VOS. But before we do that, let's quickly have a look at how these datasets have been arranged, especially with respect to the folder structure under the repository. So here we are at the SiamMask repository on our web console. The datasets are arranged under the data folder. And here you see the four variations: YouTube-VOS, then COCO, and then the two variants, DET and VID, from the ImageNet dataset. So for downloading, the first step is to ensure that we are in the right directory, followed by the next two steps, that is, downloading the raw images and annotations, and finally preprocessing them and preparing them for training. Now let's get to action.
32. 5 6 Step 1 Using the Correct Directory: So the first step is to go to the right directory. Let's check it out. In our repository, SiamMask, we should be in the data folder. And here we have all four datasets. We are downloading YouTube-VOS. So here we are.
And in our terminal, as we said, we first go to the data folder. And here we should move to the YouTube-VOS folder for downloading the YouTube-VOS dataset. All the commands that we need are also available in the README for the specific folder. So you can simply copy them from there.
33. 5 7 Step 2 Downloading the Raw Image Dataset: So the first command is a Python script that will download the YouTube-VOS dataset from Google Drive. For most of you, this command should work directly. But in case it gives you an error, just like it's giving me, here is the reason behind it. Basically the problem is that the file is too big to be checked for viruses, and that's why access is denied. So all you have to do is copy the link and download it directly from your browser. So let's do this. Here we have the link and we simply paste it in our browser. Here we have the error that we just spoke about, that Google Drive cannot scan this file for viruses. You can click Download anyway and start downloading. Just ensure that once the download is complete, you copy it into the YouTube-VOS folder that we just saw. Now that we have the training data downloaded, we see that it's a zip file. So we unzip that. It will take a few minutes before all the data is uncompressed. And this is what we get when the training dataset is uncompressed. There are two folders, for annotations and JPEG images, and a meta file, which is a JSON. Let's start by looking at what we have in the meta file.
34. 5 8 Annotation Metafile Format Review: So as expected, there is some metadata in this JSON file. So we see some video information. So apparently each one is an ID, and under each video ID we have certain objects. So for example, here we see something from category penguin. And these are the frames where this object appears, and so on and so forth. Now let's check this out in the corresponding annotations and images folders. We will be looking for this first video with the ID 003234408d.
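The nesting described for the meta file (video ID, then objects, then a category and the frames where each object appears) can be walked with a few lines of Python. The key names `videos`, `objects`, `category` and `frames`, the video ID and the frame labels below are assumptions made for this sketch, chosen to match the lesson's description; inspect the real JSON file before relying on them.

```python
import json

# Inline stand-in for the meta file; IDs and frame names are made up.
SAMPLE_META = json.loads("""
{
  "videos": {
    "video_0001": {
      "objects": {
        "1": {"category": "penguin", "frames": ["00000", "00005", "00010"]},
        "2": {"category": "penguin", "frames": ["00005", "00010"]}
      }
    }
  }
}
""")


def summarize(meta):
    """List (video_id, object_id, category, number_of_frames) for every object."""
    rows = []
    for video_id, video in meta["videos"].items():
        for object_id, obj in video["objects"].items():
            rows.append((video_id, object_id, obj["category"], len(obj["frames"])))
    return rows


for row in summarize(SAMPLE_META):
    print(row)
```

With the real file, the same loop would be fed from `json.load(open(path))`, and the printed rows give a quick overview of which objects appear in which videos before any cropping is run.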
So let's check this out. First we check the images. It starts with double 0, that's the ID. And here we notice that it's a set of images with many penguins. Next, let's check out the annotations folder. Again, we go with the same ID. And here we notice that we have a set of mask annotations. Now that we know what data we have downloaded, let's move ahead with the post-processing.
35. 5 9 Dataset Post Processing: Next we need to run the Python script parse_ytb_vos, which will perform some conversions. So let's get started: python parse_ytb_vos.py. The authors explain that this is a really slow process. So we'll just let it run for some time. All right, it took almost two hours for the script to finish execution. And as a result, we have two extra files now. One of them is instances_train.json and the other one is instances_val.json. Let's see what we have in each one of them. So, some interesting information here. We see we have our video ID, the one that we checked out before, 003234408d. Then apparently an ID for the penguin, 138, and we see some bounding boxes, apparently h and w, the height and width of the boxes, with respect to the frame numbers, and the area under the box. Okay, that looks good.
36. 5 10 Step 3 Crop and Generate Data Info: Now we are ready to execute the third and final step of this pre-processing, which is cropping and generating data info. We need to execute the python command to run the script par_crop. And we pass it two arguments, the crop size, 511, and the number of threads. This was again a slow step. And the process generated a new folder, crop511. And let's see what we have in here. We have a group of folders. Let's try our old ID, 003234408d. And here we see a set of images and some masks. So these are the cropped images that have been generated by using the masks that we have. And here we notice that we have a different mask for each of the penguins separately.
And accordingly, the images have been cropped.

37. 5 11 Convert Raw Data to Summarised Training format: And now we are ready to execute the final step of the process. We run the command python gen_json.py. This step basically takes the raw or processed data we have so far and converts it into a summarized form that can be used in the next step for training. Let's have a quick look at the train.json file and see what's in there. And as expected, we have here the training data in a properly summarized form. We see the video ID and the object, along with the different frames and the presence of objects in those frames.

38. 5 12 How to Repeat for Other Datasets: Alright, now that we have completed the processing of the YouTube-VOS dataset, we can move on to the remaining three datasets. The steps for each one of these are essentially the same. So, for example, for COCO, you again have to go to the right directory, download the images, unzip them, and do some processing for generating the final output. The same goes for the ImageNet object detection dataset and the ImageNet object detection from video dataset. Once again, you move to the right directory, download the dataset, do some post-processing, and repeat the same for all of them. Please note that this is a long process. Some of the dataset files are actually 86 or 49 GB large, and it may easily take two to three days for you to complete this whole process. So yes, it is a lengthy process, but it is also a straightforward one.

39. 5 13 Activity Try Out your Own Datasets: So I encourage you to go ahead, download the datasets and preprocess them, so that we are ready in the next few days to start training our SiamMask model. If you have any questions or comments, feel free to reach out to us. Thank you very much.

40. 6 1 Intro to Siam Mask Training: Hello everyone and welcome back to the SiamMask tutorial.
In this session we are going to see how to train our own model and then do some testing on it. What we have already done so far is that we have set up our machine, downloaded the datasets, and post-processed them.

41. 6 2 Why Use Test Data: So now we are ready for training. And the first step that we need to do here is downloading test data. I've mentioned this as step 0 because you might be wondering why we need test data for training. Basically, the authors have written the code in such a way that it needs test data for some evaluation and refinement while it's training and generating the model. You might want to skip this step because in the README it's specified as a separate step. But it's also possible that at a later stage you'll get stuck and your script will stop working because it's missing this test data. So go ahead, you can try it out by skipping this step, but I suggest that we first download the test data before proceeding further.

42. 6 3 Step 0 Downloading Test Data: So first we move to the data folder. We need to install jq. You can use pip or brew or, based on your operating system, apt-get. I'm using brew. Now that you have it, let's go ahead and start downloading the data. We do it by running the command bash get_test_data.sh. And while it's downloading, let's have a quick look at the script and see what it's doing. As expected, the script is basically downloading the datasets. For example, here we see the VOT dataset being downloaded, and then some data from DAVIS, and so on and so forth. So we let it run. A note of caution here. I have seen reports from people trying to download these datasets that sometimes some files are missing because jq was not working properly. So although your machine might say that jq is properly installed, it's also possible that it is not actually properly installed and it's missing out on some commands.
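One way to confirm that the tool is actually callable, rather than just reported as installed, is a small check like the following, run before the download script. This is only a sketch: it assumes the tool in question is the jq command-line JSON processor and that it answers to a --version flag, and the helper name tool_status is made up for illustration.

```python
import shutil
import subprocess


def tool_status(name):
    """Return (ok, detail) for a command-line tool looked up on PATH."""
    path = shutil.which(name)
    if path is None:
        return False, "{}: command not found on PATH".format(name)
    try:
        # Assumption: the tool prints its version when called with --version.
        result = subprocess.run(
            [path, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, "{}: found at {} but could not be run ({})".format(name, path, exc)
    if result.returncode != 0:
        return False, "{}: found at {} but exited with code {}".format(
            name, path, result.returncode
        )
    return True, "{}: {}".format(name, (result.stdout or result.stderr).strip())


if __name__ == "__main__":
    ok, detail = tool_status("jq")
    print(detail)
    if not ok:
        print("Install jq before running the test data download script.")
```

If the check reports a problem, it is worth fixing the installation first, since the download script may otherwise carry on with files missing.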
And it's also possible that the script won't let you know that there is an error, and it will keep on working. Ultimately, you will get stuck in one of the following steps because the data is missing. So my recommendation would be that once you are done with any step of this project, just scroll back through your terminal window and make sure that there are no obvious errors in there. Alright, so now this is done. So now let's move to the next step.

43. 6 4 Step 1 Download the Pre trained Model: So the next one is downloading the pre-trained model. For that, we need to move to the experiments folder and start downloading. So let me clear the screen. Now we move to the experiments folder, one level up, and then we download the model. Once that's done, we need to go to the siammask_sharp folder, and here we download some more models. Okay, that's done. So now we move on to the next one. Basically, the authors keep these models with them and use them in case the training breaks and they need to resume the training; then these models come in handy. We need to do one more, which is the one for DAVIS. So now that we have the pre-trained models, let's move ahead to the next step and start execution.

44. 6 5 Step 2 Training Siam Mask Base Model: And here we are ready to go. Let me clear the screen first. Then we move to the siammask_base directory, so we go one step up from here. Now we are in siammask_base and we execute the command bash run.sh. So now our program stopped execution because of the error that it got an unexpected keyword argument jitdebug. We won't go into too many details about jitdebug; it relates to just-in-time debugging. But the problem can be resolved in one of two ways. One of them is that you upgrade your Python to 3.7 or above. This problem is found in Python 3.6 or below, so upgrading to 3.7 or above resolves the problem. The other option is that we downgrade the llvmlite library. I find this solution the easier one, so I go for it.
So now I'm installing a specific version of llvmlite, which is 0.32.1, and then we'll see how we proceed. So now we hopefully have the right version of llvmlite, and we execute the bash command again. So it's running. It started with checking the system configuration, loaded our configuration files, loaded the training data, loaded the validation data, loaded the parameters needed for the convolutional neural network, the CNN, built up the model, and apparently it has started training. On a GPU, it will take over 10 hours to train this model, and without a GPU it will take even longer. I strongly recommend that if you want to train your own model, you use a system with a GPU; otherwise it will take too long to finish training. Also, you may encounter several errors when you are trying to start training. I have included a separate session on the common errors that we encounter in this project, for example the jitdebug error that we already saw, or the jq command not found error. And then there are some common errors which you'll find because your GPU was not there, so the CUDA library was not loaded, or some error in calculations returning not a number as an answer, and so on and so forth. So please check out the error handling session in case you encounter any of these errors. And if you don't find a specific answer or solution to your problem, feel free to reach out.

45. 6 6 Post Training Checkpoints: Okay, so once you are done with the long training process, you'll see a folder in the siammask_base directory. This folder is called snapshot, and under this folder are all the models that have been generated by the script. You see files from checkpoint_e1 to checkpoint_e20. These correspond to the 20 epochs that we had specified in our configuration. And you see an additional best.pth file, which is the best one of the models, selected by the script on the basis of certain threshold values. So this basically brings us to the end of the training.
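Before moving on to testing, it can help to confirm that every per-epoch checkpoint was actually written. The sketch below assumes the files are named checkpoint_e1.pth, checkpoint_e2.pth and so on inside the snapshot folder; the helper name missing_checkpoints is made up for illustration, and the demo runs on a temporary folder with empty placeholder files rather than on real models.

```python
import os
import tempfile


def missing_checkpoints(snapshot_dir, epochs=20):
    """List the expected per-epoch checkpoint files that are absent from snapshot_dir."""
    # Assumed naming scheme: checkpoint_e<epoch>.pth, counted from 1.
    expected = ["checkpoint_e{}.pth".format(epoch) for epoch in range(1, epochs + 1)]
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(snapshot_dir, name))
    ]


if __name__ == "__main__":
    # Demo on a throwaway folder holding empty stand-ins for epochs 1 to 3.
    with tempfile.TemporaryDirectory() as demo_dir:
        for epoch in (1, 2, 3):
            open(os.path.join(demo_dir, "checkpoint_e{}.pth".format(epoch)), "wb").close()
        print("Missing:", missing_checkpoints(demo_dir, epochs=5))
```

On a real run you would point it at the snapshot folder instead of the temporary one; an empty list means all the expected checkpoints are in place.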
We are now ready to test our models, either by using all of them to get an average performance, or 10 of them, or the best one of them. It depends on you which one you want to use, all or one, for your next steps.

46. 6 7 Overview of Checkpoint Testing: If you're more interested in the numbers, then you may want to test your checkpoints on a certain dataset. And there are two ways to do it. Either you run your dataset for all the checkpoints, as in the first case; here we specify s as 1, which is the starting point, and e as 20, the ending point for your checkpoints. Or you can select one of the models; here we take best.pth and run the benchmark dataset, VOT 2018, on only that one model. So it's basically up to you how you want to check it out. On the other hand, if you are more interested in trying out the model on your own test video and want to see it in action, then simply follow the process that we saw in the demo execution, and use the model that you generated here instead of the default models that we downloaded from the repository.

47. 6 8 Activity Train you own Dataset: I hope this helps, and I hope you enjoy training and testing the SiamMask model. Please try it out. Of course, if you have any problems or questions, or if you want to know any more details, feel free to reach out. Enjoy.

48. 7 1 Testing SiamMask Intro: Now that we have trained our SiamMask model, it's time to test it. And as discussed before, we have multiple options to do that. If you are more interested in the numbers and checking the performance of your system, then you should go for either option 1 or 2. And if you are more interested in trying it out, then you should go for option 3. Let's see how each one of them is different.

49. 7 2 Various Options for Testing SiamMask: So in option one, we specify that we want to go through all 20 checkpoints, so, for example, from s 1 to e 20, and we run the test script on that.
This will loop through all the checkpoints, apply the VOT 2018 dataset on all of them, and give you some performance metrics. The second option is that you specify one of the models, as given by the parameter m. We take snapshot/best.pth, but you can pick whichever checkpoint you want and apply the testing on it. Again, we are using the VOT 2018 dataset here; you can pick the dataset of your own choice for testing. This will also give you some numbers based on the performance metrics. The third option is more about checking out the model in action. So recall what we did in the demo execution: here as well, we can provide a dataset and specify which model we want to use. Instead of one of the pre-downloaded models that we used last time, we can specify one of the models that we generated now, for example best.pth, and let the model run on our video. Basically, we'll be able to interact with the video, define our own ROI, and it will track that object. So let's see each one of these in action.

50. 7 3 Option 1 Testing Checkpoints on VOT: So in option number one, we want to test the system for all the checkpoints that we have generated. We do this by specifying the starting point, the s parameter, to be 1, and the ending point, the e parameter, to be 20. So a loop will execute from 1 till 20 and basically pick up all the checkpoints. It will apply the VOT 2018 dataset on these checkpoints and generate some numbers. So it will load the checkpoints. This is going to be a long process and it will again take many hours to finish, so be patient.

51. 7 4 Option 2 Best Model for Hyperparametric Search: In case you are more interested in checking out the best one or any one particular checkpoint, then instead of the s and e parameters, where it runs a loop and picks up all the checkpoints, you can specify, using the m parameter, the model that you want to pick up.
So it will take up that model and run the code on that exact model. As before, it loads the pre-trained model, loads the parameters and the images that it needs, and starts testing. As before, this is also a time-consuming and computationally expensive process, so you need to be patient with this step as well.

52. 7 5 Option 3 Tracking on your Own Dataset: If you are more interested in running the model to track your own objects in your own video, then this third option is for you. Recall that we could run the demo from the siammask_sharp folder. So let's move first to that one. Now we are in the siammask_sharp folder. To run the demo, we need to execute the demo script, which is under the tools folder. So we specify the folder and the script name, tools/demo.py. Next, you need to specify the model, which you do using the resume argument. Our models are currently in the siammask_base folder under the snapshot directory, so we specify that, and you can pick the model that you want to use. Next, we need to specify our configuration file using the config argument. And finally, we need to supply our test images. We do that using a parameter that specifies the folder where your images are located. We have placed them in a folder under the current directory. We go ahead with the execution. It loads up our model and loads the test video. Here we can specify, as before, the region of interest, so the object that we want to track. Let's take this one, and we hit Enter. So now it's tracking on our own dataset using the model that we have just generated. Seems pretty cool.

53. 7 6 Siam Mask Custom Model Testing Summary: This brings us to the end of the testing module. We saw how we can use our trained system in a variety of ways: to generate numbers, to get the performance of our system, and also to use this model on our own test datasets. Try them out, and if you get stuck or have any questions, feel free to reach out. Thank you.

54. A1 Error Handling jitdebug: As we work on this SiamMask project, it's inevitable that you'll encounter some errors. These errors may occur because of missing libraries or missing files. Maybe there are some missing or wrong configurations, and so on and so forth. In this session, the error that we want to consider, which is well reported, is the NoneType error. You will encounter this error when you are training the model, more specifically when you are in the siammask_base folder and executing the command bash run.sh. This error is typically encountered when there is an image in the dataset which has not been read correctly, so the return value for the image is None. Then, when the script tries to perform some operation on it, which it can't, it generates this error. And you see the message that the greater-than operator is not supported between instances of NoneType and int. The solution to this problem is simple: remove the images which are causing problems. To do that, navigate to experiments and then the siammask_base folder, and open the configuration file, config.json. I recommend that you keep only the ytb_vos dataset and remove the rest. If you are the more adventurous type, then you may want to cherry-pick the files that are causing the problem and remove only them; you can try that out as well. I hope that helps. And if you still encounter problems, feel free to reach out.

55. A2 Error Handling CUDA: As you work on this SiamMask project, it's quite inevitable that you'll encounter certain errors. These errors could be because of a missing library, your configuration, some missing file, and so on and so forth. In this session, we are specifically going to cover the error of the unexpected keyword argument jitdebug. You may expect this error when you are training your model.
More specifically, when you are in the siammask_base directory and you are executing the command bash run.sh, you may find this error. And the error would be TypeError: create_target_machine() got an unexpected keyword argument jitdebug. This problem can be resolved in a number of ways. Our recommended solution is that you downgrade the llvmlite library, and this should fix your problem. Alternatively, you can also upgrade your Python to version 3.7 or above. That would take away this problem.

56. A3 Error Handling NoneType: As we are working on the SiamMask project, it's inevitable that we'll encounter some errors. These may happen because of missing libraries, missing files, or some inaccurate configuration. In this session, we'll have a look at the missing checkpoint_e9.pth file. You will see an error, an assertion error, that snapshot/checkpoint_e9.pth is not a valid file. You may also encounter similar errors with other checkpoints, like checkpoint e1, e2, e3, and so on and so forth. Most probably you will encounter this error when you are training the model, more specifically when you are in the siammask_base folder and executing the bash run.sh command. You can also encounter this error when you are testing the model by using the test command. And basically you encounter this error as an assertion error that the checkpoint_e9.pth file under snapshot is not a valid file. The solution lies in changing a few of the parameters that we pass in run.sh. In that file, you'll notice that there is an argument called resume, set to snapshot/checkpoint_e9.pth. Simply remove that line. Basically, this snapshot is used by the authors in case they want to resume the training if it was broken in the middle of the execution. Since we are starting from scratch, this file does not exist at the moment, so trying to resume using this file will create an error. I hope this helps.
And if you still have problems, feel free to reach out.

57. A4 Error Handling checkpoint e9: As we are working on the SiamMask project, it's inevitable that you'll encounter some errors. These may occur because of some missing files or missing libraries or some inaccuracies in the configuration. In this session, I want to bring to your notice an error which you may not instinctively notice, which is the jq command not found error. This is typically encountered when you are running the bash get_test_data.sh command under the data folder. This is when you are downloading your test data. The problem is that when this error is encountered, the script does not break and it continues to the next step, which means you won't see this jq command not found error unless you deliberately look for it. So what I recommend here is that when you are at the stage where you have to run the get_test_data command, and there is this instruction to install jq before running that command, you make sure that this instruction is executed and the installation is correct. Also, please ensure when you are running the get_test_data command that there are no errors like jq command not found during your execution.

58. A5 Error Handling jq command not found: Now that you are working on the SiamMask project, it's inevitable that you'll encounter certain errors. These may be caused by missing libraries, missing files, or some inaccuracies in configuration. In this session, I want to bring to your attention the trickiest error that I have encountered in this project, which is the mean speed not a number (NaN) FPS error. I call it tricky because, number one, you find it in several unexpected places, like testing, when you are running the test commands to test your model, and even when you are training, since training implicitly calls a testing module. Second, the source of the error is actually not in the current step but in one of the previous steps. Let me explain.
So the error that you'll receive is a runtime warning, mean of empty slice, followed by mean speed: not a number frames per second. The reason for this error is missing data. But the problem is that the missing data is not downloaded in the current step, but rather in one of the previous steps. A bigger problem is that the missing data was caused by a missing library, and the script didn't stop when that error was encountered. So the solution here is this: whenever you are downloading the test data, when you are executing the get_test_data command, you will notice that it needs a command or a library, jq. Please make sure that you have jq installed properly and that it is indeed correctly installed, because, to make it even trickier, it's also possible that the system says that the library is there but cannot find it. So just ensure that you use the right command: on Ubuntu, you use apt-get; on Mac, you use the brew command for downloading jq. So please ensure that when you run the test data script, it completes successfully, and not only that it completes successfully, but that there are no intermediate errors in the middle, because this script doesn't stop when there is a jq command not found error; it will simply continue. And when you need that data, the following script will actually break because it won't find the data and won't be able to do the calculation, and ultimately it will give you not a number as an answer. I hope this helps. And if you still have problems, feel free to reach out.

59. A6 Error Handling NAN FPS: As we work on the SiamMask project, it's inevitable that you'll have certain errors. These errors could be because of missing libraries, missing files, some inaccurate configuration, and so on and so forth. In this session, we want to see what to do if your system cannot initialize CUDA for some reason. You will encounter this error when you are training your model.
And more specifically, when you are in the siammask_base folder and executing the command bash run.sh. This error typically occurs if you do not have a GPU, and most probably, if you are running your script on a Mac, you would find this error. The message that you get is that you cannot initialize CUDA without the ATen_cuda library, and so on and so forth. The solution here is to specify that the device to be used should be the CPU instead of CUDA. And here is a recommended solution. Basically, you want to tell your system to use the CPU rather than the GPU for execution, and you need to specify this information in certain files. One of the files would be in the utils folder, the load_helper file, where you can specify the device to be CPU. And also in the training files, like train_siammask under tools, and all relevant files, simply remove all the instances where CUDA is being mentioned. So, two steps: first, change the device to CPU, and second, remove all instances of CUDA in the script. Then you execute and it should run properly. If it still gives you an error, feel free to reach out to us.
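The idea of falling back to the CPU when no GPU is usable can be sketched in a few lines of Python. This is a generic PyTorch pattern and not the repository's own code; the helper name pick_device is made up for illustration, and the sketch deliberately returns "cpu" if PyTorch itself cannot be imported.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to run on a GPU anyway.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_device()
    print("Using device:", device)
    # With PyTorch installed, a checkpoint could then be loaded onto that
    # device, for example (the path here is only a placeholder):
    #   state = torch.load("snapshot/best.pth", map_location=device)
```

Choosing the device in one place like this means the rest of a script can refer to a single variable instead of hard-coded CUDA calls, which is the same goal as the manual edits described in this session.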