Computer Vision 101: Let's Build a Face Swapper in Python

Alvin Wan, Research Scientist

Lessons in This Class

1. Introduction (1:16)
2. What is Computer Vision? (5:04)
3. Code OpenCV Basics (10:08)
4. How do Face Swaps Work? (4:12)
5. Code Face Detection (9:59)
6. How do Face Detectors Work? (4:59)
7. Code Face Swap (12:40)
8. Next Steps (0:33)

1,248 students

About This Class

How does a face swap work? What is computer vision?

In this course, we will explore computer vision fundamentals as you build a Snapchat-esque face swap application. For those unfamiliar with Snapchat, this filter detects pairs of faces and swaps them, making for a silly effect. This class was made for beginners who have a little familiarity with machine learning and coding. If you aren’t already familiar, that’s okay; these two courses will get you caught up quickly:

To learn Python, I suggest taking my Coding 101: Python for Beginners course.
To learn the basics of AI and machine learning, take my Artificial Intelligence MasterClass: Tools to Master Machine Learning course.

We’ll cover many topics and takeaways:

Build a Face Swapping application
What computer vision is
Break down face swapping AI products into ML problems
Break down face detection ML problem
Understand taxonomies of computer vision problems
Computer vision concepts like filters, feature extraction, detection etc.

By the end of this course, you’ll have a face-swapping application for you to play with. Show it off to your friends and family!

Still not sure if this course is for you? Try this short 4-minute video. If it sparks your curiosity, this course is definitely yours to take!

If you plan to run this code locally on your own computer, make sure to follow these Installation Instructions before starting the lesson.

Interested in creative coding? Check out my VR 101 (AFrame Nature Scenes) class.

Interested in more data science or machine learning? Check out my Coding 101 (Python), SQL 101 (Database Design), or Data 101 (Analytics) classes.

Follow me on Skillshare to be the first to hear about more courses in these areas!

Meet Your Teacher

Alvin Wan

Research Scientist

Top Teacher

Hi, I'm Alvin. I was formerly a computer science lecturer at UC Berkeley, where I served on various course staffs for 5 years. I'm now a research scientist at a large tech company, working on cutting edge AI. I've got courses to get you started -- not just to teach the basics, but also to get you excited to learn more. For more, see my Guide to Coding or YouTube.

Welcoming Guest Teacher Derek! I was formerly an instructor for the largest computer science course at UC Berkeley, where I taught for several years and won the Distinguished GSI (graduate student instructor) award. I am now a software engineer working on experimentation platforms at a large tech company. 4.45 / 5.00 average rating (943 reviews) at UC Berkeley. For more, see my Skillshare or Webs...

Level: Intermediate

Hands-on Class Project

How else can you leverage this application’s face detection? Perhaps “face-swap” with a random celebrity? Share your versions of the application or even just silly face-swapping pictures by clicking on “Your Project”.

Transcripts

1. Introduction: Here's a face swap. You've seen this in photos, videos, snaps, you name it. It's silly, fun, and sometimes downright creepy. You can swap faces with your friend, a celebrity, your baby, or even your dog. Let's build this face swap. Hi, I'm Alvin, a Computer Science Lecturer and PhD Student at UC Berkeley. I help computers see, studying computer vision for virtual reality and self-driving cars. I've taught over 15,000 students and I can't wait to show you the magic too. In this class, you'll learn the basics of computer vision. I built this class for anyone interested: coders, designers, business leaders, anyone. If you don't know Python, no problem. Get caught up with my Coding 101: Python for Beginners course. Don't know AI or ML? That's also okay. Take my Artificial Intelligence MasterClass: Tools to Master Machine Learning course. By the end of this class, you'll understand images, image processing, face detection, and more. You'll also have a fully functioning face-swapping desktop application that can swap faces in pictures. This will take you just one hour; nothing to install, no complicated setup. You just need a laptop with internet. You'll walk away with a face swapper and computer vision know-how in no time. I hope you're excited because I know I am. Let's do this.
2. What is Computer Vision?: Let's start by answering, what is computer vision? Here's the definition. Computer vision, broadly defined, is eyesight for AI. There are a number of computer vision tasks that reflect this, like object detection, which boxes and classifies all objects in the scene, like the flower on the left. Super resolution, which hallucinates details to make your images sharper, like the sharp color on the right, contrasted with the blurry bushes on the left. Keypoint estimation, which identifies key points like joints and limbs, like the antique dancer on the left. We can now re-answer, what is computer vision, in an applied way.
Computer vision is more specifically extracting information from or generating images. To better understand this field though, let's look at the framework for learning machine learning which we covered in our AI masterclass. If you haven't already taken this course, I recommend doing so. In our AI masterclass, we discussed compartmentalizing our ML knowledge into four categories. Data, model, objective, and algorithm. Data describes the inputs and outputs, what we learn from and what we predict. Model describes how to make predictions. Objective describes the goal, what the model is optimizing for. Finally, algorithm describes how the model learns. We haven't discussed algorithm much, and we'll skip it again this time around. In computer vision, the data is always visual, namely images and videos. We may additionally use other related signals like audio or depth on top of images and videos. We'll focus on the data in this lesson. Our models in classical computer vision and in deep learning today extract patterns from the image using a tool called filters. We'll discuss this in later lessons. Finally, our goal is usually to maximize accuracy. As usual, we'll skip the algorithms. To understand the rest of computer vision, we need to understand how images are represented. What is an image? How is an image represented as numbers? Let's look at an example. We can construct a black and white image using numbers where zero corresponds to black and one corresponds to white. Focus on the dividing line between ones and zeros. What shape do you see? Saving this matrix of numbers as an image gives us this. It turns out it's a diamond. What if we want grayscale images, not just black and white? Well, still using zero for black and one for white, we can also use any value between zero and one such as 0.1,0.26, or 0.74391. Number's closer to zero are darker and numbers closer to one or lighter. This allows us to represent white, black, and any shade of gray. 
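To make the "image as a matrix of numbers" idea concrete, here is a minimal sketch using NumPy. The 5 by 5 diamond pattern is made up for illustration and is not the exact matrix from the slides.

```python
import numpy as np

# A hypothetical 5x5 black-and-white image: 0 is black, 1 is white.
# The ones trace out a small diamond, similar in spirit to the slide.
diamond = np.array([
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
], dtype=np.float32)

# Grayscale: any value between 0 and 1 is a shade of gray.
# Here the white diamond is dimmed to a light gray.
gray = diamond * 0.74391

# Image libraries usually expect integers from 0 to 255,
# so scale the values and convert to unsigned 8-bit integers.
gray_uint8 = np.round(gray * 255).astype(np.uint8)

print(diamond.shape)  # (height, width)
print(gray_uint8)
```

Numbers closer to 0 print as darker pixels once saved, and numbers closer to 255 as lighter ones.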
Consider the following, for example. Can you tell what this is? Again, each number corresponds to the brightness of a pixel. Saving this box of numbers as an image gives us this, a Poké Ball. This one's impossible to see just from the numbers in the previous slide, but you now know how black and white and grayscale images are represented numerically. To introduce color, we need a way to encode more information. Here's how. First, each image has a height and width. This image is H by W. Each pixel, as we saw before in a grayscale image, has one value. We can equivalently say that our image has dimensions H by W by one. Upgrading this grayscale image to a color image involves the following. For a color representation, we represent the color of each pixel using three values between zero and one. One number corresponds to the degree of red, one to the degree of green, and the last to the degree of blue. We call this the RGB color space. This means that for every pixel in our image, we have three values, R, G, and B. As a result, our image is now H by W by three. That's how you get a color image like this one. In reality, each value ranges from 0 to 255 instead of 0 to 1. But the idea is the same. Different combinations of numbers correspond to different colors, such as 171, 180, 190 for light blue, or 200, 199, 188 for light brown. In summary, each image will be represented as a box of numbers that has three dimensions: height, width, and three color channels. Manipulating this box of numbers directly is equivalent to manipulating the image, and that's how images are represented as numbers. In summary, we defined computer vision as extracting information from or generating images. We covered ML abstractions for computer vision: data, model, objective, and algorithm. Finally, we discussed how images are represented numerically. With all this knowledge, you are well-equipped to begin working with images in code.
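The H by W by 3 layout can be sketched in a few lines of NumPy. The tiny image size and the "darken by halving" step are assumptions chosen for illustration; the two colors are the ones mentioned in the lesson.

```python
import numpy as np

# A tiny hypothetical color image: 4 pixels tall, 6 pixels wide, 3 channels.
height, width = 4, 6
image = np.zeros((height, width, 3), dtype=np.uint8)

# Each pixel holds three values (R, G, B), each from 0 to 255.
image[:, : width // 2] = (171, 180, 190)  # left half: light blue
image[:, width // 2 :] = (200, 199, 188)  # right half: light brown

print(image.shape)   # (height, width, 3)
print(image[0, 0])   # the three channel values of the top-left pixel

# Manipulating the numbers manipulates the image:
# halving every value makes the whole picture darker.
darker = image // 2
```

Note that this sketch stores channels in R, G, B order; OpenCV itself uses B, G, R, as the face detection lesson points out.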
For a copy of these slides and more resources, make sure to check out this URL. That concludes this intro. Let's get coding in the next lesson.
3. Code OpenCV Basics: Welcome to the first coding lesson of this course. Let's get our hands dirty with some code. If you'd like a quick introduction to programming in Python, make sure to check out my Python for beginners course before starting this lesson. You can pause the video here to access that URL. At a high level, we'll generate an image, then we'll try our webcam. The goal is to explore fundamental OpenCV utilities. Start by accessing this URL. This will create an environment for us remotely so that we don't have to do any setup on our computers. You'll see on the right-hand side, I've already done this. For this tutorial, I highly recommend using Google Chrome. Unfortunately, I've tested the code in a few different browsers and only Google Chrome is supported at the moment. Your goal will be to generate this image. Step one is to create the numeric representation of the image like this. Recall from before that ones and zeros can make a black and white image. Like we discussed, zero here is black and one is white. However, in reality, the numbers range from 0 to 255, so we'll use 255 for white and zero for black. Let's create this array in code now. On the right-hand side, I'm going to click on the X in the top right so we can close this preview window. On the left, click on New File and type in generate.py. This will create a new file for us. Let's now minimize this file browser. I'm going to now zoom in on my code so you can better see what I'm doing. To start, in your generate.py, import numpy, which contains your linear algebra utilities. By convention, we rename numpy to np for convenience later on. Then import OpenCV, which contains your computer vision and image utilities. We'll first create a numeric representation of our image.
Go ahead and type in image equals a numpy array and this numpy array will take in a list of lists which specifies the numbers that we have on the left-hand side. Typing those numbers now. After inputting the numbers, define a datatype. This datatype is going to be an unsigned integer. This data type here is important. We set the data type to an unsigned integer because all of our integers in this image are positive. This is required for OpenCV or cv2 to successfully save your array as an image. Next, we will resize our image. Here is how to resize, call cv2.resize on the image to resize and pass in the desired size of the image. In this case, our image's tall. So we want a width of 90 pixels and a height of 150 pixels. But if we run this code, we would be sorely disappointed. Our up-sized eight would look ugly like this. Instead, we want a sharp eight. So we'll up-size using another resizing technique called nearest neighbors. To do that, we'll add a third argument to our resize function. Again, here's resize. We pass in the image to resize the final size and the third argument tells cv2 to use the nearest neighbor up-sampling technique. Let's try this in code now. We'll now write, image is equal to cv2.resize image comma 90, 150. Then finally the interpolation method, which is cv2.INTER_NEAREST. You'll see unfortunately that my code is cut off on the right-hand side. This right here says interpolation is equal to cv2.INTER_NEAREST. With our image resized, all that remains is to save it. To save the image, use cv2.imwrite, as shown here. The first argument is the path, and the second argument is the image. Let's try this. After your existing code, type in cv2.imwrite will provide the path of the image and the image itself and that's it. We're now ready to run this code. We've finished writing our code in a Python file called generate.py. 
In previous Python lessons, especially those on Repl.it, we would hit the green run button at the top of the screen to run the Python file. We don't have a nice green button, so we'll do this manually and run our Python file through the command line. To launch the command line, reopen your file browser on the left-hand side by clicking on the arrow. Once you do that, you'll see a tool's dialog on the bottom left, click on Tools, and from the drop-down, select Terminal. In that Terminal, you'll see some setup like I see here. I'm going to zoom out slightly. Once you see this terminal, you can now input your command. In particular, the command is Python and the argument is the file path, which is generate.py in this case. Go ahead and type in Python generate.py and hit Enter. After you run your code, you should see an 8.png in your left sidebar. Click on it to view your generated image and voila, you've generated your first image. Let's now start manipulating the webcam outputs. Navigate to this URL. Once you've loaded the page, close the preview on the right-hand side, by clicking on the X in the top right. On the left-hand side, we'll now create a new file. In this part of the lesson, our goal is to connect our OpenCV utilities to the webcam. Let's start by launching a minimal web app using a custom library built just for this course called Web OpenCV. On the right-hand side, click on New File and type in app.py. Here I'm going to minimize this sidebar by clicking on the arrow in the top left. In this part of the lesson, our goal is to connect to our webcam using the OpenCV utilities provided in this lesson. Let's start by launching a minimal web app using a custom library built just for this course called Web OpenCV. Import webopencv as WCV. We'll then import OpenCV for our general computer vision utilities. So import cv2. Next, we'll instantiate your web application. You can do this by typing in app is equal to wcv.WebApplication. 
Finally, you can run your web application, app.run, and that's it. To preview your new application, click on Show in the top left. For me, I've minimized my screen so much that I can only see a pair of sunglasses. Click on In a New Window. This is my new window. Unfortunately, the top is cut off. In this new window, click on Show and then click Allow. Once you're done, click Stop to stop the webcam. If you're concerned about privacy, don't worry. The data on that webcam is only being communicated from your computer to your own Glitch server, which is what you're coding in right now. So only your code and the server are processing your webcam. No one else sees it. You are now the developer. Now add your very first image transformation to the web app. This transform will ultimately write text on the image it receives. But for now, let's create a transform that doesn't do anything. Create a function called hello. It takes in two arguments, the image and another object called the frame, and returns the image. All transforms accept images as input and return the processed image. We also need a decorator called app.transform. We will skip over the technical details of how a decorator works. For now, just know that this decorator registers our transform with our web app. Any registered transform will show up in our web interface. Additionally, the text in pink will be used as the transform's name. Let's try this now in code. Underneath where your app is defined, add your brand new decorator, with the transform name Hello. We're now going to add a keyword argument, default is true. This ensures that the Hello transform is automatically applied when the webcam is loaded. Define your hello function, which takes in the image and another argument called the frame, and finally, return the image. Now if you refresh the page in a new window, you'll see the Hello transform is chosen by default. Here we have Hello. Next, we'll write some Hello World text onto the webcam live feed.
Here's how to do that. Here's the image to annotate, the text that we want to show, the position of the text, the font that we want to use, and effectively the font size. This is actually given as a scale relative to the default font size. Finally, we have the color in RGB. So here 255, 0, 0 means red. In the hello function, we'll now add text to the image. Right above where you return the image, type in cv2.putText. We'll pass in the image, the Hello World string, the position, the font. We'll give it a font size of one. Finally, we'll use the color green. Here, we'll use 0, 255, 0. Now, navigate to your web app. Click Start, and you'll see the Hello World text applied. There we go. Here are the steps we covered. We covered OpenCV's image writing and text adding utilities. No need to memorize these. You can always look them up. I just wanted to give you some practice working with these utilities. For a copy of these slides, the finished code, and more resources, make sure to check out this URL. That's it for the OpenCV basics.
4. How do Face Swaps Work?: Let's dig into face swaps. Something that looks like this. How does this work? Let me explain. We'll start by breaking down this face swapping AI product into sub-problems. What is a face swap? In our simple version of the face swap, a face swap takes two steps: detect all faces and swap pixels for those faces. Let's talk about these two steps in more detail now. First step, the face detection. Summarized, face detection takes in a picture like this one and draws boxes around faces. Let's describe those box drawing algorithms in more detail now. We'll introduce face detection by describing the associated data, model, objective, and algorithm. As a refresher, here's what each of these terms means. First, data. Our face detection model accepts an image and predicts a box that contains the face. These boxes are represented using four numbers.
Each box description includes the top-left corner, x and y, and the height and width. These four numbers uniquely define a box. A face detection model predicts these four coordinates for a face. Second, model. The particular face detection model we'll use is called a Haar cascade classifier. We'll discuss this model in more detail in a later lesson. Third, objective. Our model's goal is to maximize face detection accuracy. To measure accuracy, we use a metric called intersection over union, or IoU. Let's talk about how IoU is calculated. Here's how IoU, or intersection over union, works. Say this red box is our predicted box and the blue is our ground truth. Intuitively, the less these two boxes overlap, the lower accuracy should be. The more they overlap, the higher our accuracy should be. We compute the overlap, denoted in pink. To make sure bigger boxes don't automatically get higher accuracy, we also divide by the union, denoted in green. This intersection over union, or IoU, is how we measure accuracy for object detectors. Finally, as usual, we'll skip over the algorithm. This concludes our introduction to face detection. We'll discuss face detection in more detail in a later lesson. For now, let's move on to the second step of the face swap, the swapping itself. The second step is a pixel swap. We'll need to do some resizing in case the two detected faces are different sizes, but this is otherwise pretty straightforward. Now, let's see an example of these two steps together. First, detect both faces and then swap them. Admittedly, this doesn't look convincing. For this class we'll build this simple face swap, but let me explain how industry-grade face swaps make this more realistic. Let's redo the face swapping AI product breakdown. This was our previous understanding of the face swap. First, instead of just detecting faces, we'll now detect key points in the face. Here's a visual example of facial key points.
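Returning to the intersection over union metric from the face detection step: a minimal sketch in plain Python, assuming boxes are given as (x, y, w, h) with the top-left corner first, as described above. The function name `iou` and the sample boxes are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Width and height of the overlapping region (zero if the boxes miss).
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h

    # Union is both areas minus the part that was counted twice.
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 10, 10)
ground_truth = (5, 0, 10, 10)
print(iou(predicted, ground_truth))
```

Dividing by the union is what stops a huge predicted box from scoring well just because it happens to cover the ground truth.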
Facial key points may correspond to meaningful parts of the face, like the nose bridge, the area right above an eyebrow, dimples, and more. With our key points, we then warp pixels so that each region of the face is warped to the corresponding region of a second face. To visualize this, say we have two faces now. We have the facial key points from before for the person on the left. We also have the same facial key points for the person on the right. Let's focus on the three key points around their left eye. These key points form triangles. To face swap, we'll warp the triangle on the left to the triangle on the right. After some image blending techniques, we'll then have a photorealistic face swap, something like this. That's it. We covered a minimal version of the face swap that we'll build, which consists of two steps: detect the faces and swap the pixels. We also covered the professional face swap techniques used by popular apps. For a copy of these slides and more resources, make sure you check out this URL. And now you know how face swaps work. Let's get coding once more to start building our own face swap.
5. Code Face Detection: In this lesson, we'll experiment with a face detector. At a high level, we'll detect faces in an image, and then we'll detect faces in our webcam. The goal is to explore face detection utilities. Start by accessing this URL like I've done on the right-hand side. This will create an environment for us remotely so that, like before, we don't do any setup on our computers. To get ready for our development, I'm going to close this preview on the right-hand side by clicking on the x in the top right. In the file browser, I'm going to click on New File, then type in detect.py. This is automatically going to open detect.py in our editor. I'm going to minimize our file browser by clicking on this left arrow. Then I'm going to zoom in so that you can better see what I'm doing.
We'll start by reading and writing a single image. To read the image, use cv2.imread as shown here. The first argument is the path of the image to read. Let's try this, in your new Python file, start by importing OpenCV, import cv2. Then read the image, kids.jpg. Here, we'll type in image is equal to cv2.imread and kids.jpg. To save the image, use cv2.imwrite, as shown here. The first argument is the path and the second argument is the image. Let's try this now, write the image to a new file called out.jpg, cv2.imwrite to out.jpg, and then include the image to right. Then we're going to reopen our file browser and in the left-hand side, click on Tools at the bottom and then click on Terminal. Wait for the setup to finish. Once setup is done, you'll be greeted with a prompt like this one. Here we're going to type in python detect.py. Hit Enter. This will run the Python script you just wrote. The script will read kids.jpg, and save that to out.jpg. To check, open out.jpg on the left. Here, you'll see that out.jpg matches kids.jpg. Let's now instantiate our face detector. If you'd like a refresher on what instantiation is or what objects are, make sure to check out my intermediate-level object-oriented programming class. You can pause the video here to access that URL. Instantiate the face detector, and pass in the model parameters. Here, the model parameters are stored in a file called parameters.xml. In your file browser, click on detect.py once more. Here I'm going to again close my terminal on the bottom by clicking on the x. I'm also going to minimize my file browser. You don't need to do either of these things. I'm just decluttering the video so that you can better see my code. Right above the image, we're going to now instantiate the face detector. Detector is equal to cv2.CascadeClassifier. Again, the argument is parameters.xml. Let's now use the face detector to detect faces. 
We'll do this by using the detectMultiScale method, pass the image to this method, and additionally pass a new keyword argument called scaleFactor is equal to 1.3. We'll discuss what this scaleFactor means later. For now, we'll pass both of these arguments and the method will then return a bunch of rectangles, each rectangle corresponding to a face in the image. Let's try this now. After you've defined your face detector and after you've loaded in the image, we'll now detect all faces by running, rectangles is equal to the detector.detectMultiScale, and like we have written there on the left, we're going to pass in the image and the scaleFactor. Finally, let's draw rectangles on the image corresponding to the detected faces. Here's how to do that. Call the cv2.rectangle function. Here we'll pass in the image to draw rectangles on, the coordinates for the top-left corner of the rectangle, coordinates for the bottom-right corner of the rectangle, the color of the rectangle's border. Remember, the first number here represents the amount of red, the second one the amount of green, and the last one the amount of blue, with the amount ranges from 0 to 255. As a result, 0, 255, 0 means green. There is a subtlety here. The color scheme has not actually RGB but BGR for OpenCV, but we're omitting that detail for now. But if you try changing these colors, that would be why the first number actually controls the degree of blue. Finally, we'll define the line width for the rectangle's border, which is two pixels wide. Let's now try this in code. On the left-hand side, we're going to loop over all the rectangles. Here, we know that rectangle is a tuple of four numbers like we talked about in the previous lesson. We can now write x, y, w, h is equal to rectangle. This syntax allows us to assign x to the first number, y to the second, w to the third, and h to the fourth. Now, draw the rectangle using the function we discussed. 
Cv2.rectangle, the image, the starting coordinate, the end coordinate, the color green, and finally the border width. If you haven't already, open the file browser on the left, click on Tools, and select Terminal. On the bottom, you'll see some setup. Once your terminal is ready, type in python detect.py in the terminal. This runs the face detection Python script you just wrote. After running this script, click on out.jpg on the left, you'll see the image now has rectangles drawn around each face. We will now repeat face detection but for our own webcam. Start by accessing this URL. Once your page is loaded, just like before, minimize your preview by clicking on the x in the top right. Now, we've already got a file created for us. This is from the previous lesson. I'm going to close the file browser on the left-hand side by clicking on the left arrow. To start, we'll change the frame rate of our web-based video feed. This prevents our web application from lagging too much. Add a keyword argument frame rate to the constructor like this. In our code I'm going to type in framerate equals to 5. Additionally, we're going to delete cv2.putText, we'll replace this later. Like before, we'll instantiate the face detector. Again, like before, pass in the model parameters at parameters.xml into this CascadeClassifier. Right above the definition of the app, I'm going to type in detector is equal to cv2.CascadeClassifier parameters.xml. Next, detect all faces in the image. We're configuring the face detection slightly, like before passing the image to detect faces on and like before additionally pass in scaleFactor is equal to 1.3. This scaleFactor enables us to detect larger-sized faces. Here's how, say, the face detector was trained to detect faces of this size. Here the blue square is the image, the circle is an abstract representation of a face. During inference, a larger face like this one would normally be missed since our detector is not trained for faces this large. 
To get around this, we scale down the image by 30 percent and run the detector on it. Then repeat this, scale down the image by another 30 percent and run the detector on it again. In this last step, our face during inference is the same size as the faces during training so our detector is able to detect the face. This is what detectMultiScale means. Let's code this now. First, let's rename this transform from Hello World to Find Faces. We'll also rename the function to find_faces. Next, find all faces in the image like we did before. We'll also add the scale factor like we mentioned earlier, to make the detector more robust to different face sizes, rectangles is equal to detector.detectMultiScale, and we're going to pass in the image and a scaleFactor of 1.3. Finally, draw rectangles around all the faces like before. Here is again, how to draw a rectangle in OpenCV. Let's do this now. We're going to loop over all rectangles. We're going to destructure the rectangle into four variables like we did before. Finally, we're going to draw rectangles around all faces. Now, note that we can actually simplify this for-loop. Since x, y, w, h is equal to rectangle, we can actually substitute rectangle with these four variables. I'm going to now delete this line. Here we have for x, y, w, h in rectangles. Now, click on Show in the top left. For me, unfortunately, my window is too small, so it looks like a pair of sunglasses. This will open up a preview of your web application. I'm going to again zoom in so that you can see. I'm going to then click on Start and you'll see a webcam feed except with your face boxed. Again, click Allow if you need to. That's it for this lesson, you've now explored the face detection utilities in OpenCV for a copy of these slides, the finished code, and more resources, make sure to check out this URL. 6. How do Face Detectors Work?: Let me take a step back to explain how face detectors work. We'll start with detecting simple features like edges. 
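The repeated shrinking described for detectMultiScale can be sketched as a list of image sizes. This is my reading of the idea rather than OpenCV's actual implementation: it assumes each round divides the image size by the scale factor, and that the detector was trained on 24 by 24 pixel faces (a hypothetical value).

```python
def pyramid_sizes(width, height, scale_factor=1.3, min_size=24):
    """Image sizes visited when shrinking by scale_factor each round."""
    sizes = []
    scale = 1.0
    while True:
        w, h = int(width / scale), int(height / scale)
        # Stop once the image is smaller than the face size
        # the detector was (hypothetically) trained on.
        if w < min_size or h < min_size:
            break
        sizes.append((w, h))
        scale *= scale_factor
    return sizes


for size in pyramid_sizes(100, 100):
    print(size)
```

A smaller scale factor means more rounds, which catches more face sizes but takes longer. That trade-off is one reason to lower the frame rate in the web app.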
Take our image from previous lesson. Say we want to extract simple small features like edges. On a high level, we want to find these small features in every possible patch of the image. Let's say each patch is two-by-two to start. We'll start from the top-left and ask, did we find the feature here? What about here? What about here? So on and so forth until you've covered the entire image. Now, how do you find small features like edges in each of these two-by-two patches? Consider a two-by-two patch with an edge and other patches without edges. Now consider their numeric representations. Recall that black is zero and one is white, so the left box contains both zeros and ones. The middle box contains all ones and the right box contains all zeros. Let's now consider a two-by-two filter, which is just a two-by-two matrix of numbers. Multiply the two-by-two filter with our two-by-two patch. Element-wise, multiply the red negative 1 by the red zero. Multiply the black negative 1 by the black zero. Multiply the green ones, and multiply the blue ones. Finally, add them all together and we get two. Do the same for the middle image and we get zero. Do the same for the right image and we get zero again. This is perfect. Our filter produces positive values for vertical edges and produces zero for images without edges. This is just for a small two-by-two patch though. Let's now consider the entire image. This is now the numeric representation of our diamond image. We'll traverse over every two-by-two patch in the image. At each patch, we multiply and sum the two-by-two filter with a two-by-two patch. This gives us a matrix of outputs where one denotes an edge with black on the left and white on the right. Negative 1 denotes an edge in the reverse direction. Visualized as an image, we have white as left edge, black as a right edge, and gray as no edge. We call it this convolving a filter with the image. This works for generic color images too. Here's a bird on the left. 
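The multiply-and-sum step can be written out in NumPy. The exact filter layout below (negative ones on the left, ones on the right) is an assumption that matches the numbers in the walkthrough, and like the lesson it slides the filter without flipping it.

```python
import numpy as np

# Assumed 2x2 vertical-edge filter: responds to black on the left, white on the right.
edge_filter = np.array([[-1, 1],
                        [-1, 1]])


def filter_response(patch, kernel):
    # Multiply element-wise, then add everything together.
    return int(np.sum(patch * kernel))


def slide_filter(image, kernel):
    # Apply the filter at every patch position that fits inside the image.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=int)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


edge_patch = np.array([[0, 1], [0, 1]])   # black on the left, white on the right
white_patch = np.ones((2, 2), dtype=int)  # all white
black_patch = np.zeros((2, 2), dtype=int) # all black

for patch in (edge_patch, white_patch, black_patch):
    print(filter_response(patch, edge_filter))

# A small image with one vertical edge down the middle.
image = np.array([[0, 0, 1, 1]] * 3)
print(slide_filter(image, edge_filter))
```

In the output grid, positive values mark a black-to-white edge, negative values mark the reverse, and zeros mark flat regions.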
Convolving the image with the edge filter gets us the image on the right, highlighting edges as expected. For today's face detector, we'll use Haar filters, or Haar features. Some Haar features find edges like the one we just tried. Others find lines, and yet others find abstract patterns. There are multitudes of possible filters, and there are also many, many patches in a large image. Running all filters over all patches is expensive. Let's make this more efficient. We need a way to save computational cost. To do this, the face detector in this course uses a method called cascading. To understand how cascading saves compute, we need to understand the intuition. Look at the image in the red box. It is basically monotone green. It definitely doesn't contain a face because it's all one color and fairly boring. Knowing this, we could run a simple edge detector first to find all edgeless parts of the image. Convolving an edge filter with the image gives us this output. Notice that the monotone part of the image, boxed in red on the left, is all black. We can now ignore those parts of the image and focus later filters on the more interesting parts of the image, denoted in green. That's the intuition. Run one small set of filters over the image. We call this stage 1. Determine which parts are interesting by picking the pixels with the highest output values. Then run the next set of filters on the interesting parts. Refine which parts of the image are considered interesting, and repeat for the next set of filters, refining which parts of the image are interesting again, and continue doing this. You could repeat this indefinitely. In our face detection model today, the model uses 38 such stages. These cascaded filters then allow us to perform this detection. Specifically, the final stage of the 38 stages will output high values for faces.
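To make the cost saving concrete, here is a toy sketch of the cascade intuition. It is not the 38-stage model used in this class: the two stages, their thresholds, and the names contrast, edge_response, and run_cascade are all assumptions for illustration. The point is only that a cheap first stage throws away boring patches so that later stages run on fewer of them.

```python
import numpy as np

edge_filter = np.array([[-1, 1],
                        [-1, 1]])


def contrast(patch):
    """Cheap stage 1 score: zero for a flat, single-color patch."""
    return int(patch.max() - patch.min())


def edge_response(patch):
    """Stage 2 score: the multiply-and-sum edge filter."""
    return int(np.sum(patch * edge_filter))


def run_cascade(patches, stages):
    """Run each stage only on the patches that survived the previous stage."""
    survivors = list(range(len(patches)))
    evaluations = 0
    for score, threshold in stages:
        kept = []
        for index in survivors:
            evaluations += 1
            if score(patches[index]) >= threshold:
                kept.append(index)
        survivors = kept
    return survivors, evaluations


patches = [
    np.zeros((2, 2), dtype=int),   # flat black
    np.ones((2, 2), dtype=int),    # flat white
    np.array([[0, 1], [0, 1]]),    # edge: black on the left, white on the right
    np.array([[1, 0], [1, 0]]),    # edge in the reverse direction
]
stages = [(contrast, 1), (edge_response, 2)]

survivors, evaluations = run_cascade(patches, stages)
print(survivors)    # [2]
print(evaluations)  # 6, versus 8 if both stages ran on all four patches
```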
Draw a box around values in the image that exceed a threshold, then visualize it on top of your original image, and there you have it: a successful face detection. Let's recap the steps. In summary, we discussed how to extract simple features like edges using filters. We then discussed how those simple features are used to iteratively find interesting parts of the image using a cascade pattern. Finally, the cascade produces high-value outputs for faces. In the final stage, we draw a box around high-value outputs, and these boxes identify faces in the image. For a copy of these slides and more resources, make sure to check out this URL. Now that you know how face detectors work, let's wrap up and finish coding our face swapper. 7. Code Face Swap: Now that we have code to process our webcam and to detect faces, we're ready to finish the face swap. At a high level, we'll follow the same two-step process as before. First, we'll test the face swap on an image, then we'll implement the face swap for your webcam. Start by accessing this URL like I have on the right-hand side. This will create an environment for us remotely, again, so that we don't need to do any setup on our own computers. Before we write a new function, we need to open the file we're going to edit. On the left-hand side here, if your file browser is collapsed like mine, go ahead and click on that to expand it. On the far right-hand side, I'm going to close my preview by clicking on the top-right x. In the file browser, I'm going to click on detect.py, and that is the file we wish to edit. Go ahead and close the file browser by clicking on this left arrow in the top left. In this file, we're going to create a new function, swap_faces, to hold your face detection and box drawing code. We're going to define a function called swap_faces above our current code. This function is going to take in the image and the face detector. Inside, we're going to take this code, starting from the detection code all the way down to the box drawing code.
I'm going to cut that and then paste it in our new function. Once you paste it, you're going to need to adjust the indentation. For me, lines 6 through 10 need to be indented one more time. Now, we're going to pair up the rectangles, or in other words, the faces, into groups of two. Let me explain how we'll do that. Pretend the rectangles list contains just numbers like this. Then rectangles[::2] will select every other number. In this case, we would have 0, 2, 4. Next, rectangles[1::2] will skip the first one, then select every other number. In this case, we would have 1, 3, 5. Indexing is a pretty complicated topic, so don't worry if you don't understand this fully. Just know that this is conceptually what's happening. Finally, zip will collect the first item from both lists, in this case, 0, 1. Then it will collect the second item from both lists, in this case, 2, 3. Finally, the third item from both lists, making 4, 5. All this to say that the complicated-looking expression, zip of rectangles with a bunch of colons and numbers, selects every pair of rectangles, or every pair of faces. Let's code this now. In your code on the left, we're going to change this for-loop so that we have for rectangle1, rectangle2 in zip, and then the expression that we had before: rectangles[::2], comma, rectangles[1::2]. I'm going to hit Enter, and here I'm going to write pass and delete our original box drawing code. Let's now extract both faces from the image. We're going to start by defining a helper function. Underneath swap_faces, we're going to define get_face_selector. This function will, given a rectangle, return a mask for us to select the face. First, we'll destructure the rectangle into the four values, just like we did before. Here we'll then return a slice object. A slice object allows you to select a part of a list. In this case, two slices allow you to select part of a 2D array, or in other words, a part of the image. Here we'll return two slice objects.
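Here is the pairing expression from a moment ago written out, using plain numbers in place of rectangles exactly as in the explanation. The second list with five items is an extra illustration, not something from the lesson: it shows that zip stops at the shorter list, so an unpaired last face is simply left alone.

```python
rectangles = [0, 1, 2, 3, 4, 5]

evens = rectangles[::2]    # every other item, starting from the first
odds = rectangles[1::2]    # skip the first item, then every other item
pairs = list(zip(evens, odds))

print(evens)  # [0, 2, 4]
print(odds)   # [1, 3, 5]
print(pairs)  # [(0, 1), (2, 3), (4, 5)]

# With an odd number of items, the last one has no partner and is dropped.
odd_count = [0, 1, 2, 3, 4]
odd_pairs = list(zip(odd_count[::2], odd_count[1::2]))
print(odd_pairs)  # [(0, 1), (2, 3)]
```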
The first slice object will slice along the first dimension, or the y-axis, from y to y plus h. The second slice object will slice along the second dimension, or the x-axis, from x to x plus the width. Now we're going to use this get_face_selector function. Above, inside of our for-loop, use our new helper function to convert the first rectangle to a face mask. Here we'll write mask1 is equal to get_face_selector of rectangle1. Repeat the same thing for the second rectangle. Now, use the masks to select both faces from the image. face1 is equal to image of mask1, and face2 is equal to image of mask2. Now, since the two faces may be different sizes, we need to resize each face to fit within the other. Here's conceptually how we'll do that. Say we're trying to fit the green box inside of the blue box. Notice that the blue height is less than the green height. Since the green box is taller, we'll shorten it so that the heights of both boxes are the same. In this case, the ratio between the blue height and the green height tells us how much we need to shrink the green rectangle. Say now that the green box is wider than the blue box. Notice the blue width is less than the green width. We'll shrink it so that the widths of both rectangles are the same. In this case, the ratio between the blue width and the green width tells us how much we need to shrink the green rectangle. We don't actually know if the rectangle is too tall or too wide, so we take the minimum of both ratios to be safe. This ensures that the green rectangle has shrunk enough, whether too tall or too wide, to fit inside the blue rectangle. Let's now code this. Just like before, we're going to start off by defining a helper function. Underneath get_face_selector, define resize_to_fit. This function will take in two faces, face1 and face2. First, destructure the face shape to get the width and height of the face. We'll type in face1_height, face1_width, underscore is equal to face1.shape.
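As a quick aside on the selector helper, here is a minimal sketch of get_face_selector as described above, tried on a small blank image. The image size and the rectangle values are made up for illustration. One detail worth seeing: selecting with slices gives a view into the image, so writing into the selected region changes the image itself.

```python
import numpy as np


def get_face_selector(rectangle):
    """Return two slices that select the rectangle from an h-by-w-by-3 image."""
    x, y, w, h = rectangle
    return slice(y, y + h), slice(x, x + w)  # first the y-axis, then the x-axis


# Hypothetical blank image, 6 pixels tall and 8 pixels wide.
image = np.zeros((6, 8, 3), dtype=np.uint8)

mask1 = get_face_selector((1, 2, 3, 4))  # x=1, y=2, w=3, h=4
face1 = image[mask1]
print(face1.shape)  # (4, 3, 3): height 4, width 3, and 3 color channels

# Writing through the selector changes the image, and face1 sees the change
# because it is a view into the same pixels.
image[mask1] = 255
print(face1.max())     # 255
print(image[0, 0, 0])  # 0, pixels outside the rectangle are untouched
```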
The face actually has three dimensions for its shape. If you recall from our previous lesson, all images are h by w by 3. We don't need that last dimension, so we use an underscore. Underscore is the convention for ignore this variable. We're also going to repeat this with the second face: face2_height, face2_width, underscore is equal to face2.shape. Then compute the required amount to shrink the first face to fit into the second face. This is the factor that we mentioned before. Factor is equal to the minimum between the face2 height divided by the face1 height, and the face2 width divided by the face1 width. Finally, we'll resize face1. Here, the resize function will take in the first face. The second argument is a required argument that we won't use, so pass in None for now. Then we'll type in the factor to scale on the x dimension, and then the factor to scale on the y dimension. Finally, return this. Now we'll use the helper method, resize_to_fit, to resize both faces to fit onto the other. Start with face1. Up above in the for-loop, we'll type in resized1 is equal to resize_to_fit of face1, face2. Do the same for the other face: resized2 is equal to resize_to_fit of face2, face1. Now we'll actually paste each face onto the other. Conceptually, we'll simply paste the green rectangle onto the blue one. Again, we'll start with the helper method. I'm going to scroll down, and underneath resize_to_fit, making sure to backspace so that you start a new function outside, we'll define apply, which takes in the resized face and the face to paste it on. Destructure the face shape into its height and width: resized1_height, resized1_width, and underscore to ignore that third dimension, is equal to resized1.shape. Then paste the face onto face2. Here, we'll now paste the resized face onto face2. Finally, return the second face. Again, indexing is a fairly complicated topic, so if you don't understand what this line means, don't worry.
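Here is a rough sketch of the two helpers just described, so you can see the numbers work out. Two things are assumptions. First, to keep the sketch free of OpenCV, resize_nearest is a simple NumPy stand-in for the call described in the lesson, cv2.resize(face1, None, fx=factor, fy=factor); it picks nearest pixels and may differ from OpenCV by a pixel of rounding. Second, the lesson does not spell out where the face is pasted, so this sketch pastes into the top-left corner.

```python
import numpy as np


def resize_nearest(face, factor):
    """NumPy stand-in for cv2.resize(face, None, fx=factor, fy=factor)."""
    height, width = face.shape[:2]
    new_height = max(1, int(height * factor))
    new_width = max(1, int(width * factor))
    # For each output pixel, pick the nearest source row and column.
    rows = np.minimum((np.arange(new_height) / factor).astype(int), height - 1)
    cols = np.minimum((np.arange(new_width) / factor).astype(int), width - 1)
    return face[rows][:, cols]


def resize_to_fit(face1, face2):
    """Scale face1 so that it fits inside face2."""
    face1_height, face1_width, _ = face1.shape
    face2_height, face2_width, _ = face2.shape
    # Take the smaller ratio so the result is neither too tall nor too wide.
    factor = min(face2_height / face1_height, face2_width / face1_width)
    return resize_nearest(face1, factor)


def apply(resized1, face2):
    """Paste the resized face onto face2 (top-left corner is an assumption)."""
    resized1_height, resized1_width, _ = resized1.shape
    face2[:resized1_height, :resized1_width] = resized1
    return face2


# Hypothetical faces: a tall "green" face and a smaller "blue" face.
green = np.full((200, 100, 3), 7, dtype=np.uint8)
blue = np.zeros((100, 80, 3), dtype=np.uint8)

resized = resize_to_fit(green, blue)
print(resized.shape)  # (100, 50, 3): factor is min(100/200, 80/100) = 0.5

pasted = apply(resized, blue)
print(pasted[:, :50].min(), pasted[:, 50:].max())  # 7 0
```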
For now, you just need to know this is how we paste the resized face onto the second face. Scrolling back up to the for-loop, we're now going to use apply to paste each face. Paste the resized face2 onto face1: image of mask1 is equal to apply of resized2, face1. Repeat the same thing for the other face: image of mask2 is equal to apply of resized1, face2. For the final step, we're actually going to apply our swap_faces function to our image and the corresponding face detector. Scroll down to the very bottom of your file, after you've instantiated your detector and read your image in. We're now going to call swap_faces on the image and the face detector. Now we are done with the script. Click on the file browser on the left-hand side, click on Tools, and then Terminal. Wait for the setup. Once setup is done, type in python detect.py. This will run your face swapping code. After running this script, check out out.JPEG on the left-hand side. It may look like the image didn't change, but look closer. The kids' faces are actually swapped. This is more obvious when I switch between the original image and the swapped one. Here is the original image and here is the swapped one. Let's now apply face swapping to our webcam. Start by accessing this URL. Once this page has loaded, I'm going to click Close on the preview on the right-hand side. I recommend doing the same to create more space. On the left-hand side, I'm going to close the file browser by clicking on the arrow in the top left. To start, let's copy all the helper functions we wrote in the last step. I'm going to paste them right above our face detector instantiation. Here I'll paste these in. Note that it doesn't really matter where you paste these functions. If you do want to paste them in exactly the same location, I've pasted them after my imports, but before the face detector is instantiated. Let's now define a new transform for our web application to apply to the webcam live feed.
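Putting the image half of the lesson together, here is a rough sketch of the whole loop on a tiny made-up image. It departs from the lesson in a few labeled ways: swap_faces here takes a list of rectangles directly instead of an image and a face detector, so no detector file is needed; the helpers are compact stand-ins for the ones built in this lesson; the resize uses a NumPy nearest-pixel stand-in for cv2.resize; and pasting into the top-left corner is an assumption.

```python
import numpy as np


def get_face_selector(rectangle):
    x, y, w, h = rectangle
    return slice(y, y + h), slice(x, x + w)


def resize_to_fit(face1, face2):
    # Nearest-pixel stand-in for cv2.resize(face1, None, fx=factor, fy=factor).
    height1, width1 = face1.shape[:2]
    height2, width2 = face2.shape[:2]
    factor = min(height2 / height1, width2 / width1)
    new_height = max(1, int(height1 * factor))
    new_width = max(1, int(width1 * factor))
    rows = np.minimum((np.arange(new_height) / factor).astype(int), height1 - 1)
    cols = np.minimum((np.arange(new_width) / factor).astype(int), width1 - 1)
    return face1[rows][:, cols]  # a new array, not a view into the image


def apply(resized, face):
    height, width = resized.shape[:2]
    face[:height, :width] = resized  # top-left placement is an assumption
    return face


def swap_faces(image, rectangles):
    """Swap each pair of rectangles (hypothetical signature, see lead-in)."""
    for rectangle1, rectangle2 in zip(rectangles[::2], rectangles[1::2]):
        mask1 = get_face_selector(rectangle1)
        mask2 = get_face_selector(rectangle2)
        face1 = image[mask1]
        face2 = image[mask2]
        # Resize both faces before pasting, so each paste uses the original pixels.
        resized1 = resize_to_fit(face1, face2)
        resized2 = resize_to_fit(face2, face1)
        image[mask1] = apply(resized2, face1)
        image[mask2] = apply(resized1, face2)
    return image


# Hypothetical 4x8 image: the left "face" is all 1s, the right "face" is all 2s.
image = np.zeros((4, 8, 3), dtype=np.uint8)
image[:, :4] = 1
image[:, 4:] = 2

swapped = swap_faces(image, [(0, 0, 4, 4), (4, 0, 4, 4)])
print(swapped[0, 0, 0], swapped[0, 7, 0])  # 2 1
```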
Scrolling down after all of our existing functions, I'm going to create a new transform. Like in a previous lesson, we'll use the web OpenCV decorator, @app.transform. We'll give this transform a name of Face Swap, and we'll also make this new transform the default by typing in default equals True. I'm going to delete this default equals True down below, because we can't have two default transforms. We'll now define a face swap function that takes in both the image and the frame, then call the swap_faces helper method you wrote in the last section. Type in swap_faces, pass in both the image and the face detector, and return the image. Finally, that's it. Click on Show in the top-left. For me, it looks like a sunglasses icon. Then select In a New Window. This opens up a preview of your web application. I'm going to zoom in so that you can see what's going on and then click Start. In this case, I'm face swapping myself with a picture of myself taken in the past. Make sure to click Start and then Allow. That concludes our face swap. You can now share this face swap with your friends and family. Make funny faces, share so many pictures, and have fun with your own face swap application. Good work completing this lesson, it was a long one. For a copy of these slides, the finished code, and more resources, make sure to check out this URL. See the next lesson for next steps to learn more computer vision. 8. Next Steps: Congratulations. You've finished your face swapping masterpiece. We covered how images are represented, how to extract meaning from images, face detection models, and more. If this has piqued your interest in machine learning and computer vision, follow me on Skillshare to get notified when the next class launches. If you're interested in data science topics as well, check out my Data Science 101 class to play with data, or the SQL 101 class to query and design databases. Thank you for joining me for this face-swapping class.
Congratulations once more on making it to the end of the course, and until next time.
