Machine Learning 101: Python Computer Vision for Beginners | Alvin Wan | Skillshare

Machine Learning 101: Python Computer Vision for Beginners

Alvin Wan, AI PhD Student at UC Berkeley

8 Lessons (27m)
    • 1. Introduction (1:09)
    • 2. Getting Started (3:30)
    • 3. Abstractions (2:09)
    • 4. Images (3:21)
    • 5. OpenCV (5:05)
    • 6. Face Detection (5:05)
    • 7. Face Swapping (6:10)
    • 8. Next Steps (0:50)
About This Class

In this course, we will explore machine learning fundamentals as you build a Snapchat-esque face swap application; for those unfamiliar with Snapchat, this filter will detect pairs of faces and swap them. Along the way, you will use a pre-trained face detector and will learn related concepts to kickstart your study of machine learning, centered around four abstractions: model, data, objective, and algorithm.

No prior machine learning--statistics, mathematics--background is required for this introductory lesson. However, we make extensive use of Python, a desktop webcam, and related computational and computer vision Python packages.

Interested in creative coding? Check out my VR 101 (AFrame Nature Scenes) class.

Interested in more data science or machine learning? Check out my Coding 101 (Python), SQL 101 (Database Design), or Data 101 (Analytics) classes.

Follow me on Skillshare to be the first to hear about more courses in these areas!

Meet Your Teacher

Alvin Wan

AI PhD Student at UC Berkeley

Top Teacher

Let me help! I'm a computer science PhD student at UC Berkeley, where I've taught for 5 years. I've designed a few courses to get you started -- not just to teach the basics, but also to get you excited to learn more. Check out the courses below! Or scroll down for a guide to getting started.

Transcripts

1. Introduction: You've probably seen a face swap at some point on social media, in a mobile app, or on your favorite source of memes and silly pictures. In this class, you're going to learn just how that face swap works by building it. Hi, I'm [inaudible]. I'm a Computer Science PhD student at UC Berkeley, where I primarily study computer vision for self-driving cars. On campus, I've taught over 5,000 students and spent two years as a machine learning student instructor. In this class, I'm going to take you through building a face swapping desktop application. This class does expect that you know your Python fundamentals. If you don't already, make sure to check out my Coding 101 class. However, no prior machine learning experience is needed. I'll walk you through all related machine learning topics from the ground up. First, we'll walk through applications and terminology, starting off with the basics of a computer vision Python library. Second, you'll apply a pre-trained model to detect all faces in a live camera feed. Third, you'll swap all faces for a picturesque, frame-worthy live [inaudible]. All told, this will take you about an hour, tops. You'll walk away with a face swapping application and machine learning know-how in no time. Let's get started.

2. Getting Started: Before getting started with the face swapping application, we'll discuss machine learning basics, some terminology, and applications. This is Lesson 2: Getting Started with Machine Learning. What exactly is machine learning? What is a distribution? Machine learning is a set of techniques to find patterns in data. Applications range from self-driving cars to personal AI assistants, from translating between French and Taiwanese to translating between voice and text. There are a few common applications of machine learning that already permeate, or could potentially permeate, your day-to-day. First, detecting anomalies in data. This could mean highlighting abnormal bank activity. Second, recommending similar content. 
For example, finding products you may be looking for, and Skillshare tutorials that are relevant. Third, predicting the future. For example, predicting the path of neighboring vehicles. These are a few of the many applications of machine learning. But most applications tie back to learning the underlying distribution of data. A distribution specifies events and the probability of each event. For example, with 75 percent probability, you buy an item priced $25 or less. With 24 percent probability, you buy an item priced between $25 and $100. With one percent probability, you buy an item more expensive than $100. Using this distribution, we can accomplish all of our tasks from before. For example, we can detect anomalies. A purchase over $100 occurs with only one percent probability, so any purchase over $100 can be considered a red flag. With a distribution of data, we can accomplish a myriad of tasks. In sum, one goal in machine learning is to learn this distribution. Even more generically, our goal is to learn a specific function with particular inputs and outputs. We call this function our model. Our input is denoted by x. Say our model, which accepts input x, is f(x) = ax. Here "a" is a parameter of our model. Each value of the parameter corresponds to a different instance of our model. In other words, the model where "a" is equal to two is different from the model where "a" is equal to three. In machine learning, our goal is to learn this parameter, changing it until we do well. How do we determine which values of "a" do well? We need to define a way to evaluate our model for each parameter "a". To do this, define the output of f(x) to be our prediction, y hat. We will refer to y as our label, meaning the true and desired output. With our predictions and our labels, we can then define a loss function. One such loss function is simply the difference between our prediction and our label. Using this loss function, we can then evaluate different parameters for our model. 
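
As a minimal sketch of the ideas above, the following assumes the one-parameter model f(x) = ax, a handful of made-up (x, y) pairs, and the absolute difference between prediction and label as the loss. The function names, the data, and the candidate values of "a" are illustrative assumptions, not code from the class.

```python
"""Sketch: evaluate a one-parameter model f(x) = a * x with a simple loss."""


def predict(a, x):
    """Model f(x) = a * x; returns the prediction y hat."""
    return a * x


def loss(y_hat, y):
    """Absolute difference between prediction and label (an assumed choice)."""
    return abs(y_hat - y)


def total_loss(a, data):
    """Sum the loss over every (x, y) pair for a given parameter a."""
    return sum(loss(predict(a, x), y) for x, y in data)


def main():
    # Hypothetical data generated by y = 3x; not taken from the class.
    data = [(1, 3), (2, 6), (3, 9)]
    candidates = [1, 2, 3, 4]
    for a in candidates:
        print(f"a = {a}: total loss = {total_loss(a, data)}")
    # Keep the candidate parameter with the lowest total loss.
    best = min(candidates, key=lambda a: total_loss(a, data))
    print(f"best a = {best}")


if __name__ == "__main__":
    main()
```

Trying a short list of candidates is only one way to compare parameters; it is used here because it keeps the link between parameter, prediction, and loss easy to see.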
Picking the best parameter for our model is known as training. Beyond training the right parameter, there are plenty of other challenges. How do we control a self-driving car? What does it mean to train a model that identifies faces? That's it for Getting Started with Machine Learning. Next time, we'll discuss four abstractions in machine learning. These abstractions will help you compartmentalize different machine learning topics as we encounter them in these lessons and beyond. They will also give us answers to the questions that we asked before.

3. Abstractions: There are countless topics in machine learning at various levels of specificity. To better understand where each piece fits in the larger picture, consider a few abstractions. This is Lesson 3, abstractions in machine learning. The first abstraction is applications and data. Consider the possible inputs and the desired output for the problem. What is your goal? How is your data structured? Are there labels? For example, if the goal is to classify pictures of handwritten digits, the input is an image of a handwritten number and the output is a number. The second abstraction is the model. The model is the class of functions we consider. For example, we could use all functions of the form f(x) = ax^2, or we could use all functions of the form f(x) = ax + b. Are linear functions sufficient? Quadratic functions? Polynomials? What types of patterns are we interested in? Is a neural network appropriate? All of these questions belong in the model abstraction. The third abstraction is the optimization problem. This is our objective in mathematics. The optimization problem answers questions such as: How do we define loss? How do we define success? Are there imbalances in the data that our objective needs to consider? For example, we may wish to find the value x that minimizes (ax - b)^2. The fourth abstraction is the optimization algorithm. 
This is how we will solve the optimization problem. Perhaps we can compute the solution by hand. Perhaps we need an iterative algorithm to approximate the solution. Can we convert this problem to an equivalent but easier-to-solve objective and solve that one? One possible approach is to solve the problem using calculus: take the derivative, set the derivative to zero, and solve for the optimal parameter. These are all elements of the algorithm abstraction. In this tutorial, you've touched on major topics in the fundamentals of machine learning. Using the abstractions above, you now have a framework to discuss machine learning problems and solutions. In the next lesson, we will discuss how images are represented in math.

4. Images: Let's explore how images are represented numerically. This will give us the background needed to modify images, and ultimately to effect our face swapping. This is Lesson 4, representing images as matrices. Let's look at an example. We can construct a black and white image using numbers, where 0 corresponds to black and 1 corresponds to white. Focus on the dividing line between the ones and the zeros. What shape do you see? You should see a diamond. We can then save this matrix of values as an image. This gives us the following picture. Definitely a diamond. Now, what if we use any value between 0 and 1, such as 0.1, 0.26, or 0.74391? Numbers closer to 0 are darker, and numbers closer to 1 are lighter. This allows us to represent white, black, and any shade of gray. This is great news for us because we can now construct any grayscale image using 0, 1, and any value in between. Consider the following example. Can you tell what it is? Again, each number corresponds to the color of a pixel. This one is far harder. Re-rendered as an image, we can now tell that this is in fact a Pokeball. You've now seen how black and white and grayscale images are represented numerically. To introduce color, we will need a way to encode more information. 
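
The grayscale representation described so far can be sketched in a few lines. This is an illustration only: it uses plain Python lists instead of a linear algebra library so that it runs without extra packages, and the 5 by 5 diamond, the `render` preview, and the `to_pixel_values` helper are hypothetical names and data, not the class's own.

```python
"""Sketch: a tiny grayscale image as a matrix of numbers."""


def to_pixel_values(matrix):
    """Rescale values in [0, 1] to integer pixel intensities in [0, 255]."""
    return [[round(value * 255) for value in row] for row in matrix]


def render(matrix):
    """Return a rough text preview: '#' for light pixels, '.' for dark ones."""
    return "\n".join(
        "".join("#" if value >= 0.5 else "." for value in row) for row in matrix
    )


def main():
    # Hypothetical 5 x 5 diamond: 1 is white, 0 is black.
    diamond = [
        [0, 0, 1, 0, 0],
        [0, 1, 1, 1, 0],
        [1, 1, 1, 1, 1],
        [0, 1, 1, 1, 0],
        [0, 0, 1, 0, 0],
    ]
    print(render(diamond))
    # Shades of gray are simply values between 0 and 1.
    print(to_pixel_values([[0, 0.26, 1]]))


if __name__ == "__main__":
    main()
```

Changing a number in the matrix changes the corresponding pixel, which is the sense in which manipulating the numbers is the same as manipulating the image.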
Say our image has dimensions h by w. We traditionally think of an image like this, an h by w rectangle. However, recall our representation of the grayscale image above. Each pixel is one value. We can equivalently say our image has dimensions h by w by 1. In other words, every x, y position in our image has one value. For a color representation, we represent the color of each pixel using three values between 0 and 1. One number corresponds to the degree of red, one to the degree of green, and the last to the degree of blue. We call this the RGB color space. Since each pixel needs three values to represent it, our image is now h by w by 3. This means that for every x, y position in our image, we have three values: R, G, and B. In reality, each number ranges from 0 to 255 instead of 0 to 1, but the idea is the same. Different combinations of numbers correspond to different colors, such as dark purple or bright orange. The takeaways are as follows. First, each image will be represented as a box of numbers that has three dimensions: height, width, and color channels. Manipulating this box of numbers directly is equivalent to manipulating the image. Second, we can also flatten this box to become just a list of numbers. In this way, our image becomes a vector. Later on, we will refer to images as vectors. Now that you understand how images are represented numerically, you are well equipped to face swap yourself and your friends. In the next lesson, we will write a simple Python script to connect to your webcam.

5. OpenCV: To build this face swapping application, we'll need a few handy computer vision tools. Many of these tools are provided in OpenCV, a Python library built for computer vision. Welcome to Lesson 5, getting started with OpenCV. In this lesson, we will write two scripts. The first is to test cv2 utilities and solidify our understanding of an image as a matrix. The second is to test OpenCV's connection to your computer's webcam. 
Let's get started with a new Python file, which I will call generate.py. We'll start with a few conventions. First, I'll add a one-line docstring to describe the contents of the file. Second, I'll define a main function. This is important because any code in the global scope is run when the file is loaded, even when importing this file from another script. To prevent main functionality from running on import, we place all code in this main function. Finally, invoke the main function if the script is called directly. At the very top of your file, import both the computer vision library and numpy, a linear algebra library. Now, using numpy, define a matrix of values. This matrix of values will contain ones where the image is white and zeros where the image is black. To convert this matrix into a proper image, we'll need to rescale all these values so that the numbers are between 0 and 255. Next, resize this image so that it's much, much larger than just a three by six image. Here we'll add an additional keyword argument, interpolation. This makes the resize preserve pixelated edges. Finally, invoke an OpenCV method, which will write this image to disk. Go ahead and navigate to your terminal and run your new generate.py. This will output a diamond.png file, which you can view. For the second of two scripts, create a new Python script, main.py. We will start again with our three conventions: a docstring, a main function, and finally, a main function invocation when the file is run directly. At the very top of your file, again import the OpenCV library. Inside of our main function, we'll begin by initializing the camera. The value zero indicates the webcam for your computer. Next, loop forever until told otherwise. Then, capture your video output frame by frame. "ret" means return code; it tells us whether or not the call to cap.read succeeded. "frame" is the image that is returned from our camera. 
Next, display the resulting image. The first argument is the name of the window that will pop up. The second argument is the image we wish to display. Now, we need to wait a millisecond for this image to show before reading the next image. Here, we also check whether the user has pressed the Q key. If so, we terminate the while loop and close the window. This completes our script. Again, navigate to your terminal and type in python main.py to launch your new webcam application.

6. Face Detection: This is Lesson 6, face detection. To the right is a test image I have picked. In the next few minutes, we will use a cascade classifier to detect all faces in a test image of your choice. Fortunately for us, the parameters of this cascade classifier have already been trained and made available via OpenCV. We will simply use those pretrained parameters. To start, download the pretrained parameters from OpenCV. Find any image on the web containing at least one pair of faces. Here, I have downloaded an image from pixels.com and named it test.jpg. Finally, create a Python file called detect.py. Ensure that all three files, parameters.xml, test.jpg, and detect.py, are in the same directory. Again, begin by formatting your file with the typical conventions, including a docstring, a main function, and a main function invocation. Before your main function, import the cv2 library. In your main function, begin by initializing your cascade classifier. The first and only argument to this classifier is a path to your parameters. Next, read your image from disk using cv2.imread. Next, using cascade.detectMultiScale, detect all faces in your image. This will return a list of rectangles. The first argument is your image, stored in the variable frame. We also pass an additional list of hyperparameters that I have tuned for you in advance. Next, iterate over all rectangles. Draw the rectangle onto the image using cv2.rectangle. The first argument is the image. 
The second argument is the starting position of your rectangle. The next argument is the ending position of your rectangle. Next is the color that we will use; here, we will apply the color green. Finally comes the width of the line used to draw your rectangle. To end this function, we will write your image back to disk. The first argument is the name of the file, which we will call out.jpg. The second argument is the image itself. Here we've concluded detect.py. Save your file, navigate back to your terminal, and run your script. This will create a new image, out.jpg, containing the original image, test.jpg, with detected faces. Next, we have pulled up main.py from the last lesson. We will combine the current script, which runs face detection on a static image, with our connection to the web camera. This way, we can run face detection on the live, real-time camera input. First, copy the definition of your cascade classifier. Navigate to main.py and paste the cascade classifier initialization right after the camera initialization. Back in detect.py, copy all of your code that runs cascade.detectMultiScale on your image, including the code where you draw the rectangle onto your image. For us, this is lines 10-14. Back in main.py, paste this code right after extracting the image from your camera. Now, save your file, navigate to your terminal, and run main.py. This time, face detection will run live, in real time, on your camera input. This concludes Lesson 6, face detection. In the next lesson, we will continue to work with this camera and apply face swapping.

7. Face Swapping: Your application now detects all faces. For our last step, we will effect face swapping. This is Lesson 7, face swapping. To start, amend your docstring. It should now say: tests face swapping for static image. We'll begin by defining a few helper functions. 
The first of these resizes one image to fit another. We'll take in two arguments: the image that we wish to rescale and the target image, which we wish to fit the image onto. Here, we'll extract the image dimensions for both. The third element of the tuple is the number of channels in the image. For now, we'll ignore this value. Now, we need to compute the amount to resize our image by. This is going to be determined by the more constraining dimension between the height and the width. Finally, resize your image. Note that these dimensions must both be integers; otherwise, our resize function from cv2 will give us a warning. An additional quirk to note in cv2.resize is that our image dimensions need to be passed in as width and then height. Note that earlier, when extracting image dimensions, we used height and then width. Using cv2.resize, we finally resize our image. This concludes our resize to fit function. Now, define another function to apply the face to a target image. This will accept two arguments, the face and the target. Like before, extract the dimensions for your face. Now, duplicate your target image. Finally, add the face to the target and return the target with your face. Now, we'll define a function using these helpers to actually effect the face swap. This function will accept two arguments, face 1 and face 2. First, invoke resize to fit on both faces, then call the apply face function to actually add our resized face to both images. Finally, return the swapped faces. We'll define one more helper function before adding this face swap to our main utility. This function will take all successive pairs in a list and return those pairs. We'll see in a second why this is important. Below, where you iterate over all rectangles, call get pairs on rectangles. This will ensure that we collect all pairs of faces. We'll have to amend the definition of our variables here, because every element in get pairs of rectangles is now two rectangles. 
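
Two pieces of the helpers described above can be sketched without OpenCV. This is a hedged illustration, not the class's code: `fit_dimensions` and its tuple arguments are hypothetical names, and `get_pairs` assumes that "successive pairs" means non-overlapping pairs (first with second, third with fourth), dropping a leftover item when the list has odd length.

```python
"""Sketch: two small pieces of the face swap helpers, without OpenCV."""


def fit_dimensions(image_hw, target_hw):
    """Return the (width, height) that fits the image inside the target.

    Both arguments are (height, width) tuples. The aspect ratio is kept.
    """
    h, w = image_hw
    target_h, target_w = target_hw
    # The more constraining dimension is the one with the smaller ratio.
    scale = min(target_h / h, target_w / w)
    # Sizes are cast to integers and ordered as (width, height),
    # the order the transcript says cv2.resize expects.
    return int(w * scale), int(h * scale)


def get_pairs(items):
    """Group a list into non-overlapping successive pairs (an assumed reading)."""
    return [(items[i], items[i + 1]) for i in range(0, len(items) - 1, 2)]


def main():
    # A 100 x 200 (height x width) face fitted into a 50 x 50 target.
    print(fit_dimensions((100, 200), (50, 50)))
    # Three detected faces: only the first two form a pair.
    print(get_pairs(["face_a", "face_b", "face_c"]))


if __name__ == "__main__":
    main()
```

Under this reading, an odd face out is simply left unswapped; an overlapping-pairs interpretation would behave differently.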
Here we'll extract both faces from the image. First, define our indices using the Python built-in slice function. Repeat the same for the other face. Now, assign both faces to the swapped images. Finally, navigate to your terminal and run detect.py to see your new image with swapped faces. Now, we'll apply these functions to the live camera feed. Copy all the functions that you've defined, navigate to your main.py from the last step, and paste all four functions. Navigate again to detect.py for one more copy and paste. Copy your loop over all the rectangles. Navigate to main.py and replace the current loop over all your rectangles. Save your file, navigate to your terminal, and run main.py. This concludes your face swapping desktop application.

8. Next Steps: Congratulations, you've now finished your face swapping masterpiece. You've seen a few abstractions that'll help compartmentalize your machine learning knowledge moving forward. However, there's still tons more fun to be had. How else can you leverage this application's face detection technology? Perhaps face swap with a random celebrity? Share your version of this application, or even just funny face swapping pictures, in the projects and resources tab. Thank you for joining me in this face swapping class. If this has piqued your interest in machine learning and computer vision, make sure to check out my Skillshare profile and follow me to get updated when the next class launches. If you're interested in data science topics as well, make sure to check out my Data Science 101 and SQL 101 classes. Congratulations once more on making it to the very end of the course. Until next time.