Real Life Machine Learning Practicals | Zichen Liu | Skillshare


Real Life Machine Learning Practicals

Zichen Liu

Watch this class and thousands more

Get unlimited access to every class
Taught by industry leaders & working professionals
Topics include illustration, design, photography, and more


Lessons in This Class

9 Lessons (1h 7m)
    • 1. Intro: Street Fighting Deep Learning

    • 2. Part 1: Intro to Colourization & Setup

    • 3. Part 1: Writing our Colourizer App

    • 4. Part 1: Colourizing Video

    • 5. Part 2: Intro to Depth Detection

    • 6. Part 2: Depth Detection Script

    • 7. Part 2: Depth of Field on Images

    • 8. Part 2: Depth of Field on Videos

    • 9. Wrap Up & Final Thoughts






About This Class

Are you fascinated by the latest machine learning demos? How would you like to recreate these yourself? In this class, we will be setting up two foundational machine learning demos with many extension possibilities.

Image Colourization

We will perform colourization inference on greyscale or colour-distorted images and produce results that look colour-correct. We'll then apply this to videos, with spectacular results.


Depth Detection

We will perform depth detection inference on images to create a depthmap indicating how far objects in a scene are from the camera. We'll then use this depthmap to apply a depth of field to images and videos.



After this class is over, you will also have the confidence to take the latest machine learning repositories online and create your own projects or artworks from them.

The class assumes basic knowledge of programming concepts and use of a computer, but that's all the prerequisite knowledge required.

Meet Your Teacher


Zichen Liu





1. Intro: Street Fighting Deep Learning: I am excited to share with you a very practical introduction to machine learning. In this class, we will be creating the foundation for two projects. Number one, colourization: we will take greyscale or de-coloured images and restore them to full colour with deep learning. We will then extend this to bring colour to videos as well. And number two, depth detection: we will infer the depth of various objects in a photograph, and we'll use this to apply an adaptive depth of field after the image has been taken. You will receive all of the tools along the way, so that after the class is over, you can take the latest machine learning repositories online and create your own projects from them. We will go through my entire process, looking for the specific things to do to get the code, run it, and then adapt it to do what we want. If you already have a background in Python or machine learning, that's great, but I'll be covering the majority of the process in enough detail that anyone who is interested can follow along. This is a practical class. I will only introduce as much theory as I believe is needed to appreciate the project we are doing. Other than that, we'll be focused on making our end result, not on mathematical or machine learning details. I can't wait to see what you create with these skills. So let's get started.

2. Part 1: Intro to Colourization & Setup: Our first project is colourization. The goal is to produce full-colour photos from black-and-white or colour-distorted images automatically. We often see images that have faded or were poorly shot and wonder if we could restore them to their former vibrant colours. Until recently, this process had to be done painstakingly by hand, but now our deep learning technology can process multiple frames per second, which allows us to colourize even entire videos. And that's what we're going to do in this first project. To begin, let's start with some housekeeping.
If you don't have Python 3 or a text editor yet, please install them now. I will put a link to download Python on the project page. We'll also be using PyCharm as my text editor, and I'll link that as well. Next, we'll install the necessary dependencies using pip. I've uploaded a requirements file, requirements_colour.txt, containing the libraries we'll be using. Open up a command prompt in the folder you have downloaded it to and type pip install -r requirements_colour.txt. Finally, let's clone the repository; the link to its GitHub is also provided.

3. Part 1: Writing our Colourizer App: So in this video, we're going to actually colourize some images. Let's open up the repository that we've just cloned. You can see that we have a colorizers folder with two models already trained for us. Besides that, we've also got some sample images, though we'll be using our own, and a demo release file that demos the usage of these colorizers. That's great; we'll be needing this later. Let's open up this repository in PyCharm. So it looks like the script has five parts. First, it takes in some parameters. Then it loads the model that we'll be using for the inference. Then it takes an image and makes it ready to be inferred on. Then it does the actual inference. Finally, we have the manipulations of the image. So this is definitely something that we can begin with. Let's create our own script; we'll call it colourizer. First, we'd like to load in our model. It looks like they provide two models. I'm going to guess that the one later in time is going to be the more advanced one, so I'm just going to lift this part of the code and import our model. It looks like we need to import it into the script, and it'll be from colorizers. Next, it looks like we need to pre-process our image, and it looks like it's this line here. And finally, it looks like this one.
The line here performs the actual inference, so let's take this one as well. That looks good. Let's close that up. I know it looks like there is a lot of unfamiliar code here, but I will explain all of this later, once we have some results. Let's create a new function out of this called inference, because these three lines will be performing inference on the image. The inference function is going to take an image, which we'll pass to the pre-processing stage. So let's copy that here. This image will be passed into the pre-process function, and the output is going to be the output of the inference, which is this line right here. So instead of assigning it to a variable, we'll simply return it. Let's use OpenCV to load in some images so that we can test it. Import cv2. cv2 provides a function called imread that lets you read in any type of image file, whether it's PNG or JPEG. So let's select an image. We'll create a new folder called test, and let's drag in some black-and-white images. I have a small selection of black-and-white and distorted images: I have alleyway and basketball, both fully black-and-white images, and I also have aerial shot, which is a de-colourized, faded and poorly coloured image, and also an old image of a policeman here. Let's see how it does. So into imread we're going to pass test slash alleyway; let's begin with alleyway. And because cv2 may load in images as RGBA, we're going to want to restrict it to the first three channels, which are red, green, and blue. Images are stored as horizontal rows, vertical columns, and then any number of channels, so let's go ahead and use all of the rows, all of the columns, and the first three channels. Let's call this image. Then we're going to apply the inference function we just wrote onto our image, and then we're going to write this out so we can have a look at our results. Let's write it to the same folder, test, and name it after alleyway.
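The refactoring just described — bundling the pre-process and model steps lifted from the demo script into one inference function, and slicing a loaded image down to its first three channels — might look something like this minimal sketch. The function names here are mine, not the repository's; with the real repo you would pass in its own pre-processing and model callables.

```python
import numpy as np

def make_inference(preprocess, model):
    """Bundle the lines lifted from the demo script into one function:
    pre-process the image, run the model, and return the result directly
    instead of assigning it to a variable."""
    def inference(img):
        return model(preprocess(img))
    return inference

def first_three_channels(img):
    """Images are stored as rows x columns x channels; if an alpha
    channel is present (RGBA), keep only red, green and blue."""
    return img[:, :, :3]
```

The placeholders make the shape of the script clear even before the unfamiliar repository code is explained.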
We'll add the suffix _out.jpg, and when we run it, we'll get our result. Now, the result of our inference is usually values between 0 and 1, and since JPEGs store image values between 0 and 255, this is a good time to scale up these values so we can see our image. So let's multiply our result by 255 and convert it into integers. Great, let's give this a run and see what we get. Here we go. So this is the previous image, black and white, and here we have the colourized image. I notice that it's marked the sky as a yellow colour and the walls as a blue colour, and I realise that we have loaded the image in RGB format while the output is expected in BGR format. In other words, the blues and yellows in this image have been reversed: the sky is supposed to be blue instead of yellow, and I suppose these brick walls are sandstone, yellow instead of blue. So let's convert it back to BGR. We want cv2.cvtColor, and we want cv2.COLOR_RGB2BGR. And it would be nice if we could lump this in with our inference code right here. Okay, great. So I've simply converted the result of the inference from RGB to BGR. Let's give this a go. Fantastic. As you can see, it's produced an image similar to what we would expect: the sky is blue, and the walls have been coloured a yellowish colour. Let's do it on another black-and-white image. Okay, so that's wonderful. It's realised the sky is a blue colour and the trees are a green colour, and it's coloured some other parts of the image quite well too. However, you will notice that it's not quite got some colours right. For example, the ball here, and the t-shirt as well: they're still quite grey, very similar to our grey image back here. But the floor and the other areas of the image have been done pretty well. Let's see how it does with an old and faded image. This one's really good. So here is the original image, and here is the colourized image. You can see it's done exceptionally well.
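The two fixes made above — scaling the 0–1 output up to 0–255 integers, and swapping RGB to BGR before writing with OpenCV — can be sketched with plain NumPy; for a three-channel image, reversing the channel axis is equivalent to cv2.cvtColor(img, cv2.COLOR_RGB2BGR).

```python
import numpy as np

def to_uint8(result):
    # inference output lives in 0-1; JPEGs store 0-255,
    # so scale up and convert to integers before writing out
    return (result * 255).astype(np.uint8)

def rgb_to_bgr(img):
    # reverse the channel axis: RGB -> BGR (and back again)
    return img[:, :, ::-1]
```

Note that the same channel reversal converts in both directions, so the one helper covers either fix.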
It's noticed all of the trees are supposed to be a green colour, it's assigned the colours to the photo as we would expect, and it looks rather realistic. Very pleased with that one. Let's try one last one, on an old image here. And here it is. This is the original image, and here it's colourized it. It reckons the uniform is supposed to be a blue colour, and you can see it's got the colours of the boots and the uniform and the skin rather correct. And the nasty green colour that's characteristic of faded photos is completely gone. So I'm fairly pleased with these results. We have converted greyscale images into colour. Next, we'll be doing the same for video.

4. Part 1: Colourizing Video: Next, we're going to apply our inference function to many images and stitch them into a video. Let's begin by dragging our video into our test folder. I'm going to drag in a pre-prepared black-and-white video of a forest. You can see that we have many different potential colours: green for the leaves, brown for the log, and maybe a bluish, possibly greyish colour for the lake here. So it'll be really interesting to see how it colourizes this video. Let's begin by reading in the video: video equals cv2.VideoCapture, and our video file is in the test folder and is named forest_walk. So while the video is opened, we want to get the return value and the current frame by reading it. If the read is successful, we want to process our result; otherwise, we have reached the end of the video and we want to break out. And finally, we want to release the video, so that it's no longer held in memory. Okay, great. So the process section is exactly what we have been doing before, which is performing the inference on the frame and then writing it out. So let's take these two lines right here. We want to perform inference on our frame, so we'll feed it into the inference function that we wrote previously. And as the output file, we want to put it into our folder and name it something consistent.
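The read-process-release loop just described can be sketched as a small driver function. It is written against anything with read()/release() methods, so the capture object, the inference function, and the frame writer are all passed in — with OpenCV you would pass cv2.VideoCapture("test/forest_walk.mp4") and a writer that calls cv2.imwrite with the frame number in the filename.

```python
def colourize_video(cap, infer, write_frame):
    """Read the video frame by frame; while the read succeeds, run
    inference and hand the result to write_frame under its frame number.
    A failed read means the end of the video; finally, release it."""
    frame_number = 0
    while True:
        ret, frame = cap.read()
        if not ret:  # reached the end of the video
            break
        write_frame(frame_number, infer(frame))
        if frame_number % 10 == 0:  # report progress every ten frames
            print(f"currently processing frame {frame_number}")
        frame_number += 1
    cap.release()
    return frame_number
```

Passing the pieces in keeps the loop itself trivial to test and reusable for the depth project later.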
So let's call it test slash, and then a new folder to save all of the images that come out: forest_walk_out. And let's give a number to each frame, preferably the frame number. Okay, this looks good. Now, each frame might take a little while to process, so we want to see the progress. It would be great if we could print out the progress every ten frames, for example. So let's do exactly that: every ten frames, we want to print "currently processing frame" and the frame number. Okay, this looks great. Let's give it a run. So as you can see, it's finished processing the first frame, and we can begin to look at the results as they come in. The first frame looks great: it's recognised the log should be a brownish colour, the leaves are a definite green colour, and I think the colour of the stream looks about right. Let's wait for all of the frames to come in. So I've left the script running for a couple of minutes or so, and as you can see, it has generated quite a lot of frames of results. It's still not at the end, but I think now is a good place to stop it. We have about 130 frames, which would give us about six or seven seconds. So let's stop it here. Great. Now, if we flick through our images, we can see that they do play out into a video. However, we would love to get a .mp4 file so we could play it in a media player. So let's do exactly that. We're going to create a new Python script called pics_to_vid that's going to read in the generated images and write an MP4 video. So again, we're going to use cv2. First, let's create our encoder: encoder equals cv2.VideoWriter_fourcc, and we're going to be using the H.264 video encoder. It requires a DLL to be present in your project directory, and I'll be including a link to it in the project section, so please do check it out. You're going to want to place the downloaded DLL into your project directory, and this will be your encoder for the video.
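The pics_to_vid idea introduced here — an H.264 encoder writing the numbered frames, in order, into an MP4 — can be sketched as below. The filename helper is my own addition; the fourcc string and the DLL requirement are as described in the lesson.

```python
def frame_paths(folder, n_frames):
    # the colourizer saved frames as <folder>/<frame number>.jpg, in order
    return [f"{folder}/{i}.jpg" for i in range(n_frames)]

def frames_to_video(folder, n_frames, out_path, fps=24, size=(1920, 1080)):
    import cv2  # imported here so frame_paths stays usable without OpenCV
    fourcc = cv2.VideoWriter_fourcc(*"H264")  # needs the openh264 DLL on Windows
    video = cv2.VideoWriter(out_path, fourcc, fps, size)
    for path in frame_paths(folder, n_frames):
        video.write(cv2.imread(path))
    video.release()  # flush once all of the images have been written
```

Writing the filename logic as its own function makes the frame ordering explicit, which matters because a naive directory listing would sort 10.jpg before 2.jpg.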
Next, we're going to specify where we want to output: video equals cv2.VideoWriter, and let's write it into our test directory and name it forest_walk_out.mp4. We include our encoder, we're going to do 24 frames per second, and the dimensions of our video should be 1080p. Then we're going to do a for loop over each of our images, and then we're going to release the video. So for i in range of 132 frames, we're going to read an image, which is at test slash forest_walk_out slash the frame number dot jpeg, and we're going to write the image onto the video. Finally, we're going to release the video once all of the images have been written. And let's give this a go. Okay, great. Let's have a look. As you can see, we have an output video here. I hope you're as impressed with the results as I am. I can't wait to see what you will do with this project. I can imagine some of you will colourize faded family photos or movies. Whatever you get up to, please share it with us in the project section.

5. Part 2: Intro to Depth Detection: Our second project is depth detection. We will use a neural network to create a depth mask of an image: the lighter areas indicate further away, and the darker areas indicate closer to the camera. We'll then use this mask to apply effects to the image; the one I'll show you in this class is depth of field. You see, we've been able to make the background of the photo blurry, thereby highlighting the sharp object in the foreground. We'll then apply this to videos, using the same technique as in colourization, to apply depth of field to entire videos with varying foregrounds, like the dancer and the alleyway, for example. As before, let's begin with some housekeeping. I have uploaded a requirements_depth.txt. Please install this with pip install -r requirements_depth.txt. Let's also clone the repository.
Finally, the trained model is provided separately from the code, so let's download that from the link I provided as well.

6. Part 2: Depth Detection Script: In this video, we're going to actually perform the depth detection, as promised. So let's go ahead and open up the repository that we've just cloned, and let's create a test folder to put our test images in. I have this image here of a lady walking her dog. You can see a number of different depths going on here: the dog is really close to the camera, the lady is kind of midway, and then you've got some things in the back area, like the bins. So from this we'll be able to see a wide range of depths, if it performs all right. Next, we'll open up our code in your favourite text editor. So inside test.py, we're going to look for a couple of things that might help us. The first thing I notice is that it's loading the model using a default Keras library function, so let's go ahead and take that. We'll also need the predict function; it looks like that's what it's using to predict on the inputs, performing inference and getting the outputs, so we'll take this line as well. But before we do that, let's create our own script, and we'll call it depth_detect. Okay? So we said we wanted predict, and we also wanted load_model. Oops. So load_model is a default Keras function, so it should be up here. Custom objects: let's try looking for it. There it is. This is a function defined in layers, so we might as well use this as well. All right. We'll be providing the inputs, and we needed the predict function, which is, by the looks of it, from utils. So you can see what I'm doing here is taking as much of the existing code as I need in order to perform the predict, and after that, everything else will be code we'll be writing ourselves. Finally, it looks like it's taking in the arguments from the command line, as you can see here. So the model, it looks like, is the name of the model that we downloaded, the pre-trained model.
So let's copy that into a folder and reference it directly. I've downloaded it to here, and I'm just going to drag it into the repository at the top level, so that it's available here. And because this file is adjacent to our depth_detect script, we'll be able to just put the name in and it'll be recognised. Fantastic. So it looks like now all we need to do is provide the inputs, which should be an image, and then receive the output. So let's give it a go. To read in the image, we'll be using cv2 again, and we'll use imread to read in our image from the test folder. And remember, for machine learning models, it's usually preferable to normalise the image. As the image is stored, the values are between 0 and 255, and it would be great if we could scale them back down to 0 to 1 before feeding it in. So let's do exactly that. And finally, before we put it in, it would be great if we could use a smaller image for the depth detection and scale it back up after the depth detection has been performed. The difference is quite subtle, but you'll see what I mean after we have done it. So let's go ahead and scale it. We'll scale the image to 640 by 400, which is about a third of its size. The image is our input, and the output will be the depth map. Let's first of all print out the shape of our depth map; this will tell us what we need to do in order to view the depth map as an image. Let's give it a run. Okay, great. So the result, it looks like, is provided as an image that's 320 by 240, with one singleton element per pixel, which I guess can only be the depth. And one image is the output, because we only put in one. So let's reformat this into a shape that we can use to print out our image and actually see what it looks like. We're going to take the first element, our first and only image in fact, and we're going to take all of the rows of the 240, and we're going to take all of the columns of the 320.
And since there is only a singleton element left, we'll go ahead and take that. So this should be our image. Let's try to print it out. We'll first begin by scaling our image by 255, reversing the normalisation that we did right here, and then we're going to resize it back to the shape of the image that we fed in. My image here is 1920 by 1280, so let's scale it up to that. Right, let's try writing it out and see what we get. And here we go. Here's the result: this is the original image, and here is the depth detect. Now you can see that it thinks the lady who's walking the dog is relatively middle in terms of distance. We have the silhouette of the dog here that's being walked; that's slightly darker in colour, I suppose, and that means it's closer to the camera. It reckons the pole is around the same depth. We can check that. Yep, it's about the same distance from the camera as the lady. And then it looks like the rest of the image is further away, and you can see the pavement here: that gradual increase of the distance to the camera is quite apparent in the depth detect. So that's a really nice result, I think. Let's clean up our code a little bit before proceeding. I would like to extract most of this to a function, so that we can perform a predict immediately by calling that function. So it's going to take an image and it's going to return the depth map. So perhaps we can take that and rename it to be input, and we'll select the output that we want here and scale it by 255; I think that should also be part of the infer function. So let's do that. So infer of image will perform this, and it should return us the same result as here.

7. Part 2: Depth of Field on Images: Okay, great. So now that we have both the mask and the image, wouldn't it be a good idea if we could apply a blur to only the parts of the image that are further away? That will create some kind of depth-of-field effect. So let's do exactly that.
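The reshape-and-rescale just described for the depth script — take the only image in the batch, all rows and columns, drop the singleton depth channel, then multiply back up by 255 — is only a couple of NumPy operations; resizing back up to the original dimensions would be a cv2.resize call on top of this. A minimal sketch, with the function name my own:

```python
import numpy as np

def depth_to_image(pred):
    """Convert a predict output of shape (1, rows, cols, 1), with values
    in 0-1, into a single-channel 8-bit image we can write out."""
    depth = pred[0][:, :, 0]  # first (only) image, all rows/cols, drop the singleton
    return (depth * 255).astype(np.uint8)
```

Printing pred.shape first, as in the lesson, is what tells you which axes the batch and channel singletons sit on.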
We're going to create a new script that's going to merge the two images. So once again, let's read in both the mask and the image with cv2: one's called test_out.jpeg, one's called test.jpeg. Okay. So test_out.jpeg is our mask here, and test.jpeg is our image. Now the mask, even though it's a black-and-white, greyscale-type image, is still stored as three channels. So let's just take the first channel, which will be the red channel, though it will be the same whichever one we take. And since this is going to be our superimposed image, with the mask applied on top of the original, let's call it superimposed. And finally, we want a blurred version of our image, so that we can add the blur depending on what the mask is saying. So let's create a third image called blur, and we're going to use cv2.GaussianBlur. The first parameter is your original image, and the second parameter is how much you want the blur to be, in the form of a convolutional kernel. The first element and the second element should be the same, and this will give you an even blur; I find something like 7 by 7 is a good size. All right, so next we're going to say: superimposed, where the mask is greater than a certain value, is to be replaced by the blurred image. So our next task is going to be determining what that value should be. Now, this is quite difficult to tell if you don't have an application like Photoshop, but I'm going to take a guess here. The darker areas are going to be closer to 0, and the lighter areas are going to be closer to about 200, where pure white is 255 and pure black is 0. So maybe let's split it in the middle, call it 120, and just see what results we get. Great. Now let's finally print out our image — and we get an IndexError. I see. So it looks like this was unnecessary here: cv2 realised that it was a black-and-white, or greyscale, image and did not include the green and blue channels; it only included one channel.
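The compositing being set up here — blur the whole image once, then replace only the pixels whose mask value exceeds a threshold — comes down to a one-line NumPy mask assignment. A sketch, assuming the blurred image was produced beforehand (e.g. cv2.GaussianBlur(image, (31, 31), 0)) and with the threshold default purely illustrative, since the lesson tunes it by eye:

```python
import numpy as np

def depth_of_field(image, mask, blurred, threshold=100):
    """Where the single-channel depth mask is lighter than `threshold`
    (i.e. further from the camera), substitute the blurred pixels."""
    superimposed = image.copy()
    far = mask > threshold  # lighter mask values mean further away
    superimposed[far] = blurred[far]
    return superimposed
```

The boolean array indexes all three colour channels at once, so no per-channel loop is needed.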
And you can see that, because we've taken the zeroth element, it's only a one-dimensional array. So let's simply get rid of this. There we go. So it looks like our superimposed image is exactly the same as our test image, which means our guesses were slightly incorrect here. Let's go ahead and fix that. First, I'm going to change up the blurriness: I'm going to increase it all the way to 31, so that we can see instantly whether we've had an effect at all. And I'm also going to take the threshold down a little bit, so that it can pick up more things, because if you look here, if the number is lower, it should capture more of the white areas. So that's the theory; let's give it a go. So this is our original, and this is the superimposed. You can see it's had some effect here: the background was previously quite sharp and clear, and now there's a definite, nice blur. This blur only lasts around here; it doesn't affect the foreground at all, and it doesn't affect the floor area or the dog here, but it definitely affects everything behind here. You can see that, in contrast to the blurriness, the lady and this pole here are quite sharp, giving them quite a nice highlight. So let's add a little bit more blur and produce a final result here. I'm going to take it all the way up to 71 and see what we get. Okay. So as you can see, this is probably a little bit too much blur: it's blurred it so much that you can even see the mask of the blur, and it's occluding part of her face, and it's not quite right around here. So I think what we had before was about right, maybe a tad more than before. So let's correct that. Okay, great. I'm pretty pleased with this. In the next video, we're going to be applying this to, again, a sequence of images, which is a video. And we'll see you there.

8. Part 2: Depth of Field on Videos: Welcome back.
In the previous part, we applied a depth-of-field filter to our image here, where the background has turned blurry and the foreground is still very sharp and clear. We're now going to apply this to a video. I have a sample video here of a lady dancing, and I think this video is absolutely perfect, because we have a range of depths on display and a foreground that changes over time, as you can see here. At first, the furthest things away are the mountains and the sky, with the lady in the foreground, whereas previously, back here, the house was the furthest thing, again with the lady in the foreground. So let's see how our algorithm does against this. The first thing I'd like to improve is our infer function. One thing we did was normalise our images going into the predict model; I'd now also like to normalise our images coming out of the predict model. We'll first apply it to our old image here, the lady walking the dog, and you'll be able to see the difference. So I'll rename our previous output so we can compare, and I will apply the normalisation to the output. Okay, let's take a look. So this is our normalised result, and this is our previous result. As you can see, we have simply increased the contrast of the image: the blacks are more black, and the whites are more white, compared to before. And this is desirable for us, because we'll be able to pick out the gradients much more easily in our video, and that's going to be more challenging for the video than for a single image here. Next, we're going to write our function to read in the video frame by frame and apply our algorithm frame by frame. So, exactly the same as in colourization, we're going to do this with cv2.VideoCapture. Our video file is stored at test slash dance dot mp4, and for each frame, we're going to process the frame with exactly these two lines here. So infer is going to apply to frame, which is the frame for that particular point in the video. And the dimensions of the video, I believe, are 1080p.
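The output normalisation added to infer above — stretching the depth map so the darkest pixel becomes 0 and the lightest 1, which is what raised the contrast of the masks — can be sketched like this (a min-max stretch; the function name is mine):

```python
import numpy as np

def stretch_contrast(depth):
    """Rescale so the minimum maps to 0 and the maximum to 1: blacks get
    blacker and whites whiter, making depth gradients easier to pick out."""
    lo, hi = depth.min(), depth.max()
    return (depth - lo) / (hi - lo)
```

One caveat worth noting: because each frame is stretched against its own min and max, the absolute depth scale can shift slightly from frame to frame across a video.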
After that, we should be able to write it out. Let's get a frame counter variable in here, so that we can name all the frames according to which frame number it is. I'm going to save all the images in a new folder called dance_out, and we're going to name each with, I guess, just the frame number. Let's go ahead and create that directory too. And since we'll have the mask as well, let's create a folder specifically for the masks. Okay, great. Let's give this a run. So, unfortunately, I had two typos here. One: you'll remember I typed frame here; it should be frame_number, so that every ten frames we'll print out a record of our progress. So let's change that to frame_number. Secondly, I forgot the format here, so that this will be replaced by the frame number every time we come around to it. So let's replace that by format frame_number. Okay, this looks better; let's give that a run. Okay, great. So it looks like it's beginning to process the frames, and we can have a look at its progress right now. In dance_out, under mask, you can see each of the frames it's currently processing. You can see that the masks have been generated, and they look all right: the foreground, which is the lady, should be a darker colour than the background, and it is. Scrolling through the frames, they look okay, so I'm going to leave it to run for a couple of minutes, and we'll see what we get in a bit. So, a few minutes later, it completed, and it turned out there was a total of about 300 frames — 302 frames. And you can see that all of the masks have been generated, to a varying degree of success. Some of these, I can see, might be problematic, for example this one, where the face has been given quite a light colour. So it's not quite done it correctly, but these are rare, thankfully, and when we stream it into a video, it should be less noticeable in the whole result. So, fingers crossed; let's see how it looks in a video.
First, we'll need to superimpose these masks onto the actual video. So let's write a script to do that. We're going to be modifying our depth-of-field script so that, instead of applying a mask onto an image, we apply the masks onto a video. The framework for reading in a video is fairly much the same, so I'm going to copy this and paste it here. As before, because we just wrote this, we're going to read in the dance.mp4, and then for each frame, we're going to do something in these two lines here. But instead of performing infer, we're going to be doing exactly what we did here.

9. Wrap Up & Final Thoughts: I hope you have enjoyed learning with us, and I hope you do something creative to extend what I have taught. There are so many ways that you could apply these skills to real-world applications or artistic creations, and I look forward to being amazed by your projects. Have a great day, and I'll see you in the next class.