Generative Adversarial Networks A-Z: State of the art (2019) | Denis Volkhonskiy | Skillshare


Generative Adversarial Networks A-Z: State of the art (2019)

Denis Volkhonskiy, AI Researcher


Lessons in This Class

22 Lessons (2h 24m)
    • 1. Introduction

      5:25
    • 2. Generative Learning Motivation

      8:19
    • 3. Generative Adversarial Networks

      4:49
    • 4. GANs algorithm

      5:13
    • 5. Deep Convolutional Generative Adversarial Networks

      4:36
    • 6. Measures of Quality

      8:52
    • 7. Practice 1d

      15:55
    • 8. Practice 2d mode collapse

      5:20
    • 9. Practice celeba

      8:37
    • 10. Practice celeba 2

      4:42
    • 11. Celeba 3

      1:55
    • 12. Applications

      4:45
    • 13. Cycle GANs

      7:00
    • 14. Superresolution

      7:13
    • 15. Inpainting

      4:41
    • 16. Text2image

      9:55
    • 17. Progressive Growing of GANs

      4:32
    • 18. Style-based Generator

      8:03
    • 19. Game of Thrones

      11:05
    • 20. BIG GANs

      1:31
    • 21. BIG GANs model

      3:39
    • 22. BIG GANs techniques

      7:59

116 Students

1 Project

About This Class

How do you generate high-quality images from noise? Is it really possible?

Generative Adversarial Networks were invented in 2014, and since then they have been a breakthrough in deep learning for the generation of new objects. Now, in 2019, there exist around a thousand different types of Generative Adversarial Networks, and it seems impossible to study them all.

I have worked with GANs for several years, since 2015, and now I can share all my experience with you, going from the classical algorithm to advanced techniques and state-of-the-art models. I also added a section on different applications of GANs: super-resolution, text-to-image translation, image-to-image translation, and others.

This course has rather strong prerequisites:

  • Deep Learning and Machine Learning

  • Matrix Calculus

  • Probability Theory and Statistics

Here are some tips for getting the most from the course:

  1. If you don't understand something, ask questions. For common questions, I will record a new video for everybody.

  2. Take handwritten notes. Not bookmarks and keyboard typing: handwritten notes!

  3. Don't try to memorize everything; try to analyse the material.

Meet Your Teacher


Denis Volkhonskiy

AI Researcher


Related Skills

Technology · Data Science

Class Ratings

Expectations Met?
  • Exceeded!
    0%
  • Yes
    0%
  • Somewhat
    0%
  • Not really
    0%


Transcripts

1. Introduction: Hello, my dear friend. Welcome to the course on modern generative adversarial networks. My name is Denis Volkhonskiy. In this course, we will mostly talk about generative adversarial networks, the framework that was proposed in 2014 and now has state-of-the-art results in generative learning. On the slide you can see the progress in face generation that has been made since 2014. The first, blurred image is a result of using the standard generative adversarial networks; however, from year to year people improved the quality of the generated images, and on the right of the slide you can see the result of the state-of-the-art generative adversarial networks, which is called Progressive Growing of GANs, and its modifications. All of these we will study in our course. Another amazing application of GANs that you can see is the interpolation between different faces, and we will again study all of this in our course. After taking this course, you will understand how to build generative adversarial networks that are able to generate such high-quality images as you see in the picture; all of these images are synthetic, and there is no real object behind them.

Here you can see the progress that was made in generative learning for the ImageNet dataset. ImageNet is a dataset that contains millions of images from 1000 classes, and you can see what people can do now, like generating very realistic dogs, cheeseburgers and any other classes. The model that generates such high-quality images is called BigGAN, Big Generative Adversarial Networks, and we will study this model in our course. In our course we will also study a lot of applications of generative adversarial networks. Here you can see the transfer from a sketch to a generated image: there are tools where you can draw a sketch and obtain a generated image from that sketch. Another application of generative adversarial networks that we will study is image-to-image translation: when we have an image of one type, for example summer, we translate this summer photo to a winter image, and back from winter to summer. Also, we can translate photos to drawings and drawings to photos, horses to zebras and, in fact, many other applications. These are not all the applications that we will study; we will cover a lot more, like super-resolution, inpainting and many others.

This course will be rather hard, and I expect you to have experience building neural networks with any library like PyTorch, TensorFlow or Theano, or something else. I hope that you understand the basics of linear algebra, probability theory and statistics, and that you are not afraid of such words as probability density, Kullback-Leibler divergence and others. And I expect you to have a great wish to study modern generative adversarial networks, because my focus is to share with you my knowledge of state-of-the-art generative adversarial networks: how you can train them, how you can build them, and how you can do it efficiently in order to obtain your results and maybe to solve your practical problems. In our course, we will have four main blocks. The first block is the introduction that you are now listening to; I will describe what generative learning is, why we need it, and what the existing approaches to generative learning are. In the second module, we will study the standard and the deep convolutional generative adversarial networks.
After the second module, you will be able to generate realistic photos of people of size 64 by 64. After the third module, you will know a lot of applications of generative adversarial networks, like image translation, image inpainting, image super-resolution and many others. And in the last, fourth section, we will study state-of-the-art generative adversarial networks, namely Progressive Growing of GANs and BigGANs. I will share with you all my experience with these state-of-the-art GANs, and after that you will be able to build your own GANs, to solve your own problems and to generate whatever you want. Thank you, and see you in the next video.

2. Generative Learning Motivation: In this video, we'll talk about generative learning and density estimation. We will consider different applications of generative learning and different types of density estimation. Machine learning can be divided into two big parts: discriminative learning and generative learning. Discriminative learning covers such problems as classification, regression and many others; usually, in discriminative learning, our goal is to estimate the distribution p(y|x). It means that, given an object x, our goal is to predict some label for it; it may be a class label or a regression target. In generative learning, we usually have a set of samples x, and our goal is to estimate the distribution p(x). Another possible variant of generative learning is conditional distribution estimation, when we have a set of objects x and labels y, and our goal is to estimate the distribution p(x|y). Sometimes in generative learning we don't just want to estimate the distribution; sometimes we want to have the ability to sample from it. When we build generative adversarial networks, we want to build a network that will allow us to sample from the distribution p(x) or p(x|y).

Now let's consider why we need generative learning; there are many possible applications. The first application is the generation of realistic samples, of realistic images, which can be used in articles, books or papers. The second application is complex probability distribution estimation: usually we have a set of data, and we want to somehow estimate the distribution of these samples. The samples can be of any type, such as images or text, and for some other algorithms people sometimes want to estimate this distribution; here generative learning helps. The third application of generative learning is simulation of the future, for example for reinforcement learning: given the pose of a person at the current moment, what will their pose be at the next moment in time? That's where we can use generative learning. Another possible application is filling in missing values in our data. As you know, in practice we sometimes don't know some feature values, and what people often do is insert average values or even delete those rows from the training set. A better way would be to predict the missing values and insert them, and here again we can apply generative learning. The next application of generative learning is when we have a lot of data but only a small amount of labeled data. As you can see in the bottom image, we have two dots labeled and the others unlabeled; here we can also apply generative learning in order to pre-train a classification or regression network. Another possible application is text-to-image synthesis. Imagine that you are writing a book and you would like to insert some illustrations into it, and imagine that you built an algorithm that takes the text as an input and returns generated images which you can insert into your book; that's an application we will study in the applications section. The next application is image inpainting. Imagine that you have an image and you would like to delete a person from it: what should you insert instead of the person? Can we do it automatically? The answer is yes, and I will tell you how. The next application is super-resolution. Imagine that you have an image that is very small and you would like to increase its size. If you just do it in a program like Photoshop, which uses the nearest-neighbors method or some other interpolation method, you will obtain a very low quality image; but using generative learning and generative adversarial networks, you are able to increase the size of the image and get high quality as an output. In fact, these are not all the applications, and we will consider many more. I just want to tell you that we are not limited to generating cats and dogs; there are a lot of really useful practical applications.

All generative learning is based on density estimation. It means that we usually have a set of training samples (they can be images, sound, text or anything else), and we would like to either build a density function or have the ability to sample from this density. We can divide density estimation into two types: implicit and explicit. When we use an implicit model, the likelihood is usually not available at all, and the density function is also not available, but we can sample from the distribution that we estimate. A good example of this approach is generative adversarial networks, which we study in this course: using generative adversarial networks, you can't tell what the likelihood is and you can't estimate the density explicitly, but you can sample from the distribution and obtain, for example, images that are similar to the images in your training set. The other type is explicit generative models. Usually they are built with likelihood maximization, and usually we have the likelihood function and the density function of the distribution that we estimate, but we cannot always conveniently sample from the distribution. A good example is kernel density estimation, when we estimate the density of the distribution using some kernel, like a Gaussian, but a sampling procedure is not the point of the model. And now, the most popular generative models are built in the following way: we have some generative model, which is usually a neural network with some convolutional or linear layers; we have some noise as an input, which can be sampled from a random uniform or normal distribution; and as an output our network gives us, for example, faces. That is all for this video. Thank you, and see you in the next one.
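To make the implicit/explicit distinction concrete, here is a minimal sketch of my own (not code from the course): an explicit kernel density estimate that we can evaluate pointwise, next to an untrained generator network that we can only sample from.

    import numpy as np
    import torch
    import torch.nn as nn
    from scipy.stats import gaussian_kde

    data = np.random.normal(5.0, 1.0, size=1000)      # training samples

    # Explicit model: KDE gives a density function we can evaluate...
    kde = gaussian_kde(data)
    print(kde.evaluate([5.0]))                         # density value at x = 5

    # Implicit model: a generator gives samples but no density value.
    g = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
    z = torch.rand(64, 1)                              # noise from U[0, 1)
    fake = g(z)                                        # samples from the implicit distribution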
3. Generative Adversarial Networks: In this video, we will study what generative adversarial networks are. Imagine the situation: there is a counterfeiter and the police. The counterfeiter tries to make fake money in order to deceive the police; on the other hand, the policeman tries to distinguish between real money and fake money. In fact, this is the idea behind generative adversarial networks. We have two neural networks, a generator and a discriminator. The generator tries to generate a realistic object, a realistic image. The discriminator tries to distinguish between these generated objects and training objects, real objects. So the discriminator is a binary classifier with two classes, real and fake, and it tries to learn the difference between the fake class and the real class; on the other hand, the generator tries to deceive it. As an input, the generator takes some random noise from some distribution, like a normal distribution, a uniform distribution or any other. It is a learnable function; usually it is a neural network, and the discriminator is also a neural network. Both the generator and the discriminator have their own parameters, which are learnable and learned with stochastic gradient descent or its modifications. The input of the generator can be noise from any distribution and of any size, so it can be a vector of noise, and the output is a synthetic sample. The input for the discriminator is an object, in our case an image, and the output is the probability of this image being real or fake. So the discriminator outputs one probability, as a binary classifier: the probability of the image being real.

Let's look at the loss function of generative adversarial networks. Let's assume that D(x) is the probability of x being real, G(z) is a generated sample, and z is some random noise. First we consider the loss function for the discriminator. The first term stands for the probability of a real x from the data distribution p_data being real, and we want to maximize this probability; this is why in the loss function we have this term with a minus sign: we minimize minus this probability. The second term stands for fake objects: we want to minimize the probability that our synthetic object, which is G(z), is real, and that is why we have 1 - D(G(z)) there. To summarize, when we optimize the discriminator loss function, we maximize the probability of a real object being real and of a fake object being fake; this is just the standard loss function for a binary classifier. If we consider the loss function for the generator, we again look at the feedback of the discriminator, but only on synthetic objects. What we do here: we generate some objects G(z), give them to the discriminator, and then we maximize the probability of the synthetic samples being real. It means that if we maximize this probability, our discriminator will fail in its predictions and will not distinguish between the real class and the fake class. This is all for this video. In the next video, we will consider the algorithm for training generative adversarial networks. Thank you, and see you in the next video.
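For reference, the two objectives described above can be written compactly in the lecture's notation, where D(x) is the probability of x being real and G(z) is a generated sample:

    \mathcal{L}_D = -\,\mathbb{E}_{x \sim p_{\text{data}}}\!\big[\log D(x)\big] \;-\; \mathbb{E}_{z \sim p_z}\!\big[\log\!\big(1 - D(G(z))\big)\big],
    \qquad
    \mathcal{L}_G = -\,\mathbb{E}_{z \sim p_z}\!\big[\log D(G(z))\big]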
4. GANs algorithm: In this video, we will study the algorithm for training generative adversarial networks. The training of GANs is divided into two parts: updating the discriminator and updating the generator. On each iteration of training, we update the discriminator K times and the generator one time. This rule was proposed in the original GANs paper; however, things have changed since then, and people usually do it in another way, which I will show you in the next videos. For now, let us study the original algorithm. As I said, on each iteration of training we update the discriminator K times. In each of these updates, we first sample M noise samples from some prior distribution, which can be a uniform distribution, a normal distribution or any other. After that, we take M random samples from the training set: if we have a set of training images, we take M random samples. And look, now we have two classes. The first class is the fake class: in order to obtain it, we feed z_1 to z_M as input to the generator, and the generator outputs a set of objects which we assume to be fake. The second class is the class of real objects, which we just took from the training set. Since the discriminator is a standard binary classifier, we use the binary cross-entropy loss. Note the word "ascend" in the algorithm: it means that in order to use gradient descent, we should put a minus sign here. Anyway, let's look at what is happening. This is a real object, and D(x) is the probability of x being real, so we maximize the probability of a real x being real. And here, because of this minus, we minimize the probability of a fake object being real; in other words, we maximize the probability of a fake object being fake. This is, I think, rather obvious, and it is the standard practice for training a binary classifier. We can now average this loss function and compute its gradient; this formula stands for the discriminator update, and we do K such updates, where K is a hyperparameter which, of course, can be tuned.

Then we switch to updating the generator. We again sample a batch of noise samples from the same prior distribution, which, I remind you, is uniform, normal or any other. Now we do the opposite of what we did in the discriminator update: we maximize the probability of a fake image being real. It means that we train our model in such a way that we deceive the discriminator. But now we don't touch the parameters of the discriminator at all; we just use its feedback, and in this line, in this step, we update only the generator weights. To summarize the training of generative adversarial networks: we update the discriminator K times and the generator one time. When we update the discriminator, we use the standard binary cross-entropy loss and update it as a standard binary classifier. When we update the generator, we again use the discriminator's feedback but touch and update only the generator's weights, and we choose the opposite direction, so that the discriminator will fail in its prediction. That is all for the standard algorithm for training generative adversarial networks. Thank you, and see you in the next video.
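In symbols, one training iteration of the algorithm just described performs K stochastic-gradient ascent steps on the discriminator parameters and one on the generator parameters (eta is the learning rate; the generator step below is written in the form this lecture uses, where we maximize the probability of fakes being real):

    \text{repeat } K \text{ times:}\quad
    \theta_D \leftarrow \theta_D + \eta\, \nabla_{\theta_D}\, \frac{1}{M} \sum_{i=1}^{M} \Big[ \log D(x_i) + \log\big(1 - D(G(z_i))\big) \Big]

    \text{then once:}\quad
    \theta_G \leftarrow \theta_G + \eta\, \nabla_{\theta_G}\, \frac{1}{M} \sum_{i=1}^{M} \log D\big(G(z_i)\big)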
5. Deep Convolutional Generative Adversarial Networks: Now let us talk about the paper called Deep Convolutional Generative Adversarial Networks. This was the next valuable work after Goodfellow's, and it was designed specifically for image generation. The main idea of the paper is that convolutions in both the discriminator and the generator allow generating more realistic images. Here on the slide you can see the structure of the generator, which consists of transposed convolution layers; we will consider these in the next video. Here you can see examples of generated bedrooms of size 64 by 64: some of them are quite realistic, but it is easy to see whether an image is real or fake. Here are examples of generated faces, again of size 64 by 64; for 2015, this result was amazing. The value of the DCGAN paper is in its set of recommendations for training. These recommendations are: you shouldn't use max pooling, increase the stride of the convolution instead; transposed convolutions are good for upsampling in the generator; it is better to avoid fully connected layers; batch normalization helps a lot; use ReLU in the generator and LeakyReLU in the discriminator. All of these recommendations were obtained empirically: the authors just tried things and reported the results. Nevertheless, it allowed generative learning to move significantly forward.

The authors demonstrated an interesting interpolation experiment. They took two vectors from the latent distribution, and for both of these vectors they generated an image just by making a forward pass through the generator. After that, they created an interpolation in the latent space from the first vector to the second; on this interpolation line they took several vectors and again generated images from them. The effect was the following: while we manually make the interpolation in the latent space, an interpolation also occurs in the image space. It means that the generator has made the latent distribution meaningful for itself, despite it being random. In fact, this is a good test that shows that the generator doesn't memorize images from the training set but generates new samples. Here is the same interpolation experiment for the bedroom images; you can pause the video and look at the pictures in detail. Okay, what if we want to perform some arithmetic operations on our images? For example, what should we obtain if we subtract a man from a man with glasses and add a woman? It seems that we should obtain a woman with glasses. But if we do this in image space, we will obtain noisy rubbish. The good news is that we can perform meaningful arithmetic operations in the latent space. What we do here: we take one latent code that the generator transforms into a man with glasses, one latent code that it transforms into a man without glasses, and one latent code that it transforms into a woman. After that, in latent space, we perform the following: we subtract from the vector for the man with glasses the vector for the man without glasses, then we add the vector for the woman without glasses, and thus we obtain a new vector in the latent space. If we make a forward pass with this vector, we obtain an image of a woman with glasses. In other words, our arithmetic operations in the latent space become meaningful. We can perform the same arithmetic operations in the latent space with images of a smiling woman, a neutral woman and a neutral man, which will give us a smiling man. That was all for deep convolutional generative adversarial networks. Thank you for watching, and see you in the next video.
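Both experiments are easy to reproduce once you have a trained generator. Here is a rough sketch with a stand-in generator; the latent codes z_a, z_b, z_c are hypothetical, and in the real experiment you would pick codes whose decodings actually show the desired attributes:

    import torch
    import torch.nn as nn

    nz = 100
    netG = nn.Sequential(nn.Linear(nz, 3 * 64 * 64), nn.Tanh())  # stand-in generator

    z1, z2 = torch.randn(1, nz), torch.randn(1, nz)   # two latent vectors

    # Linear interpolation in latent space -> smooth morphing in image space
    images = [netG((1 - a) * z1 + a * z2) for a in torch.linspace(0, 1, 8)]

    # Latent arithmetic: (man with glasses) - (man) + (woman)
    z_a, z_b, z_c = torch.randn(3, 1, nz).unbind(0)   # hypothetical codes
    woman_with_glasses = netG(z_a - z_b + z_c)        # decode the new vector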
6. Measures of Quality: Now we're going to talk about measures of quality for generative adversarial networks. Could you please tell me if these images are real or fake? Some of them are quite realistic, and yes, you're right: they are all generated. But how about these images, do you think any of them are real, or are a lot of them fake? If you pause the video, you may notice that, despite the very good quality of these faces, some of the hair looks like plastic, and yes, these images are fake. I'm not going to ask you about these images: they are all synthetic, and they look quite nice, right? So how do you decide whether an image is good or not? How do you know if an image is close to the real samples or not? Well, you have a neural network in your head; but how could we measure the quality of generated images, is it possible? The answer is yes, and we will study how to do it. So why do we need a measure of quality for GANs?
Firstly, we should have some way to compare different models with different hyperparameters, such as the number of layers, layer types and number of parameters. Secondly, when we train GANs, we should have an indicator of quality in order to stop training at the right time. Last but not least, if we write a research paper, we should somehow compare our model with existing ones; it is very bad practice to say "our generated cat is better than others" without providing any numerical comparison. We want our measure of quality to correlate with two aspects. First, what is the quality of our synthetic images, how good are they? And second, how diverse is our generated set: does it cover all image classes, or does it generate only one image each time? The second requirement is very important: when training GANs, there exists a problem called mode collapse. Look at the top picture on the slide. Our target distribution is a mixture of Gaussians, which you can see on the right; it has several modes. But at different epochs of training, our GAN model generates samples from only one mode, not from all of them as we expected. This is what people call mode collapse. In terms of images: if we train our model on the MNIST dataset, mode collapse would be the model starting to generate only one digit. This is why it is important that our measure of quality can catch this.

As I said, our brain has a neural network that decides whether an image is real or fake. The methods that people use for distinguishing real and fake images also use their own network: it is called the Inception network, also known as GoogLeNet. It was trained on the ImageNet dataset, which consists of millions of images with 1000 classes, such as dogs, cats, mushrooms, airplanes and so on. Let me briefly remind you what the entropy of a distribution is. The entropy is a measure of the randomness of a distribution: the more uniform the distribution, the higher the entropy, and vice versa; if the distribution is concentrated at one point, it has the smallest entropy, zero. As you can see on the slide, the entropy of the distribution P2 will be higher, since it is uniform; the distribution P1 is more concentrated. The first measure of quality is the Inception Score. Firstly, we use the Inception network for predicting the distribution p(y|x); this is the output of the last softmax layer of the Inception network. We want this distribution to be highly predictable, in other words, concentrated at one point. It means that we want the entropy of this distribution to be low, and this stands for the quality of an image: the more concentrated the distribution is for our image, the more confident the network is in its prediction, which is good in terms of image quality. Okay, we understood that we want the entropy of the distribution p(y|x) to be low, and that it stands for image quality; now we have to do something about image diversity. For this reason, we can consider the p(y) distribution. We want it to be uniform, which means that we want the entropy of p(y) to be high. Now, how do we calculate p(y)? We generate a lot of images, for each of them make a forward pass through the Inception network and predict p(y|x), and then we average our predicted distributions p(y|x) over our images. Thus we obtain the distribution p(y). Look at the picture on the slide.
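In other words, the marginal class distribution is estimated by averaging the conditionals over N generated images:

    \hat{p}(y) \;=\; \frac{1}{N} \sum_{i=1}^{N} p\big(y \mid x_i\big), \qquad x_i \sim p_g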
Here is a demonstration of what we want: we want p(y) to be uniform and p(y|x) to be concentrated at one point. From probability theory, we may remember that the Kullback-Leibler divergence between two distributions is the difference between the cross-entropy and the entropy. Taking into account the fact that we want to minimize the first entropy and maximize the second, we can combine them into a Kullback-Leibler divergence, and here we get the Inception Score: it is defined via the KL divergence between the two distributions p(y|x) and p(y). And now you know how it was obtained. Note that the higher the Inception Score, the better our GAN model is. Now let us discuss the second measure of quality for GANs: the Fréchet Inception Distance. As you can guess, it also uses the Inception network, but in a different way. Firstly, we use the Inception network for feature extraction, for both synthetic and real samples. Then we assume that the obtained features are distributed as a multivariate Gaussian distribution, and we estimate its mean and covariance matrix; we do this both for real and for synthetic images. Then we compute the FID with the equation on the slide, as the squared difference of the means plus a trace term for the covariance matrices. The lower the Fréchet Inception Distance, the better the synthetic image quality. The Fréchet Inception Distance was shown to be sensitive to mode collapse, which is good for us: when our generator samples images from only a subset of classes, the FID starts increasing.

Let us now compare the Inception Score with the Fréchet Inception Distance. Both of them correlate with human assessment; however, the Inception Score doesn't take real samples into account at all. If our generator samples only one image per class, the Inception Score will fail and show a high score, while the FID is sensitive to this. The disadvantage of the FID is that if we take two different batches from the training set, the FID will be non-zero, which is not good. And both of these metrics have a common disadvantage: they can be applied only to photo-like images, like cats, dogs or airplanes. If you generate specific images, like MRI scans, textures or other medical data, then these metrics are inapplicable. The last measure of quality that I would like to discuss is the Geometry Score. It is based on the manifold hypothesis and is applicable to any kind of data; if you want to use it, use the GitHub link from the slide. That is all for measures of quality. Thank you for watching, and see you in the next video.
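For reference, here are the standard definitions of both metrics, where mu and Sigma are the Inception-feature means and covariances of the real (r) and generated (g) sets:

    \mathrm{IS} = \exp\Big(\mathbb{E}_{x \sim p_g}\, D_{\mathrm{KL}}\big(p(y \mid x)\,\|\,p(y)\big)\Big),
    \qquad
    \mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\Big(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\Big)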
7. Practice 1d: In this video, we will study how we can build a generative adversarial network that approximates a normal distribution; as the noise for the generator, we will use a uniform distribution. First, let's import the necessary libraries. Then we have two functions: get_uniform, which is in fact our noise, and get_normal, which is a normal distribution with mean five and standard deviation one. Note that the size of these samples is (batch_size, 1), so it is just a vector. Now, here we have two classes, Generator and Discriminator, and both of them are PyTorch-compatible. We have to write the code for the main sequential model. Here I propose to use two linear layers: we have in_features equal to one, and for the intermediate features let's say 16. Then let's add some nonlinearity, then again a linear layer with in_features equal to 16, because that's the output of the previous layer, and out_features equal to one. We expect that after training, this generator will transform the uniform noise into samples from the normal distribution. Now we have to write the discriminator. Here we again can use two linear layers: a linear layer with in_features equal to one and out_features, let it be the same, 16; then a nonlinearity; then again a linear layer with in_features 16 and out_features one. Here we have to return the probability of x being the real class, and that is why we can and should add a sigmoid function here: the sigmoid transforms our logit, which is the output of the linear layer, into a probability. Here we create two objects, the generator and the discriminator. After that, we should define the optimizers that will optimize the generator and the discriminator, and usually people do it separately: one optimizer for the generator and one for the discriminator.

Now we have to write two functions: the first for updating the discriminator and the second for updating the generator. What do we have to do here? We sample real data, compute the probability of real objects being real, then sample noise, obtain generated objects, compute the probability of fake objects being fake, and write the loss function. So let's do it. In fact, what we do here is build the binary cross-entropy loss using the notation from the lectures. The real samples will be just the get_normal function of batch_size: this is how we get real samples. Now, the probability of a real object being real is just the discriminator applied to the real samples, and that's it; this is because we defined discriminator(x) as the probability of x being the real class, and this is what we want. Now we have to do something with the generated samples. We have to obtain noise, and for this purpose we use get_uniform with batch_size. Then the generated samples are just the generator applied to the noise, and in order to obtain the probability of these fake, generated samples being fake, we need to write one minus the discriminator of the generated samples. Why one minus? Because the discriminator returns the probability of its input being the real class, so the probability of the input being the fake class is one minus the output of the discriminator. Now we have to write the loss function. Let's think: when training the discriminator, what do we want? We want to maximize the probability of the real class being real, right? And we want to maximize the probability of the fake class being fake. That is why we should do the following: when we use PyTorch optimizers, we usually minimize, but we would like to maximize, so we just put a minus sign before the probability of the real class being real and a minus sign before the probability of the fake class being fake. Here we should add logarithms, torch.log, and here again torch.log, in order to obtain the proper loss function. So here we maximize these, which is why we minimize minus the logarithm of this probability, and the same with the other probability. And what we have to do in PyTorch is write .mean(), because this is a vector and we should obtain a scalar value for our loss function. Good, we're done with the discriminator; let's now switch to the generator. We again obtain noise with get_uniform, and we obtain generated samples, which are just the generator applied to the noise. And now we have to obtain the probability of the fake class being real.
This means that we should write the discriminator of the generated samples, and that's it; this is because the discriminator, as I said, returns the probability of its input being real, so we give it a fake object, like here, and the output is the probability of it being real. And now, what should the loss function be? We should maximize the probability of the fake class being real: the generator wants to maximize the probability of the fake class being real, as this will mean that the discriminator fails in its predictions and the generator can fool it. That's why we again write minus torch.log of the probability of fake being real, and we have to add .mean() here, and that's all. Here we zero the gradients, then do the backward pass, and then do the optimization step; and in fact, that's all. Now we have a loop for the training. What do we have here? We have the number of updates of the discriminator on each iteration: here it is set so that on each iteration we update the discriminator 10 times and the generator one time, and we should tune this parameter. And we have batch size 64. Now let's run this loop. Oh, an error: get_uniform should receive batch_size. Yes, batch_size. And sns is not defined, because I forgot to import seaborn; let me fix that right now.

Now, what can we see here? Orange is our true distribution, and blue is our approximation of it, the output of the generator, and our goal is to make them closer to each other. The green line is the probability that is the output of the discriminator. You can see that the distributions after training became approximately the same. Let me stop it; and D on real data starts to be near 0.5. If the discriminator outputs a probability of 0.5, it means that it cannot distinguish the two classes, real and fake, and in fact this is what we wanted: we obtained generated samples that are very close to the real distribution, and the discriminator fails to distinguish them from the real values. In fact, this is not an ideal approximation; however, as I said, we can tune this parameter, let's say k can be 50. And let me add the code for the limits on y; so we added it here, and let's restart the kernel and run all cells again. So I set 50 here; then let's change it to 300, and I added the limits on y, so we restart and run everything again. Now you can see: this is the distribution that we want to approximate, and this is the distribution that is the output of the generator, and step by step it becomes closer to the real distribution. Well, I think that in this case, because we can't approximate it properly, we should increase the number of parameters in the discriminator. Let's do it: let's set not 16 but, say, 64 here, and again restart the kernel and run all cells. Let's wait a minute. You see that here, at this point, the prediction is random, and here it is not random; it means that the discriminator has learned the difference, and because its prediction is not random, it will change the behavior of the generator. So you see what is happening: the generator tries to change the situation somewhere here and here; you see, they are now rather close to each other. Okay, I think we can finish with the one-dimensional case. In the next practical video, we will study how we can implement the two-dimensional case for the approximation of a mixture of Gaussians, and that problem is much harder than this one. Thank you, and see you in the next video.
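For convenience, here is a self-contained condensation of what we built in this notebook. The learning rate, k, hidden size and the small eps guard inside the logarithms are my own choices; the lecture instead tunes k interactively:

    import torch
    import torch.nn as nn

    batch_size, k, n_hidden, eps = 64, 10, 16, 1e-8

    def get_uniform(n):
        return torch.rand(n, 1)          # noise prior U[0, 1)

    def get_normal(n):
        return torch.randn(n, 1) + 5.0   # target: N(5, 1)

    generator = nn.Sequential(
        nn.Linear(1, n_hidden), nn.ReLU(), nn.Linear(n_hidden, 1))
    discriminator = nn.Sequential(
        nn.Linear(1, n_hidden), nn.ReLU(), nn.Linear(n_hidden, 1), nn.Sigmoid())

    g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

    for step in range(3000):
        for _ in range(k):                                   # k discriminator updates
            real = get_normal(batch_size)
            fake = generator(get_uniform(batch_size)).detach()
            d_loss = -(torch.log(discriminator(real) + eps).mean()
                       + torch.log(1 - discriminator(fake) + eps).mean())
            d_opt.zero_grad(); d_loss.backward(); d_opt.step()

        fake = generator(get_uniform(batch_size))            # one generator update
        g_loss = -torch.log(discriminator(fake) + eps).mean()
        g_opt.zero_grad(); g_loss.backward(); g_opt.step()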
8. Practice 2d mode collapse: In this video, we will modify the previous practice notebook to the two-dimensional case. Now our goal is to sample from a mixture of 2D Gaussian distributions. Let's again import everything. Here we have three mus and three sigmas, and all of them are used for generating the distribution, for generating samples from the mixture of Gaussians. And here we have the dimension of the noise; we set it to one, as it was in the previous video. Then we have the get_normal function, which we no longer need, and I added two functions: export_2d_density, in order to visualize the density, and sample_real_data. Here we sample from three normal distributions and then stack these samples. Let's look at it: here you can see the distribution made by the samples from normal one, normal two and normal three. Now we have a generator, and it is copied from the previous videos; let's modify it. The first thing we need to modify is that the first linear layer should now take the input noise dimension here, because now we assume that we can increase this dimension. Let's leave everything else the same: 16 features here, and the output we should set to two, because we now work with two-dimensional data. The discriminator also receives two-dimensional vectors as input and transforms them into a one-dimensional vector of probabilities. Okay, this part is the same, and the functions for updating D and G are the same. Then let's look at what is happening: real samples now come from sample_real_data, so I should change the get_normal call here; yes, and I hope it works. Yes, it works. Here you have the real data and here you have the generated data, on the same scale, as you see, from minus 15 to 15 for y and x. And now you see how this generated distribution tries to approximate the real one; I think it is rather hard for the GAN. Now let us increase the noise dimension: we can just try setting it, for example, to two, and run everything else again. Okay, it again starts from some points here. Now, what can you see here, and what I wanted to demonstrate to you: as you can see, we have three modes in our distribution, because this is a mixture of three Gaussian distributions, but here we have samples from only one mode of the distribution, the one located at (-5, 5), which is this one, and not from any other. This is what people call mode collapse, and I show you this practical case because it is a really important topic. It can happen not only with a mixture of Gaussians, but even when you generate faces, for example. In some of the next videos, we will study how we can avoid mode collapse and what we can do about it, why it happens, and so on. This is all for this video. Thank you, and see you in the next one.
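Sampling the three-mode target can be done along these lines; the means and sigma below are placeholders, and the notebook defines its own values:

    import torch

    mus = torch.tensor([[-5.0, 5.0], [5.0, 5.0], [0.0, -5.0]])   # assumed modes
    sigma = 1.0

    def sample_real_data(n):
        idx = torch.randint(0, len(mus), (n,))        # pick a component per sample
        return mus[idx] + sigma * torch.randn(n, 2)   # add Gaussian noise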
9. Practice celeba: Hello, my dear friend. In this video, we will study how we can implement deep convolutional generative adversarial networks in practice. I took all the code from the official PyTorch tutorial, and in this video I want to comment on it, to explain each line of code to you; I hope that after this video you will be able to build DCGANs by yourself. The task for this tutorial is to generate faces. Now let's import the necessary libraries. Here we have some settings. We have dataroot; we have the number of workers for the PyTorch dataset, which is the number of processes that load our dataset from dataroot; we have the batch size; and we have the image size: we will work with the CelebA dataset, which has images of size 64 by 64, and we will train our network to generate images of size 64 by 64. Next, the number of channels: we have three channels, red, green and blue. Then nz, which is the size of the latent space, of the latent vector: we will generate noise of size 100. Here is the size of the feature maps in the generator and the discriminator; you will see a bit later what it is. Then the number of epochs of training, the learning rate, the beta1 parameter, which is used in the Adam optimizer, and the number of GPUs. Now, what we do here: we load our data from a folder; our data is located in dataroot, which is the current directory. Then we apply transformations: resize, then center crop, then ToTensor, and normalization. When we do resize, we just resize all images to the size 64 by 64; when we do center crop, we crop the central part of the image; then we convert the image to a tensor, to a PyTorch tensor; and after that we normalize: here we pass the mean and the standard deviation, and it means that after this normalization our mean will be zero and our standard deviation will be one. After that, we should pass this dataset to the DataLoader with the batch size; we want to shuffle our data, and I always recommend you to shuffle the data; and the number of workers that we defined before. After that, we define the device, which in our case will be CUDA. And here is some visualization of our data, so you can see what training images we obtained: these are faces of celebrities; for example, I can see here Harry Potter and other celebrities.
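The data pipeline just described looks like this in code; the values follow the official PyTorch DCGAN tutorial, and dataroot is assumed to point at a directory containing a subfolder of images:

    import torch
    import torchvision.datasets as dset
    import torchvision.transforms as transforms

    dataroot, image_size, batch_size, workers = "data/", 64, 128, 2

    dataset = dset.ImageFolder(
        root=dataroot,
        transform=transforms.Compose([
            transforms.Resize(image_size),          # resize to 64x64
            transforms.CenterCrop(image_size),      # crop the central part
            transforms.ToTensor(),                  # PyTorch tensor in [0, 1]
            transforms.Normalize((0.5, 0.5, 0.5),   # shift and scale to [-1, 1]
                                 (0.5, 0.5, 0.5)),
        ]))
    dataloader = torch.utils.data.DataLoader(
        dataset, batch_size=batch_size, shuffle=True, num_workers=workers)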
Next we have the weight initialization function. It is a bit different for convolutions and for batch norm: for convolutions, we use a normal distribution with zero mean and a close-to-zero standard deviation; this is a practical recommendation that was shown to be a rather good initialization method. For batch norm, we initialize the weight, that is, the learnable standard deviation, around one, and the bias with zero. I remind you that in batch norm we have two learnable parameters, a learnable mean and a learnable standard deviation; so here we set the learnable mean to zero initially and the learnable standard deviation to a random number near one. Next we have the generator. Here we have a stack of transposed convolution layers, batch normalizations and ReLU layers. Let's look at the first transposed convolution: as the input number of channels we have nz, which is in fact our latent size, so the input here will be our noise, and the output number of channels is ngf times 8. ngf, as I promised you, is the size of the feature maps in the generator, so it is 64; ndf, for the discriminator, is again 64. These are just the parameters that we defined: we set 64 here, here, and so on. Then the size of the kernel is 4, the stride is 1 and the padding is 0. After that, we apply batch normalization in two dimensions, because we work with images; then again ReLU, convolution, batch norm, ReLU and so on. Notice that as an output we have a hyperbolic tangent here: it means that all our images will be in the range from minus 1 to 1. This is an important point, because when you usually work with images, pixel values are either in the range from 0 to 255, which stands for pixel intensity, for color intensity, or in the range from 0 to 1, which can be rescaled to the range from 0 to 255. Here, however, we have the range from minus 1 to 1; this is just a practical choice, and in the DCGAN paper the authors showed that a hyperbolic tangent as the output layer is the best for GANs. Here we instantiate the network, make it parallel and apply the weight initialization, and you can see what our neural network looks like. We do the same with the discriminator; however, notice that now we have as an input the number of channels nc, then ndf, ndf times 2, ndf times 4 and so on. And again we have convolutions, but now LeakyReLU instead of just ReLU: this is again a practical choice that the authors of the paper studied; and again with batch normalization. We initialize the discriminator, and now we define the loss function, which is the binary cross-entropy loss. In the previous practice videos, I showed you how we can implement this binary cross-entropy by hand for GANs, using the probability of the image, of the object, being real or fake; now we can just use the binary cross-entropy loss defined in PyTorch. Then we have some fixed noise that we will use for testing during the iterations: we sample it once from the standard normal distribution and fix it. Then we define the label for the real class, which is one, and for the fake class, which is zero. We also have two optimizers here, for the discriminator and for the generator; and here is our parameter beta1, which is 0.5, which is again the recommendation from the DCGAN paper. I think this is all for this video. In the next video, we will discuss the training of this network and look at the results. Thank you.
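Written out, the generator described above is the following stack; this matches the structure of the official tutorial's generator:

    import torch
    import torch.nn as nn

    nz, ngf, nc = 100, 64, 3      # latent size, feature maps, color channels

    netG = nn.Sequential(
        nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),      # 1x1 -> 4x4
        nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False), # -> 8x8
        nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False), # -> 16x16
        nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),     # -> 32x32
        nn.BatchNorm2d(ngf), nn.ReLU(True),
        nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),          # -> 64x64
        nn.Tanh(),                                                 # range [-1, 1]
    )
    print(netG(torch.randn(1, nz, 1, 1)).shape)   # torch.Size([1, 3, 64, 64])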
10. Practice celeba 2: Now we have the training loop for our DCGAN. Let's run this cell, and while it's running I'll explain what is happening here. We have lists for the losses and for the images, and a counter for iterations. Here we have the number of epochs; one epoch means that we will see each of the images once, so this is one full pass, and in total we have five epochs. Then, for each epoch, we do the following. We have two main blocks: updating the discriminator and updating the generator. When we update the discriminator, we first do an update with real samples. What we do here: we obtain some data from the data loader, then we move it to the device, which in our case is the GPU; then we set the label to real_label, which is one; then we obtain the probabilities of these real samples being real and call the criterion, which, I remind you, is now binary cross-entropy; then we do the backward pass and compute the loss value. In fact, we do the same as we did in the one-dimensional and two-dimensional cases; the difference is that now we use the predefined criterion, but it is in fact the same. Next we update the discriminator with a fake batch. For this, we take some noise from the normal distribution with shape (batch size, nz, 1, 1); these are the special dimensions of this noise, where nz plays the role of a number of channels. In fact, this is not an image, but you can look at it as one: we have the dimensions batch size, then the noise dimension, which can be interpreted as a number of channels, then height and width; this is just, I would say, an input noise image. Then we call the generator, set the label to fake, then obtain the output of the discriminator on these fake, in other words generated, samples, also compute the criterion, do the backward pass, and compute the mean loss value. Then we sum up the real error and the fake error and do the optimization step. So again, we do the same as we did in the one-dimensional and two-dimensional cases. Now we have to update the generator network. We should maximize the probability of a fake sample G(z) being real; this is the case for the generator. What we have here: we again take the same fake images and compute the discriminator on them, but we set the label to the real label; so where it was set to the fake label, here it is real. We compute the same criterion, do the backward pass, compute the loss function and do the optimization step. Note that the discriminator is updated using real and fake samples, but the generator is updated according to the discriminator's loss on only the fake samples and with flipped labels: not the fake label here, but the real label. Next we have the printing of the loss values, and here we just visualize some images and append them to a predefined array. Now let's wait until the training finishes.

11. Celeba 3: Our training has finished, and now let's look at the results. Here you can see the loss functions for the generator and for the discriminator. Then we have the visualizations of images, from the first iteration until the last one, and here we can do the following: we obtain an animation. We use the fixed noise in order to obtain these images, so all of them were generated from the same noise, and you see how their quality improves from iteration to iteration. So yes, here is the final one; at the beginning of the training, the generator outputs something like noise. And now we can compare generated and real samples. You see that it is rather easy to distinguish between real and fake samples; however, they are quite nice. If we zoom in, you see some artificial details here, but I think it's a rather nice result. In the next videos, we will study how we can generate really good samples, really high-quality images. This is all for the face generation practice. Thank you.

12. Applications: In this section, we will talk about applications of generative adversarial networks. On the slide, you can see the total number of papers on generative adversarial networks that were published since 2014. This image was taken from one of the GitHub accounts that collects these articles, but I think there are many more articles, maybe around a thousand. Of course, it's usually hard to analyze which work is useful and which is not, because everybody wants to develop their own model; in this section, I tried to select the most valuable works, the most valuable applications, those that can be applied to business tasks or that are just interesting and useful for further GAN development. The first modification, or application, of GANs is conditional generative adversarial networks. I already told you that the generator, which is blue on the slide, usually takes some noise from a uniform or normal distribution as an input. The difference between conditional GANs and standard GANs is that we also put a class label as an input to the generator and to the discriminator, which is green. This usually helps when we want to generate an image of a given class. For example, if we would like to generate images of handwritten digits, then we can take the digits from zero to nine and stack them onto the noise vector, as I showed on the slide, and this allows us to generate conditionally, to generate an image given the class label.
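A minimal sketch of the conditioning trick just described, assuming a 10-class problem and vector-shaped inputs:

    import torch
    import torch.nn.functional as F

    batch_size, nz, n_classes = 64, 100, 10            # assumed sizes
    z = torch.randn(batch_size, nz)                    # noise vector
    labels = torch.randint(0, n_classes, (batch_size,))
    y = F.one_hot(labels, n_classes).float()           # class label as one-hot

    g_input = torch.cat([z, y], dim=1)   # generator input: noise stacked with label
    # The discriminator input is built the same way: the (flattened)
    # image concatenated with the same one-hot label.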
Now I want to show you a really useful application of GANs: pre-training. Again, we have a generator, which is blue, and a discriminator, which is green. Imagine you have a set of images which are all unlabeled, so you don't have classes for them, but your final task is to solve some classification or regression problem, so you need labels for your target problem. However, let us first train generative adversarial networks that allow us to generate images similar to those in your dataset. After that, what do we do? We take the discriminator and remove its last one or two layers; we just remove them, and we can add new layers at the end of the discriminator and train these new layers on the labeled data that you have. If you have very little labeled data and a lot of unlabeled data, this should work. This is very similar to what is called transfer learning, when you, for example, take some neural network which was pre-trained on a big dataset like ImageNet, and after that you remove the last layers, add a new one and train this new one on your small existing labeled dataset. So this is very useful, and we can also apply GANs here. Another possible application is the supervised discriminator. What is proposed: we have a discriminator, and instead of just the classes real and fake, we divide the real class into subclasses, like cat and dog. Now we have three classes: real cat, real dog and fake images. And when our GAN is trained, we can take this discriminator and use it for our supervised task, like the classification of cats and dogs. This is all for this video. Thank you for watching, and see you in the next video.
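A rough sketch of the discriminator-reuse idea, with a toy stand-in for the trained discriminator; the layer sizes here are placeholders:

    import torch
    import torch.nn as nn

    # Stand-in for a trained GAN discriminator (in practice, reuse your netD).
    netD = nn.Sequential(
        nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
        nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
        nn.Conv2d(128, 1, 16), nn.Sigmoid(),          # real-vs-fake head
    )

    # Drop the real/fake head, keep the learned features, add a new head,
    # and train only the new head on the small labeled set.
    features = nn.Sequential(*list(netD.children())[:-2])
    classifier = nn.Sequential(
        features, nn.Flatten(),
        nn.Linear(128 * 16 * 16, 10),                 # e.g. 10 target classes
    )
    print(classifier(torch.randn(4, 3, 64, 64)).shape)   # torch.Size([4, 10])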
13. Cycle GANs: In this video, we're going to study CycleGANs, which were published in 2017. Here you can see examples from the CycleGAN model; its main purpose is to transform image to image. The first example shows the transformation of a Monet painting into a photo and back; the second example demonstrates zebra-to-horse transformation; in the third example, you can see how summer becomes winter and winter becomes summer; finally, at the bottom, you can see the transformation of a photo into the styles of different artists. All of these examples were made using CycleGANs, and in this video we will study how it works. The distinguishing feature of CycleGAN is its unpaired mode: it means that we don't need any pairwise mapping from one type of images to another. In the case of CycleGAN, we need only two sets of images, and the model itself learns the transformation from one set to the other. In the example on the slide, you only need a set of photos and a set of drawings in order to train the model; the places in the photos don't have to be the same as in the set of drawings. CycleGAN consists of four neural networks: two generators and two discriminators. Note that we don't have any input noise here. Since we have two types of images, A and B, we have a separate discriminator for each of them, and a separate generator, again for each of them. We start from image A, which is a horse in our example. The discriminator A tells us if the image is of type A or of type B; again, the difference from standard GANs is that there are no real and fake classes, now we have two classes, type A and type B. Image A is an input for discriminator A, which decides whether the image is of type A or B. Then image A is an input for the generator A-to-B, which transforms the horse into a zebra; after that, discriminator B decides if this image is of type B or A. The generator A-to-B tries to transform the input image to type B in such a way that the output is indistinguishable from real type-B images. In this model we also have a generator B-to-A, which transforms the zebra back into the horse. This was the first part of the CycleGAN scheme; now let's continue with the other one. We start from image B as an input. Now discriminator B decides whether this is a horse or a zebra and tries to distinguish them; then image B becomes an input for the generator B-to-A, which transforms it into a horse, and discriminator A decides whether it is a horse or a zebra. After that, the image is again transformed to an image of type B with the generator A-to-B. The goal of the generator A-to-B is to transform an image of type A into an image of type B in such a way that discriminator B fails to distinguish it from real images of type B; and vice versa, the goal of the generator B-to-A is to transform an image of type B into an image of type A in such a way that discriminator A fails to distinguish it from real images of type A. In our model we have two adversarial losses, which stand for the two discriminators, and one pixel-wise difference loss. The goal of the adversarial losses is to make the generators learn proper transformations; the pixel-wise difference loss stands for the ability of the generators to reconstruct an image from type A to type B and back.

Now let's look at some examples from the CycleGAN paper. Here at the top you can see the transition of a photo to its semantic labels and back; at the bottom, sketches are transformed into realistically looking shoes. The CycleGAN model is able to transform paintings into photos and photos into paintings, horses into zebras, zebras into horses, and even oranges into apples and back. A nice application is season change: the model is able to transform images from winter to summer. It may find its application in film production, where it is sometimes required to be summer, but the film is shot in winter. The last nice application of the CycleGAN model is background blurring: CycleGANs are able to detect the object in the front of the picture and blur everything else. Well, now you know how it works in modern mobile phones. This is all for this video. Thank you for watching, and see you in the next video.
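The pixel-wise reconstruction term described above is the cycle-consistency loss from the paper; combined with the two adversarial terms, the objective is:

    \mathcal{L} = \mathcal{L}_{\mathrm{GAN}}(G_{AB}, D_B) + \mathcal{L}_{\mathrm{GAN}}(G_{BA}, D_A) + \lambda\, \mathcal{L}_{\mathrm{cyc}},
    \quad
    \mathcal{L}_{\mathrm{cyc}} = \mathbb{E}_{a}\big[\lVert G_{BA}(G_{AB}(a)) - a \rVert_1\big] + \mathbb{E}_{b}\big[\lVert G_{AB}(G_{BA}(b)) - b \rVert_1\big]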
Now let's look at some examples from the CycleGAN paper. Here on the top, you can see the transition of facades to their labels and back; on the bottom, sketches are transformed into realistically looking shoes. The CycleGAN model is able to transform paintings into photos and photos into paintings, horses into zebras, zebras into horses, and even oranges into apples and back. A nice application is season change: the model is able to transform images from winter to summer. It may find its application in film production, where it is often required to be summer, but the film is shot in winter. The last nice application of the CycleGAN model is background blurring: CycleGANs are able to detect the object in the front of the picture and blur everything else. Well, now you know how it works in modern mobile phones. This is all for this video. Thank you for watching and see you in the next video.

14. Superresolution: In this video, we will study how we can apply generative adversarial networks to the problem of super-resolution. If you don't know what the problem of super-resolution is, here is an example. The left image is your input image, and imagine that you would like to increase its quality, that is, to increase its resolution. If you just use some graphical editor like Photoshop or Paint, or any other that uses not deep learning techniques but standard interpolation methods, such as bicubic interpolation or nearest-neighbors interpolation, then you will obtain rather bad results where you can see the pixels on the photo. However, if we use generative adversarial networks, we can obtain very high-quality images of higher resolution, so let's study how we can do it.

Here is the model of the generator and the discriminator used in the paper called "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network". This model works with patches from the images. A patch is a sub-image of an image: you just randomly take some part of the image and use it as an input for the neural network. In order to train this network, you should have a dataset in the format of high-resolution and low-resolution pairs. To obtain this dataset, you take a set of high-resolution images and manually make them low-resolution, and after that try to restore the high-resolution images back. So you take this patch of the image and use one convolutional layer and one PReLU layer. After that, you have B residual blocks, which consist of convolutional layers, batch normalizations, PReLU layers, and an element-wise sum. You use several such residual blocks with skip connections, as in the ResNet architecture that was proposed for image classification. After that, you use a convolution, batch norm, and again an element-wise sum. Then you use a convolution and an interesting layer which is called a pixel shuffler, followed by a PReLU. Here is the explanation of the pixel shuffler layer. What does it do? If we have several channels, that is, several feature maps obtained from the previous convolutional layer (the number of which we control by setting the parameters of the convolution), we can rearrange, or shuffle, these pixels into one big image. So we increase the resolution but decrease the number of channels. This is the pixel shuffle layer. After that, we use one more convolutional layer and obtain the super-resolution patch: we had a low-resolution patch as input, and here we have a super-resolution patch as output.

The structure of the discriminator is a bit simpler. It also consists of convolutional and LeakyReLU layers with batch normalizations. It also works with a patch, and as an output it decides whether the patch is a high-resolution one from the dataset or a super-resolution one, in other words, generated or synthetic. Note that we again don't use any noise as input to the generator; here we just use the patch of the image.

Now, in order to construct the final loss function, we need to know what the VGG network is. VGG is a network that was proposed, again, for classification, but it is good as a feature extractor, and the authors use this network for constructing the loss function. What do they actually do? They feed an image into this network and take the features from the last layers, and here is how they use them. They extract features from the high-resolution image from the dataset, and they also extract features from the generator's output on the low-resolution image; this image is synthetic, but it is also super-resolution. They take the squared difference between these features, which is a Euclidean loss, and average it over the images; this is what is called the content loss. The second loss that they use is the adversarial loss, which is the standard loss function for generative adversarial networks that we have already studied. In order to combine the content loss and the adversarial loss, they just sum them up, with a coefficient of 10 to the power of minus 3 in front of the adversarial loss, and this is their total loss. A sketch of these pieces follows below.
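Here is a hedged PyTorch sketch of two ingredients from this lesson: the pixel shuffler (PyTorch ships it as nn.PixelShuffle) and the VGG-based content loss. The channel counts and the exact VGG layer to cut at are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

# Pixel shuffler: trade channels for resolution.
# 256 channels become 64 channels at twice the height and width.
upsample_block = nn.Sequential(
    nn.Conv2d(64, 256, kernel_size=3, padding=1),
    nn.PixelShuffle(2),
    nn.PReLU(),
)

# Content loss: squared difference between VGG features of the
# high-resolution patch and the super-resolved patch.
vgg_features = vgg19(pretrained=True).features[:36].eval()
for p in vgg_features.parameters():
    p.requires_grad = False  # VGG is a fixed feature extractor

def content_loss(sr_patch, hr_patch):
    return torch.mean((vgg_features(sr_patch) - vgg_features(hr_patch)) ** 2)

# Total generator loss, as in the lesson:
# total = content_loss(sr, hr) + 1e-3 * adversarial_loss
```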
You can see their results. This is the low-resolution image, and you see the quality of this animal; here is the high-resolution ground truth. You can also see bicubic interpolation, which is usually used in standard image editors, and the super-resolution GAN that they proposed: the quality is almost indistinguishable from the ground truth. Here you can see the result on another image. Here is the bicubic interpolation, which is rather bad; here is previous work from different authors; and here are their work and the original. You see, they are almost indistinguishable, although there is still a difference between these images. That was all for super-resolution GANs. Thank you, see you in the next video.

15. Inpainting: In this video, we will talk about the problem of inpainting of images, and I will describe a patch-based approach using generative adversarial networks that was proposed in March 2018. Here you can see what inpainting is, along with the results of the work that we will discuss now. The task is: given an image with a hole, fill this hole with something. Usually we as humans can just guess what might be behind this gap in the image, and we expect the neural network to learn the same things that we know. For example, in the upper-left image, we can guess that there should be a balcony, and the neural network indeed puts a balcony in place of this gap.

Now let's talk about the model. On the left there is a generator, and on the right there is a discriminator. The generator is built on a ResNet, which was originally used for classification, with several convolutional layers added. The interesting thing about this work is the discriminator. How does it work? As an input, we put some image with holes, as you see on the left of this slide (there can be several holes, in fact). We then construct the discriminator in such a way that formally we have two discriminators: one global discriminator, which decides whether the full image is real or not, and a patch discriminator, which predicts a vector of probabilities where each element decides whether the given patch is real or not. Let's look at it in detail. Here you can see what I was talking about: we have some patch of the input image, and it corresponds to one element of the output vector of the discriminator, which is the probability of this patch being real. Note that the patch discriminator and the global discriminator are split only at the end of the discriminator; the first several layers are the same and have shared weights, and only the global and patch parts are trained differently.

Now let's talk about the loss function. Here is the total loss function, which consists of the reconstruction loss, the adversarial loss for the global discriminator, and the adversarial loss for the patches, with three hyperparameters: lambda one, lambda two, and lambda three. They use the standard GAN loss, which is binary cross-entropy, the same for the patch and for the global version of the discriminator. As a reconstruction loss, they use the pixel-wise L1 difference, averaged over all images and all pixels. After that, they sum everything up with the coefficients lambda; a sketch is given below.
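As a hedged sketch (the function signature and the default lambda values are assumptions, not the paper's), the combined generator objective could look like this:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # standard GAN loss
l1 = nn.L1Loss()              # pixel-wise reconstruction loss

def inpainting_loss(completed, target, global_logits, patch_logits,
                    lambda1=1.0, lambda2=1e-3, lambda3=1e-3):
    # Reconstruction: pixel-wise L1, averaged over pixels and images.
    rec = l1(completed, target)
    # Global adversarial term: one real/fake decision per image.
    adv_global = bce(global_logits, torch.ones_like(global_logits))
    # Patch adversarial term: a vector of real/fake decisions, one per patch.
    adv_patch = bce(patch_logits, torch.ones_like(patch_logits))
    return lambda1 * rec + lambda2 * adv_global + lambda3 * adv_patch
```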
Here you can see the results. In the first row there are input images; in the second row there are output images. We can see that, yes, there are some defects on the photos, but the results are rather nice. Again, some results: a knife, some buildings, a bagel; you can see some artifacts. Thank you, that was all for inpainting with GANs. See you in the next video.

16. Text2image: In this video, we will study how we can generate images from text descriptions using generative adversarial networks. Here is an example of images generated from text descriptions: you can see generated birds and flowers. Now let's look at the model that the authors of the paper proposed. In fact, this is a standard conditional deep convolutional generative adversarial network; however, there are a few details, so let's look at them. The first thing we do is encode the text description. For this purpose, we can use embedding layers, convolutional and recurrent neural networks. I will not go into details about encoding the text; you should just understand that we transform our text description into a vector representation. If you want the details, please read the research paper mentioned in the right bottom corner. After we obtain the vector representation, we stack it with the noise vector, which is generated from the standard normal distribution. After that, we use a set of transposed convolutional layers, or deconvolution layers, to transform this input vector into an image, and this is our generator. The same generator was used in the deep convolutional generative adversarial networks paper that we studied in the previous section. The discriminator takes the image as input, and its first layers transform it into some representation. After that we stack again: in the discriminator, we stack the embedding of the text description with this image representation. Then we use a convolutional layer to transform this image and text into the label fake or real, and this is the discriminator network.

Now let's talk a little bit about the training algorithm. At each iteration, the first thing we do is take two batches of text: the first contains matching text descriptions, and the second contains mismatching text descriptions. It means that for the given mini-batch of images, we have the texts that describe this mini-batch of images and a set of texts that do not describe them. This was done because the authors want to somehow distinguish the situation when the generated image corresponds to the input text from the situation when it does not; that's why we have two sets of text descriptions. After that, we sample the noise from the normal distribution and make a forward pass through the generator with this noise and the matching text description. We then have three situations: a real image with right text, a real image with wrong text, and a fake image with right text. Note that we can't have a fake image with wrong text, because then we wouldn't know what is wrong or right. When we compute the discriminator's loss, we use these three cases: the first is a real image with right text, and the second is a real image with wrong text or a fake image with right text. It means that we consider two different types of errors, and in the formula we take the average of these two error cases: we take into account both the case when the image is real but the text is wrong and the case when the image is fake but the text is right. On the next line, we make an update of the discriminator with stochastic gradient descent or any of its modifications. Then we update the generator using the fake image with the right text and again make a step of the gradient descent algorithm. A sketch of the discriminator update follows below.
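Here is a hedged sketch of this matching-aware discriminator loss. The interface d(image, text_embedding) -> logit is an assumption for illustration, not the paper's code.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def discriminator_loss(d, real_img, fake_img, right_txt, wrong_txt):
    s_r = d(real_img, right_txt)   # real image, right text
    s_w = d(real_img, wrong_txt)   # real image, wrong text
    s_f = d(fake_img, right_txt)   # fake image, right text
    loss_real = bce(s_r, torch.ones_like(s_r))
    # The two error types are averaged, as in the formula in the lesson.
    loss_errors = 0.5 * (bce(s_w, torch.zeros_like(s_w)) +
                         bce(s_f, torch.zeros_like(s_f)))
    return loss_real + loss_errors
```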
Here you can see examples that were made using this model. GT stands for ground truth, and "ours" means the model that the authors proposed. You can see these images are a bit realistic; however, they are not of very high quality. In the next section, we will consider models that allow us to generate images of very high quality; this work was made in 2016, and at that time people were unable to generate high-quality, high-resolution images.

This model can also be used for style transfer. We assume that our z contains the information about the style of the image, and that's why the authors proposed to introduce a third neural network, called an encoder, that takes an image as input and transforms it back into the input noise. So we take some noise, make a forward pass through the generator, then make a forward pass through the encoder S and obtain z again. We do this by minimizing the squared distance between the input z and the output z, and we do all of this after we have already trained the generator to generate realistic examples. Then, if you want to transfer a style from one image to another, you first use this encoder to obtain the z, which is now not noise but a style, and then pass this z to the generator instead of the noise and obtain a newly generated image x-hat. Here you can see what I'm talking about: we have some text description, and we have a set of images which we can consider as style images. Now, for example, we take this first bird, make a forward pass of this image through the encoder, and obtain "noise"; but it is not noise, in fact it is a style. We pass it, along with the encoding of the text, to the generator and obtain this image. Now, changing the styles, we can obtain the same bird in different styles, as you can see in the picture: within columns the styles are similar, and within rows the birds are similar. So that's the style transfer approach that the authors proposed in the paper.

Remember, I already told you that using deep convolutional generative adversarial networks we can make an interpolation in the latent space and obtain an interpolation in the image space. The authors proposed to do the same, but in the text encoding space. They took two texts, t1 and t2, and interpolated between them using a linear coefficient: beta times t1 plus (1 minus beta) times t2, changing beta from 1 to 0. What they obtained goes from a blue bird with a black beak to a red bird with a beak; you can see the interpolation of the same image while the bird changes. You can see the same here: for example, this bird is bright and this bird is dark, and you can see how the bird transforms from the bright bird to the dark one. A small sketch of this interpolation is shown below.
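As a small hedged sketch (the generator interface, the concatenation of noise and text, and the step count are assumptions), the text-space interpolation could look like this:

```python
import torch

def interpolate_texts(generator, t1, t2, z, steps=5):
    # t1, t2: text embeddings of shape (batch, txt_dim); z: noise (batch, z_dim).
    images = []
    for beta in torch.linspace(1.0, 0.0, steps):  # beta goes from 1 to 0
        t = beta * t1 + (1 - beta) * t2
        images.append(generator(torch.cat([z, t], dim=1)))
    return images
```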
This is all that I wanted to tell you about text-to-image generation. Thank you, see you in the next video.

17. Progressive Growing of GANs: In this video, we will talk about progressively growing generative adversarial networks, a model that was proposed in 2017 and now shows very good results; in the next video, we will talk about improvements of this model. You can see the generated faces from this model: all of these images have size 1024 by 1024 and were generated by a model trained on the CelebA-HQ dataset. The model starts from the generation of a small image of size 4 by 4. We create a generative adversarial network whose discriminator and generator work with small images of size 4 by 4 and train it until we obtain a stable situation where our generator is good. After that, we add one more layer: in order to obtain an image of size 8 by 8, we add one more layer to the generator and one more layer to the discriminator, and we continue training both the previous layers and the newly added layers. In order to perform upsampling from size 4 by 4 to 8 by 8, we use nearest-neighbors upsampling. You can see what it is on your screen: a green pixel with value six becomes a 2 by 2 square where all pixels have value six, and the same happens with eight, or three, or four, and so on. We just do nearest-neighbors upsampling, which is a rather naive approach to image upsampling, and after that we process the image with a convolutional layer; a sketch of this growing step is given below.
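Here is a hedged sketch of one such growing step (the channel counts and the LeakyReLU slope are assumptions):

```python
import torch
import torch.nn as nn

# Nearest-neighbor upsampling followed by a convolution, as in one
# growing step: every pixel becomes a 2x2 square of the same value.
grow_block = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="nearest"),  # 4x4 -> 8x8, etc.
    nn.Conv2d(512, 512, kernel_size=3, padding=1),
    nn.LeakyReLU(0.2),
)

# The "pixel with value six" example from the slide:
x = torch.tensor([[[[6.0]]]])
print(nn.Upsample(scale_factor=2, mode="nearest")(x))
# tensor([[[[6., 6.],
#           [6., 6.]]]])
```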
We repeat this action several times, and finally we obtain a big generator and a big discriminator, both of which work with images of size 1024 by 1024. Note again that we train the whole neural network, not just the recently added layers. Here again you can see the generated faces from this model. They also trained their model on the LSUN dataset, which consists of images of size 256 by 256, and you can see how realistic the horses, sofas, buses, and so on are. They also compared their model with previous state-of-the-art models on the bedrooms dataset; on the right you can see the result of their model, and the images are quite realistic. They also demonstrated that they didn't just copy the training set but generated new images, and for this purpose they searched for the nearest neighbors in the VGG feature space: the top row shows generated images, and the others are the nearest neighbors from the training set. For the comparison, they used only what is selected in the boxes, that is, just the faces rather than the whole image, in order to obtain features that are relevant to a human (nose, eyes, mouth) but not the background. Here is another set of results for faces and for other categories, like airplanes, bedrooms, bicycles, and so on. This is all for progressively growing GANs. In the next video, we will talk about how the same authors proposed to improve this model. Thank you.

18. Style-based Generator: In this video, we will move forward in studying progressively growing GANs, and we will study the style-based generator for progressively growing GANs, a paper proposed by the same authors from the NVIDIA company, in which they obtained better results than with progressively growing GANs. Here you can see the results of their work. On the left there are bedrooms of size 256 by 256, and you can see that the quality of the images is amazing; on the right, you can see images of different cars of size 512 by 384. All of these images are generated; there is not a single real image. So now let's study how we can do it. The model proposed in the paper called "A Style-Based Generator Architecture for Generative Adversarial Networks" is a modification of progressively growing GANs. In the scheme on the slide, on the left, you can see the traditional progressive growing that we studied in the previous video: we first train the GAN to generate an image of size 4 by 4, then we upsample and train it to size 8 by 8, and repeat this until we get to size 1024 by 1024. The authors proposed some modifications of this scheme; let's look at them.

The first modification they proposed is the usage of a fully connected network for the latent space. We have some z from some prior distribution, like a normal distribution or a uniform distribution; they use this z as an input for a fully connected network that consists of eight linear layers and obtain w. W is what they call the style of the image; later I will show you why they call it a style and how they use it. Note that they also use additional noise: they don't transform it, they just add it at each level of the image, as you can see in the picture. They also use adaptive instance normalization in order to introduce this style w into the network and into the generation process. Now look at what is happening. Here is the formula for adaptive instance normalization, and it has two arguments: the first, x, is the output of the previous layer, and y is the style, which is a learned affine transformation A of w. Now we take this output x, compute its mean and subtract it, and then divide by the standard deviation; by this equation, we normalize our x. After that, we multiply by a new standard deviation, which is a part of our style, and add a new mean, which is also a part of our style: AdaIN(x, y) = y_sigma * (x - mu(x)) / sigma(x) + y_mu. Understand that both this new mean y_mu and this new standard deviation y_sigma are learnable parameters which are a function of the latent z; a code sketch follows below.
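Here is a minimal sketch of that operation, assuming NCHW tensors and that the style statistics y_sigma and y_mu have already been produced from w by the learned affine transform:

```python
import torch

def adain(x, y_sigma, y_mu, eps=1e-8):
    # x: (N, C, H, W); y_sigma, y_mu: (N, C, 1, 1), both derived from w.
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True)
    x_norm = (x - mu) / (sigma + eps)  # normalize each feature map
    return y_sigma * x_norm + y_mu     # apply the style's statistics
```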
Now let's look at what they obtained with this style mechanism. They have this w, which is a style, and, as I said, they feed it into adaptive instance normalization as the new mean and new standard deviation here, here, here, and here. Now we have two sources, A and B, and what they do is mix the styles of different photos: they can use, for example, coarse styles from source B but the other styles from source A; or middle styles from source B and, again, the other styles from source A; or fine styles from source B while the others are from source A. You can see the result. In order to obtain a style for an image, they generate a random z, make a forward pass, and obtain some style (this is style A); then they generate another z, make a forward pass, and obtain another w (this is style B); and they combine them, so some layers obtain style A and some layers obtain style B. You can see that in each row the persons are very similar, and in each column they are also very similar, and depending on whether the coarse, middle, or fine layers are mixed, there is a difference in the images. For example, here we have a child, child, child, child, and here it is not a child, but it has a style that is similar to this child. Here we have very similar people; in these rows they have different faces, and in the last rows they have very similar faces; with fine layers, we get the same face but, for example, different hair and different backgrounds. They also made an experiment where they varied the noise across the layers: in image (a) they used noise for every layer, as before; in image (b) there is no noise at all; then noise only on fine layers; and then noise only on coarse layers. You can see that the most impressive, highest-quality results are obtained when we apply noise to the layers. Here you can see the generated faces, so you can see how detailed and high-quality the images are, and here you can see again the results they obtained with this model. So that was all for the style-based generator, and I think it's rather good work, one of the state-of-the-art works. In other videos, we will talk about another type of generative adversarial network, called BigGAN, which also allows us to generate very high-quality images, but in a different way, without progressive growing; they are more like standard GANs. However, those authors also obtained very nice results. Thank you, and see you in the next video.

19. Game of thrones: Now let us discuss how we can generate characters from Game of Thrones and how we can change these characters. We will use the architecture that I described in the previous video, which is called StyleGAN. Our goal is to use this StyleGAN to transform existing photos: we want to take a photo of, for example, Jon Snow and transform it with the StyleGAN. What do we do here? We now use the VGG-16 network. This is a convolutional neural network, and if you don't know what it is, it is just a set of convolutional layers. We use a pretrained VGG-16, and we need it as a feature extractor: we need the output of its last layer for a given image. Now, we have a StyleGAN. How does it work? Just a reminder: we have some latent code z, then we map it and obtain some representation w, and we use this w as a style, as an input to the adaptive instance normalization that I described in the previous video (if you haven't watched it, please do). We use this style as an input to the adaptive normalization, and in fact we can mix styles at different levels: for example, we can pass some style here and another style, that is, another representation w from another image, there. Now the question is how we can obtain this latent z not for a generated image but for some really existing image, for example, for the photo of Jon Snow. Here is what we can do. We can take some image that is generated, so it comes from a known latent code: we know the latent code, we know the generated image, our StyleGAN is already trained, and we have a VGG network that is pretrained. We can use this image as input to the VGG network and obtain some latent representation at its end. After that, we can compute the L2 Euclidean loss between the input z that we know and the predicted output of the VGG, and we can update this network in order to minimize this L2 loss; we want its output to be close to this latent code. What will it mean? It will mean that for a given image we can obtain some latent code z that we can then use as input to the StyleGAN and obtain the same image. So I hope you understand: we know how StyleGAN transforms from z to an image, and now we want to build an encoder that will transform from the image to the z space; a hedged training sketch follows below.
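Here is one way this encoder training could be realized; all names, the optimizer interface, and the use of freshly sampled pairs are assumptions on my side, not the notebook's code.

```python
import torch
import torch.nn as nn

def train_encoder(encoder, generator, optimizer, steps=10000, batch=8, z_dim=512):
    # encoder: a VGG-based network mapping images to latent codes.
    mse = nn.MSELoss()
    for _ in range(steps):
        z = torch.randn(batch, z_dim)      # known latent codes
        with torch.no_grad():
            images = generator(z)          # the trained StyleGAN stays frozen
        z_pred = encoder(images)           # predicted latent codes
        loss = mse(z_pred, z)              # L2 loss to the true codes
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Afterwards, encoder(photo) gives a latent code for a real photo.
```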
Once we have trained this VGG-based encoder for latent codes, we can take some existing photo and make a forward pass through the trained encoder to obtain some latent code. In the previous video, I described how we can mix the styles of images: we just need to use different styles for different levels of the hierarchy, that is, for different adaptive instance normalization layers. For some layers we can pass the style of one image, and for other layers we can use the style of another image. So if we can obtain a z for Daenerys and a z for Jon Snow, then we have two styles, two latent codes: we can use the first for the first layers and the second for the last layers, mix them, and obtain an image that is an average of Jon Snow and Daenerys Targaryen.

Now, what if we would like to change the age of the person in the photo? We have one existing photo, and we would like to increase the age of the person on it: we want to see how Jon Snow will look when he is older, or when he was younger. How can we do it using the same mechanism? Now listen, it's very interesting. We have some current z for the given image (we studied how to obtain it using the VGG-based encoder). Next, we would like to have two directions: the first direction for increasing age and the second for decreasing age, and we would like to move this latent code z, for example, in the decreasing-age direction in order to decrease the age of the person on the photo. Now, how to find this direction? It's really simple. Remember logistic regression? I hope you know what it is; in fact, it's a linear classifier, and when we train logistic regression, we usually want to obtain a separating hyperplane, which has coefficients w1, w2, and b. This is just the hyperplane that separates two classes. Now, for the purpose of finding the direction, we need labels for images: which image is old and which is young. Imagine that we already have these labels, so we know the mapping from each image to the old or young class. For each image, we can obtain the latent code, and we can build a classifier that classifies a latent code into the two classes, old and young person. This is just logistic regression; you can use standard logistic regression, for example from the scikit-learn library. After we train the logistic regression, we can take the coefficients w1 and w2 and consider them as the direction toward the old class. Why? Because this is, in fact, the vector that is orthogonal to our separating hyperplane. If we move in the direction of the vector (w1, w2), we will obtain an older latent representation, if we can say so; if we move in the opposite direction, taking (-w1, -w2), we will obtain a younger latent representation. Now, how do we do it? We have our latent vector for the initial image, and we have the direction, which is a vector of coordinates in the same latent space. We set some coefficient, for example 0, 2, or minus 2, as you can see, and just take our original code plus the coefficient multiplied by the direction vector. What we obtain: in the middle there is the input image; on the right, the image with coefficient 2, so older; and on the left, with coefficient minus 2, younger. This is how we can change the age of an existing person on an existing photo, not a generated one, and this is how we can use StyleGAN to do it. We can also take Jon Snow, make a child out of him, and then interpolate from the child to the grown-up, and render this interpolation as a video; you can see the result of how Jon Snow grows up. In fact, we can do many things this way; a sketch of the direction trick follows below.
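As a hedged sketch (assuming codes is an array of latent codes with 0/1 age labels, and z0 is the code of the photo to edit; these names are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fit a linear classifier on (latent code, old/young label) pairs.
clf = LogisticRegression()
clf.fit(codes, labels)                  # labels: 1 = old, 0 = young
direction = clf.coef_[0]
direction /= np.linalg.norm(direction)  # normal vector of the hyperplane

def edit_age(z, coeff):
    # coeff > 0 moves toward "old", coeff < 0 toward "young".
    return z + coeff * direction

older = edit_age(z0, 2.0)     # e.g. coefficient +2
younger = edit_age(z0, -2.0)  # e.g. coefficient -2
```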
For example, gender: we can also find the gender direction, that is, where in the latent space the men are and where the women are, and transform an existing photo to another gender; for example, a coefficient of 1.0 corresponds to a man and minus 1.5 corresponds to a woman. We can do the same with smiling: if we have labeled data, smiling or not smiling, we can train another logistic regression, find the direction from not smiling to smiling, and add a smile to a non-smiling woman, man, or any other person. So that was all. There are four sources that you can look at. The first is the GitHub repository that proposed all of this; please look at it, it's rather nice. There is a paper called Image2StyleGAN, which was published on the 9th of April 2019, so it is a recently proposed article with a very similar, though slightly different, approach. There is an article about the Game of Thrones characters, and in fact you can play with it yourself: you can generate other directions for Game of Thrones characters. Please use this notebook; I will attach all of these links to the lesson. Thank you, see you in the next video.

20. BIG GANs: In this section, we will study another breakthrough, state-of-the-art generative adversarial network, which is called BigGAN. BigGAN was proposed in 2018 and showed state-of-the-art results in image generation. Their contribution, in my opinion, is that they were able to combine a lot of different techniques into one single model that can generate such high-quality, realistic images, trained on the ImageNet dataset. So they didn't propose any single breakthrough, but they combined all the breakthrough techniques and demonstrated very nice results. On the slide, you can see more examples of samples from this model, which are really, really good and realistic. Here on the slide, you can see an example of the interpolation between different images in the latent space, which we have already studied. This was the motivation for BigGANs; in the next video, we'll study them in detail.

21. BIG GANs model: In this video, I'm going to talk about the model that is used in BigGAN. In the left image (a), you can see the whole model. As I told you in the previous video, we have an input z, which is split and concatenated with the class of the object that we would like to generate. After the concatenation, it goes to a residual block; we have several such residual blocks here, and at the end we obtain the generated image. In the center of the picture, you can see one residual block. Here we take the result of the concatenation, and we have two linear layers, which the authors call an embedding; they transform this concatenated result into the mean and standard deviation for the batch norm, so we multiply by this standard deviation and add this mean. Then they use ReLU, upsampling, a 3 by 3 convolution, again batch norm and ReLU, and again a 3 by 3 convolution. Notice that they have a so-called skip connection here: the input also goes around the block, and at the end they add the results of the two paths. At this moment, the images from the right path and the left path have equal size, so we can easily add them to each other and go to the next layers. In the right picture, you can see the residual block that is used in the discriminator. Now, instead of upsampling, we have an average pooling layer, we don't feed any class into the batch norm, and in fact we don't even have batch norm here.
Another thing that I wanted to mention is that they use a hinge loss instead of the standard loss. What is hinge loss? The hinge loss, I hope you remember, is the loss function that is used in the SVM algorithm. Imagine the SVM for the binary classification case where you don't use any kernel; if we optimize not the dual problem of the SVM but the primal problem, then we are optimizing the hinge loss. For the GAN discriminator, it is the average of max(0, M - D(real)) over real samples plus the average of max(0, M + D(fake)) over fake samples, where M is the so-called margin. The motivation for the hinge loss is the following: we have non-zero gradients as long as a sample still violates the margin, for example while D(fake) is greater than -M, and this helps when G is far from convergence. Imagine the situation when you have just started training your GAN: your generator will give you only noise, and with the standard loss the gradients will be very close to zero. The hinge loss helps in these cases; a short sketch follows below.
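Here is a minimal sketch of these hinge losses, with the margin fixed to 1, which is the common choice:

```python
import torch.nn.functional as F

def d_hinge_loss(d_real, d_fake):
    # Non-zero gradients whenever a sample still violates the margin,
    # even early in training when the generator is far from convergence.
    return (F.relu(1.0 - d_real).mean() +   # max(0, 1 - D(real))
            F.relu(1.0 + d_fake).mean())    # max(0, 1 + D(fake))

def g_hinge_loss(d_fake):
    return -d_fake.mean()
```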
This is all for this video. Thank you.

22. BIG GANs techniques: Let us now study some of the techniques from the BigGAN paper. The first technique, if we can call it that, is the usage of large batches. Traditionally, the batch size is 64 to 256, but the authors demonstrated that if they use a batch size equal to 2048, the Inception Score improves by 46%. This is a rather nice result, because there is an opinion in deep learning that the batch size doesn't matter as much as the number of gradient descent updates; however, the authors experimentally showed that a big batch size is rather good for GANs. The intuition behind this is that each batch covers more modes, which addresses the problem of mode collapse that we have already discussed. Just a reminder: mode collapse is when our model starts to generate samples from only one mode of our distribution and not from all of them. A big batch also provides better gradients.

The next technique is skip connections from the latent variable z to multiple layers of the generator. Traditionally, in generative adversarial networks, z is an input only to the first layer of the generator; in this paper, the authors proposed to use it as an input for several layers of the generator. How do they do it? They take z, split it into sub-vectors, and use each of them, concatenated with the class label, here, here, and here, as an input to the residual block, which we discussed in the video about the BigGAN model. The intuition behind this is that it directly influences features at different resolutions of the image, because at each layer we increase the resolution.

The next technique the authors used is class-conditional batch normalization. Let us start with standard batch normalization; let me remind you what it is. We have a batch of images x_1, ..., x_m, and we introduce two learnable parameters, gamma and beta. In the first step, we compute the mean and standard deviation of our batch of images, mu and sigma. After that, we normalize our data by subtracting mu and dividing by the square root of sigma squared plus some small epsilon, which we add in order to avoid division by zero. This is the normalization of our data. Then we multiply by gamma and add beta to the data. What does it mean? It means that gamma is now our new standard deviation and beta is our new mean value. This is good because we effectively say: let the neural network learn what the mean and the standard deviation of our data should be. It usually helps with poor gradient updates and with bad network initialization. The authors proposed to modify this setting into class-conditional batch normalization. What is it? They say: let our gamma and beta be some function of the input y, where y is a class label, for example a label that stands for a cat, or a dog, or a plane, or any other class. So we one-hot encode the class label, then take some function of this one-hot encoding, which is called an embedding and which I will discuss on the next slide, and this function returns the two values gamma and beta; they use these gamma and beta as the standard deviation and mean value for our data. Now, what is an embedding? An embedding is just a learnable linear layer, and that's all. We have the blue vector, which is the class vector; it is one-hot encoded, and the embedding is just a linear layer, in other words a matrix that multiplies the input vector. The size of the output vector is two: the two parameters gamma and beta, and that's all. It is important to note that they use a shared class-conditional embedding, which means they use the same layer for the whole network.

Another technique that they used is orthogonal regularization. It forces the weights to lie close to the orthogonal manifold, and, again together with big batches, it helps with mode collapse. In order to perform orthogonal regularization, we need to add the regularization term that you can see on the slide to our total loss function: beta times the squared Frobenius norm of (W transposed times W minus I), where W is the weight matrix and I stands for the identity matrix; this is just the definition of an orthogonal matrix. A code sketch of this term is given at the end of the lesson.

Here you can see the results that they provided. The first column is the batch size; the second is the channel multiplier, which represents the number of units in each layer; the third is the number of parameters in millions; then come the indicators of whether they used the shared class-conditional embedding, the skip connections for z, and orthogonal regularization, that is, all the techniques that I described on the previous slides. It turned out that using all of these techniques together improves both the Inception Score and the Fréchet Inception Distance significantly. I remind you that we have a video about the Fréchet Inception Distance and the Inception Score in our course; if you didn't watch it, please do, since these are rather important metrics that people use for comparing their models. They showed that when they combine the shared embedding, skip connections for z, orthogonal regularization, and a big batch size, their Inception Score and Fréchet Inception Distance are the best. This is all for this video. In the next video, we will continue analyzing BigGANs. Thank you, and see you in the next video.
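As promised, here is a minimal sketch of the orthogonal regularization penalty, implementing the plain formula from the lesson for a 2D weight matrix (note, as an aside the lesson does not state, the published BigGAN uses a slightly relaxed variant that excludes the diagonal):

```python
import torch

def orthogonal_penalty(w: torch.Tensor, beta: float = 1e-4) -> torch.Tensor:
    # R(W) = beta * || W^T W - I ||_F^2
    wtw = w.t() @ w
    identity = torch.eye(wtw.shape[0], device=w.device)
    return beta * ((wtw - identity) ** 2).sum()

# Added to the total loss: loss = gan_loss + orthogonal_penalty(w)
```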