Machine Learning & Deep Learning Bootcamp | Ajmir Goolam Hossen | Skillshare


Machine Learning & Deep Learning Bootcamp

Ajmir Goolam Hossen

Watch this class and thousands more

Get unlimited access to every class
Taught by industry leaders & working professionals
Topics include illustration, design, photography, and more


Lessons in This Class

37 Lessons (3h 8m)
    • 1. Section 1 - Introduction (2:30)
    • 2. Definitions (2:46)
    • 3. Uses (5:53)
    • 4. History (3:16)
    • 5. Types (2:02)
    • 6. Section 2 - Basic Neural Network (4:16)
    • 7. Activation Functions (4:28)
    • 8. Feedforward Computations (5:40)
    • 9. Section 3 - Introduction to Training (4:33)
    • 10. Loss Functions (4:29)
    • 11. Minimizing Error (5:27)
    • 12. Learning Rate (3:15)
    • 13. Initialization and normalization v2 (2:44)
    • 14. Section 4 - Install and Set up - Anaconda (3:52)
    • 15. Use Jupyter Notebook (2:26)
    • 16. Basic Pythoning (10:09)
    • 17. Helloworld Neural Network (3:11)
    • 18. Section 5 - Tensorflow - Design Flow (4:37)
    • 19. Loading a dataset (7:53)
    • 20. Build a model with layers (7:12)
    • 21. Run Compile (4:35)
    • 22. Fit - Training the Network (3:12)
    • 23. Evaluate and Predict (2:01)
    • 24. Section 6 - Convolutional Networks - Intro (6:44)
    • 25. Convolution (6:29)
    • 26. Pooling (2:05)
    • 27. Convolutional Network Project - Part 1 (7:43)
    • 28. Convolutional Network Project - Part 2 (12:15)
    • 29. Intro to Section 7 - Recurrent Neural Network (6:16)
    • 30. Structure of RNNs (4:33)
    • 31. Examples of Recurrent Neural Networks (4:02)
    • 32. Training RNNs (6:41)
    • 33. LSTM and GRU (3:17)
    • 34. Project on RNN (11:59)
    • 35. Section 8 - Intro to Generative Adversarial Networks (4:42)
    • 36. How GANs work (4:24)
    • 37. Project on GAN (6:37)

Community Generated

The level is determined by a majority opinion of students who have reviewed this class. The teacher's recommendation is shown until at least 5 student responses are collected.

78

Students

--

Projects

About This Class

This course was designed to bring anyone up to speed on Machine Learning & Deep Learning in the shortest time possible.

This field of computer engineering has seen exponential growth in interest worldwide, following major progress in the field.

The course starts by building up the foundational concepts of neural networks. It then goes over the TensorFlow libraries and the Python language to get students ready to build practical projects.

The course will go through four types of neural networks:

1. The simple feedforward

2. Convolutional

3. Recurrent

4. Generative Adversarial

You will build a practical Tensorflow project for each of the above Neural Networks. You will be shown exactly how to write the code for the models, and how to train and evaluate them.

Here is a list of projects the students will implement:

1. Build a Simple Feedforward Network for MNIST dataset, a dataset of handwritten digits

2. Build a Convolutional Network to classify Fashion items, from the Fashion MNIST dataset

3. Build a Recurrent Network to generate text similar to Shakespeare's writing

4. Build a Generative Adversarial Network to generate images similar to MNIST dataset

Meet Your Teacher

Hi, my name is Ajmir. I have been an electronics/programming/science hobbyist since the age of 12, and I obtained a Bachelor's in Electronics and Communication Engineering from the University of Mauritius in 2001.

After graduation I attended a short course in Web Design, where I learned HTML, PHP and Java.

I've worked in various positions in the electronics industry, including Sales Engineer for electronic instruments, Lecturer in Power Electronics and Data Communication, Test Engineer, and debugging boards for UK road signs.

I've worked for nearly 7 years at Xilinx in Dublin, Ireland as a Product Applications Engineer, supporting Xilinx FPGA Design Automation Software tools. I became a specialist in Synthesis tools/Timing Analysis at Xilinx. I also worked with D...

Class Ratings

Expectations Met?
  • Exceeded! 0%
  • Yes 0%
  • Somewhat 0%
  • Not really 0%
Reviews Archive

In October 2018, we updated our review system to improve the way we collect feedback. Below are the reviews written before that update.


Transcripts

1. Section 1 - Introduction: Welcome to the Deep Learning Bootcamp, and thanks for your interest in this course. This section is an introduction to the course: we talk about the course itself and the course plan, and we cover definitions and types of neural networks. The course starts with the introduction in Section 1, where we talk about the history, types, and uses of neural networks. In the second section we talk about the components of basic neural networks, and in the third about the basic concepts relating to training neural networks. Then we get ready to do some practical work. We will set up our software, which is Anaconda; this is an integrated distribution that also installs Python, the language we will use to build our projects. We will also use Jupyter Notebook, and we will install TensorFlow, which is the main library we will use to build our neural networks. Section 5 will be on feedforward networks, and we will do a practical on this kind of network. Next we learn about convolutional networks in theory, and we will also put them into practice by implementing one in TensorFlow. Then we will learn about recurrent networks, theory and practice, and then about generative adversarial networks, theory and practice. What you need to know before taking this course: basic math, such as derivatives, partial derivatives, a bit of trigonometry, and functions and graphs. You will need basic installation skills to install Anaconda, and basic programming skills, because we will use Python in our practicals. If you don't know any Python programming, this is not a problem, because we go through the basics in one of the sections, but you should know programming concepts such as loops, functions, and variables. 2. Definitions: Let's define what we will be learning in this course. We often hear the terms artificial intelligence, machine learning, and deep learning. This is how they are related: deep learning is a subset of machine
learning, and machine learning is a subset of artificial intelligence. Here is one definition of artificial intelligence: the science and engineering of making intelligent machines that have the ability to achieve goals the way humans do. A more abstract definition: anything that resembles human intelligence is artificial intelligence. Machine learning is a subfield of artificial intelligence, the field of study that gives computers the ability to learn without being explicitly programmed. This is a different way of looking at intelligence: instead of programming the intelligence into the machine, we give the machine the ability to learn by itself by presenting it with cases, with samples and goals, and the machine improves itself based on the goals we set it to achieve. At the bottom we have deep learning, which is what we learn in this course. It is a subset of machine learning: anything that is machine learning but is inspired by brain computation, through the use of networks of neurons, is deep learning. Here is a look at how traditional programming differs from machine learning. Traditional programming is where we use rules and data to get answers; rules involve things like functions and iteration, and processing data includes things like getting information from a server. That is traditional programming: it gives us an answer. Machine learning differs from traditional programming because machine learning uses answers and data to get rules: we come up with rules based on our data and answers, so that we can use new data and predict answers. This is when we infer predictions from data: we have trained our machine learning model, and then we can infer predictions using new data. As we said, deep learning is inspired by brain computation, so in deep learning we make use of neural networks. Here is a diagram: it shows a network of neurons connected together; this is what a neural network looks like. 3. Uses: How is deep learning used?
Deep learning does a good job at prediction. For example, stock prediction: we give the neural network historical data and train the network so that we get a model that can predict as accurately as possible, so you can use neural networks to predict stock values. Prediction of weather: this also uses historical data, from satellite images, to predict how the weather will evolve in the future. Prediction of earthquakes: we train the neural network using previous seismic data, and it generates the likelihood of seismic activity, such as earthquakes, at specific locations in the future. That is prediction. Another problem that neural networks are good at is image recognition. Image recognition is about training a neural network so that it is able to understand the content of an image. For example, here we have a flower, a bird, and a tree. What we do is give hundreds or thousands of pictures of different flowers, of different colors and shapes, and also birds and trees. In this way we train a model that can understand the content of a picture and, given a new picture, tell whether it is a flower, a bird, or a tree. This is a problem of classification: we are classifying the image into these three categories. Image recognition is also used in medical image analysis. Take scan pictures, for example, in the case of identifying tumorous tissues: the doctor will look for anomalies in the scan to find out whether the tissues are normal or not. In the same way, a neural network can be trained with thousands of images, so that we have a model which is able to identify anomalies in the brain. Neural networks are also useful in speech recognition: since speech varies between people, a neural network is very effective at spotting patterns and identifying which word is in the speech. Speech synthesis is also possible with neural networks.
Language translation: translating between different languages; a neural network is effective at this job. Image captioning: we have an image with no words in it, and the neural network is trained so that it can read the content of the image and create a caption that gives an understanding of the image. Self-driving cars: this has become a hot area of research at Google and other companies. A deep neural network is able to read the sensors and guide the vehicle, controlling the throttle, the speed, and the steering wheel. So we are taking information from sensors and bringing it to a trained model, which is a deep neural network, and this controls the trajectory of the vehicle. Another use of deep learning is generation. For generating text, we can train a deep learning model with a body of text, such as the Bible or Shakespeare, and make it generate text that is similar. There is also generation of audio and speech, and of handwriting: we train the model with many handwriting samples, and it is able to create new images of handwriting in a similar style. Here is an example of generating images: taking the styles from these three pictures and creating a whole new image, a whole new face, based on these images. Generating image completions: this is an example where we draw a blank region and the network is able to fill it in with realistic imagery. This is very interesting; it is done with generative adversarial networks. Generating 3D images: here is an example where 3D images are being created by a deep neural network. You can also generate videos, for example deepfake videos, where one face is swapped with another person's face. Here is an example of a deepfake: this is the original, and this is the fake; you can see one person's face placed onto the other person's face. 4. History: Here is a brief history of neural networks.
The focus of this history is mainly on achievements. In 1943, McCulloch and Pitts wrote a paper on how neurons might work; they modeled a simple neural network using electrical circuits. This is the starting point of the history of neural networks. In 1959, Widrow and Hoff developed models called ADALINE and MADALINE. MADALINE was the first neural network applied to a real-world problem, using an adaptive filter that eliminates echoes on phone lines. The first multilayer perceptron was built in 1975. In 1986, progress was made in training multilayer networks and in developing the algorithm that we call backpropagation; backpropagation is still used today for training. Also in this year, the recurrent neural network was introduced: in 1986, research groups came up with similar ideas for backpropagation to train neural networks, which was progress toward solving the problem of training multilayer networks, and in the same year the recurrent neural network was introduced by David Rumelhart, based on Hopfield networks, discovered by John Hopfield in 1982. In 1987 the time-delay neural network was introduced, and this was the first implementation of convolutional networks; these networks are used primarily for image recognition. In 1997, Deep Blue, which was built by IBM, won against Garry Kasparov in a chess game; this was the first time a chess champion lost against a man-made machine. In 1997, LSTM, which is a variant of the recurrent neural network, was proposed. Around 2006, a series of breakthroughs by Geoffrey Hinton led to major progress in deep learning, and for the first time deep learning gave better results than traditional artificial intelligence algorithms. In 2014, the generative adversarial network was invented by researcher Ian Goodfellow and led to leaps in generative algorithms for the generation of images and videos.
And in 2016, Google DeepMind's AlphaGo program competed with Lee Sedol, who was a champion of the game of Go, and AlphaGo won the game. This was a major leap in deep learning, because Go is a highly complex game requiring intuition, creativity, and strategic thinking. 5. Types: So what are the types of neural networks? In this course we will cover the four most popular neural networks. We will start with the feedforward network, the normal, basic one, also known as the multilayer perceptron; it is simply an interconnection of neurons. Then we will look at convolutional neural networks. These are neural networks adapted for images; they use a series of filters, known as convolution and pooling, to reduce the amount of data in the neural network and make image recognition easier. Then we will look at recurrent neural networks. These are neural networks for sequences: the network is able to spot patterns within a sequence. For example, if we have a series of words that we pass through the network, it will be able to spot sequential information. Then we will look at generative adversarial networks. These are networks which have a generator and a discriminator. The generator creates fake data, and we also have training data. The discriminator's role is to identify whether the data is coming from the generator or from the data set; it gives its verdict, and based on the verdict the generator is trained. With time it gets better and better, and after training it is able to generate data very similar to the training set. These are the four neural networks that we will look at, and they are the most popular neural networks of the moment. 6. Section 2 - Basic Neural Network: In this section we look at some of the components of basic neural networks. The plan is: first we look at neurons.
Next, we look at basic neural network structures, which include layers and weights. Then we look at activation functions, and finally we look at the computation for feedforward networks: we will take an input, pass it through the neural network, and get the output, with an example with numbers. The brain consists of specialized cells called neurons. A human being has, on average, 16 billion neurons; cats have 250 million neurons on average, and lobsters have 100,000 neurons. This is just a comparison of how the number of neurons varies between organisms. This is a diagram of a biological neuron; in this diagram we have one neuron connected to a neighboring neuron. What scientists have done is try to model the biological neuron as a mathematical approximation. So here is what an artificial neuron looks like. We have inputs, we have an output, and this is the whole neuron, one neuron. Each input comes in through a weight, and they are summed; then a bias is added to the sum. The result is passed through an activation function, which gives the output. So this is one neuron; it gives an output, and this goes to the next neuron. The activation function is usually a nonlinear function that simulates the firing action of neurons in the brain. Here is the neuron model again, this time expressing the neuron model as a mathematical equation. What is it doing? Each input is multiplied by a corresponding weight, and these are summed, plus the bias, so we have the summation plus the bias, and this is fed through an activation function; the y here is the output. Now, how are neurons connected to other neurons? When we connect neurons together, we get a neural network. This is usually how we connect a neural network: it comes as layers, one input layer and one output layer, and in this diagram it shows a hidden layer. Usually there is more than one hidden layer.
And this is what we call a deep neural network; if it only has one hidden layer, it is called a shallow neural network, or just a neural network. In this case, we have three neurons at the input, or nodes (they are also called nodes or neurons). So we have three nodes at the input, four nodes or neurons in the hidden layer, and two neurons in the output layer, and you will notice that each one of the neurons is connected to every neuron in the next layer: this neuron is connected to this neuron, this one, and this one, and the same from this layer to the next. So it is a fully connected multilayer neural network. Here is another deep neural network. This time you will see that the number of hidden layers is two: we have hidden layer one and hidden layer two. 7. Activation Functions: Now let's take a look at activation functions. There is an activation function on each neuron except the input neurons. The idea behind the activation function is to recreate the firing action that exists in biological neurons. There are different types of activation functions; this is how a typical activation function looks. There are several activation functions in use, but we will look at a few of them: sigmoid, ReLU, leaky ReLU, hyperbolic tangent, and softmax. Let's look at each one. This is called the sigmoid function. The sigmoid is a mathematical expression that consists of one divided by one plus e to the power of minus z. You see that for big values it stays at one, and for low values it tends to zero. For example, when z is equal to four, like here, f(z), which is the sigmoid function, is one over one plus e to the power of minus four, which is equal to 0.982. That is the sigmoid activation function. Next, the ReLU, the rectified linear unit activation, shortened to ReLU. This is a very popular activation function, and it is quite simple:
if the value of x is less than zero, the output is zero; if the value of x is more than zero, the output is just the input itself. So if the input is two, the output is two. Let's take an example: if x is equal to four, y, which is f(x), will be four; if x is equal to minus four, f(x) will be zero. That is the ReLU activation function. Now we have a different kind of ReLU, called leaky ReLU. It is slightly different: when x is less than zero, instead of being zero, the output is 0.1 times x. Here is an example: when x is equal to five, f(x), the leaky ReLU function, gives five; when x is equal to minus five, it is multiplied by 0.1 and gives minus 0.5. Next, we have the hyperbolic tangent function, which is just tanh(x); it looks very similar to the sigmoid function. And we look at softmax. Softmax is a probability distribution, so the calculation for softmax is a bit different. Let's say we want to apply softmax to these values: 0.7, 1.2, and 0.5. This is the equation we will use, with inputs 0.7, 1.2, and 0.5. First we have to calculate e to the power of each number: e^0.7 is 2.014, e^1.2 is 3.320, and e^0.5 is 1.649. To get the softmax output of each, we divide each of these by their sum, 6.983, which gives 0.288, 0.475, and 0.236. 8. Feedforward Computations: Now let's take a look at the actual computation with examples. We will use this equation to feed the input forward through the network and get an output. Let's take a look at an example with concrete numbers. This is our network, with these weights. It has a bias of 0.1. The weights from the input layer into the hidden layer are 0.3 and 0.15 into the first hidden neuron, 0.7 and 0.45 into the second, and 0.23 and 0.88 into the third, and from the hidden layer to the output we have weights 0.56, 0.4, and 0.9. We have inputs 0.5 and 0.2 at the input neurons, and we have to compute the output. To do this, we have to feed the numbers through this neuron,
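Before continuing into the worked example, the five activation functions covered in lesson 7 can be sketched in plain Python using only the standard library; the sample values below are the ones used in the transcript:

```python
import math

def sigmoid(z):
    # 1 / (1 + e^-z): squashes any input into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def relu(x):
    # 0 for negative inputs, the input itself otherwise
    return max(0.0, x)

def leaky_relu(x, slope=0.1):
    # Like ReLU, but negative inputs are scaled by a small slope
    return x if x > 0 else slope * x

def tanh(x):
    # Hyperbolic tangent: squashes any input into (-1, 1)
    return math.tanh(x)

def softmax(values):
    # Exponentiate each value, then normalize by the sum,
    # so the outputs form a probability distribution
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

print(round(sigmoid(4), 3))    # ≈ 0.982, as in the lesson
print(relu(-4.0), relu(4.0))   # 0.0 and 4.0
print(leaky_relu(-5))          # ≈ -0.5
print([round(p, 3) for p in softmax([0.7, 1.2, 0.5])])
```

The softmax call reproduces the 0.288, 0.475, 0.236 split computed in the lesson.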
get this number here, and then continue like this until we reach here and get the final answer. That is why it is called feedforward. First we will be computing the output at a. It receives signals from this neuron and that neuron. Take 0.5: what does a receive? It receives two values: 0.5 multiplied by 0.3, which is the weight, and 0.2 multiplied by 0.15. There is a summation here, that is why we add them, and the answer is 0.18. After it has been summed, the sum is passed to the activation function: going into the activation function is 0.18, which is obtained from here, plus 0.1, which is the bias; this is where the bias comes in, and we are using the sigmoid function. So what we have to do is calculate the sigmoid of 0.28. If we open Google's calculator, we do one divided by one plus e to the power of minus 0.28, and the answer is 0.5695. This is the value output by the activation function. Now we have to do the same for b and c; then we take the three signals from a, b, and c, pass them through the weights and through the activation function, and out. Now let's do it for b. For b we have the input 0.5 multiplied by the weight 0.7, and 0.2 multiplied by 0.45, and we have to sum these two values: the answer is 0.44. After that we take the sum and pass it through the activation function, which is the sigmoid, and we have to add the bias of 0.1. So we are doing the sigmoid of 0.54, and the answer is 0.6318. This is the output. Same for c: 0.5 times 0.23 plus 0.2 times 0.88, and the answer is 0.291. We have to pass it through the activation function plus the bias, so we are doing the sigmoid of 0.391, and the answer is 0.5965. Now we have to do the same for the output: we take a's value times its weight, b's value times its weight, and c's value times its weight, pass it through the activation function plus the bias, and we get the output. So it is 0.5695 times 0.56, plus 0.6318 times 0.4, plus 0.5965 times 0.9, equal to 1.10849. This value we have to pass through the sigmoid, with the bias inside the function, and the answer is 0.77. This is the final output. So we are passing 0.5 and 0.2 as inputs, and what we get out of the network is 0.77. In this way, we have computed the output from the input. 9. Section 3 - Introduction to Training: In the last section we looked at the components of neural networks, and we also looked at feedforwarding through a network. In this section we will look at training a neural network. To understand what training is, let's take the analogy of training a student by giving the student questions along with worked answers. Each question comes with a worked answer, and the student learns: in this way the student goes through hundreds of questions, reads through the worked answers, and tries to understand the subject. This is the training of a student; at the end, the student has a better understanding of the subject. After the training is over, the student can handle new questions by using the understanding gained through the training process. This is an analogy for training in deep learning: a similar technique is used for training neural networks. Training a deep neural network consists of feeding inputs and expected outputs to the network and changing the network so that future inputs will match expected outputs. In doing so, we are extracting rules in the form of a model, so that in the future we can use the rules in the form of a model to infer predictions from new inputs. This is training, and this is inference. Training a neural network involves using a data set that consists of inputs and expected outputs. In this case we have three input values and two expected output values, and this is how our example neural network is designed.
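The full feedforward pass worked through in lesson 8 (two inputs, three hidden sigmoid neurons, one sigmoid output, a bias of 0.1 everywhere) can be sketched in plain Python with the transcript's numbers:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_rows, bias):
    # One fully connected layer: each row of weights feeds one neuron
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)) + bias)
            for row in weight_rows]

# Weights from the worked example: input -> hidden, then hidden -> output
hidden_weights = [[0.3, 0.15], [0.7, 0.45], [0.23, 0.88]]
output_weights = [[0.56, 0.4, 0.9]]
bias = 0.1

hidden = layer([0.5, 0.2], hidden_weights, bias)  # ≈ [0.5695, 0.6318, 0.5965]
output = layer(hidden, output_weights, bias)      # ≈ [0.77]
print([round(h, 4) for h in hidden], round(output[0], 2))
```

Feeding 0.5 and 0.2 through the two layers reproduces the hidden values 0.5695, 0.6318, 0.5965 and the final output 0.77 from the lesson.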
So we have three inputs and two outputs. The three input values, 0.2, 0.3, and minus 1.9, come in here, and the expected outputs, 1.1 and minus 0.9, will be compared with the output layer: the expected output is compared with the actual value at the output. So for each sample we feed the input, get an output, and compare it with the expected output. At each iteration we update the network so that the actual output gets closer to the expected output. During training we go through the samples one by one and then update the network. Now let's take a look at the training iteration. We go through a flow diagram, starting from an input in the data set. We feed it through the network, through the process of feedforward that we looked at in the last section. As a result we get an actual output, and remember, in the data set we had an input and an expected output, so we have to compare the actual output with the expected output. Let's say we take the difference between the two, expected output minus actual output: we get some signal, an error. This is the error value that we have to work with to update our network. So we use this error and update the weights in the network, and then we start another cycle by taking another input, feeding it forward through the network, getting an output, getting a new error, and updating the network. This is a training iteration. The most complicated part in this diagram is updating the network based on the error: we take each weight in the network and update it with a corresponding delta weight, which is computed through a process called backpropagation. 10. Loss Functions: Here, in our example, we calculated the error simply by subtracting the actual output from the expected output, and this calculation is an error function. This is a generalized error function, where we have expressed the error as a function of the expected output and the actual output.
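The training iteration described in lesson 9 (feed forward, compare with the expected output, update the weights) can be sketched with a toy one-weight network. The data set and the simple delta-rule update below are assumptions for illustration only; they stand in for the backpropagation step the course covers later:

```python
# Toy "network" with one weight and an identity activation: output = w * x.
# Each iteration feeds a sample forward, computes the error against the
# expected output, and nudges the weight using that error.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, expected); true rule is w = 2
w = 0.0
learning_rate = 0.1

for _ in range(50):                      # several passes over the data set
    for x, expected in samples:
        actual = w * x                   # feedforward
        error = expected - actual        # compare with the expected output
        w += learning_rate * error * x   # update the weight using the error

print(round(w, 4))                       # converges towards 2.0
```

After repeated iterations the weight settles on the rule hidden in the data, which is exactly the point of the training loop.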
We have simply subtracted the two values, but researchers have found that errors calculated in this way are not very effective in training the system. So let's look at a couple of error functions; these are also called loss functions. Here is our neural network, and these are our outputs: we have actual outputs 0.5 and 0.8 on node one and node two of the output layer, and the targets were 0.6 for node one and 0.7 for node two. So the actual outputs are 0.5 and 0.8, and the targets 0.6 and 0.7. Here are three loss functions. The first is a simple subtraction, as shown in the diagram before: the target value 0.6 minus the actual 0.5 is 0.1, and the target value minus the actual value, 0.7 minus 0.8, is minus 0.1. The second loss function uses absolute values, so the losses here are 0.1 and 0.1. In our third example, the loss function subtracts the target and actual values and then squares them, so we have values 0.01 and 0.01. Now, if we take the total of each one of those, we see that the first one turns out to be zero. It is saying that the loss is zero, while we do have a discrepancy between the actual and the target; that is why target minus actual is not a very good idea. The second one totals 0.2, which is better than the first, and the third one, where we subtract the target and the actual and square them, totals 0.02. This last one is the best among the three. We continue with other loss functions. Here is another loss function, called mean squared error; it is calculated by taking the total squared error divided by the number of samples. Here we take the squared error total, 0.02, divided by the number of samples, which is two, so the mean squared error, MSE, is 0.01. In some cases we use the root mean squared error, RMSE, which is simply the square root of the MSE; in our example it is the square root of 0.01, which is 0.1.
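The three loss calculations from the worked example (plain difference, absolute error, squared error), plus MSE and RMSE, can be sketched in Python using the transcript's numbers:

```python
import math

targets = [0.6, 0.7]
actuals = [0.5, 0.8]

# Plain differences cancel out (+0.1 and -0.1 sum to zero),
# which is why this is a poor loss function
plain = sum(t - a for t, a in zip(targets, actuals))

# Absolute and squared errors cannot cancel each other
absolute = sum(abs(t - a) for t, a in zip(targets, actuals))
squared = sum((t - a) ** 2 for t, a in zip(targets, actuals))

mse = squared / len(targets)   # mean squared error: total / sample count
rmse = math.sqrt(mse)          # root mean squared error

print(round(plain, 3), round(absolute, 3), round(squared, 3))
print(round(mse, 3), round(rmse, 3))
```

This reproduces the lesson's totals: roughly 0 for the plain difference, 0.2 absolute, 0.02 squared, with MSE 0.01 and RMSE 0.1.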
All those factions here is a table with some other lost functions. We look up, we already look at mean square era. There's also mean squared logarithmic era I mean absolute error which is simply the average off absolute errors and categorical cross entropy and binary cross entropy these US factions the best four regression analyses and these two will be best for classification. These archetypes off problem in deep learning 11. Minimizing Error: now, how do we minimize the era as we showed before the era will be affection off the expected on the actual values. Now, if you have to answer this question how to minimize era, we have to find out what are the variables in dysfunction. The expected are put off fixed because you're part of the data set for training. This we cannot manipulate. The actual output can be changed, but its affection off. Wait and input the weight and input me determine the actual outfit Among the two I m pretty is fixed because these are part of the data set. So the output expected output and the input of fixed for a given sample the only variable the weights. So the point of training is to tweak the weights in a neural network so that the era drops to a minimum, so the error at the output can be expressed as affection off weights. This will be a very complicated creation that consists off waits as variables, and these weight variables should be changed so that the error becomes minimum. Since the infection is very complex, we cannot simply inverse affection and find out what other weeds you did to minimize the era. Let's take a look at the process off minimizing era in a simple function. They say the era is affection off wheat, just one variable. So we have one week that will change. The era does the era versus the weight. So let's say we are around here. The weight is 0.2. In this graph, it's clear that the minimum is at 0.5. But let's see. We don't know where is a minimum. We are. 
What we can do is calculate the gradient at this point, 0.2. We find the gradient: if the gradient is negative, it means that we have to increase the weight. If the gradient is positive, like here at 0.7, it means that we have to decrease the weight, so that we get closer to the minimum, which is this point. So here we will increase the weight, and here we will decrease the weight. Now, in real life, our function will have more than one weight; it can have hundreds or thousands of weights, so it's a very complex process of minimization. It is often visualized as a landscape with mountains and valleys. In this example, the loss is in the form of a 3D diagram, so we're dealing with two variables. The goal of minimizing the error in this kind of function is to step from a higher place downwards, towards a minimum. And this is what we call gradient descent. This is an algorithm for finding the minimum of a complex function, unlike a simple function, where we can easily find the value of the minimum just by computation. We have to take one step at a time, check the gradient again, and then see whether we're moving towards the minimum or away from the minimum. The goal is to continue stepping towards the minimum, so that in the end we're reducing the cost function, and the point of reducing the cost function is to make the actual output match the expected output. This diagram shows a comparison between a function and its derivative. The first-order derivative of this function is a line, because this is a parabolic function. The derivative gives the direction to move. If we are here at 0.7, say, and we want to minimize the error, we simply have to reduce the weight so that the derivative of the error function goes to zero; our goal is to reach zero. Now, a realistic error function will be multivariable, so we cannot simply take a derivative with respect to one variable. We have to use something called partial differentiation.
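As a concrete sketch (assuming, for illustration, the error curve E(w) = (w − 0.5)², whose minimum sits at w = 0.5 as in the lecture's graph), gradient descent on one weight looks like this:

```python
# Gradient descent on a one-weight error function E(w) = (w - 0.5)**2.
# The minimum is at w = 0.5, and the derivative is dE/dw = 2 * (w - 0.5).
def grad(w):
    return 2 * (w - 0.5)

w = 0.2       # start to the left of the minimum: the gradient here is negative
alpha = 0.1   # step-size factor (the learning rate, covered in the next lesson)
for _ in range(100):
    # A negative gradient pushes w up; a positive gradient pushes w down.
    w = w - alpha * grad(w)

print(round(w, 4))  # 0.5 -> converged to the minimum
```

Each step moves against the sign of the gradient, which is exactly the "increase the weight when the gradient is negative, decrease it when positive" rule described above.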
Partial differentiation is differentiation of the error with respect to one weight, so that we know the effect of that one weight on the error, and we will have to use the chain rule to calculate the partial derivatives. 12. Learning Rate: There's something called the learning rate. When we updated the weight, I wrote the equation w minus delta w. This is not exactly how we do it. We have to add a factor, so that it becomes w minus alpha times delta w. This alpha is simply a fraction that we call the learning rate. So instead of updating by the whole delta w, we update by a fraction of delta w. Let's take an example. This is our equation for updating the weights: the new weight is w minus alpha times delta w, where alpha is the learning rate. For example, the current weight is 1.2, and the alpha here is 0.05. Through backpropagation, we have calculated that we need to update this weight by 0.1. So here the update will be: new weight value equals 1.2 minus the alpha of 0.05 times 0.1, which is 1.2 minus 0.005, equal to 1.195. So this is the new updated weight based on these values. Advanced algorithms can implement an adaptive learning rate. In simple algorithms, the learning rate is fixed; more complex algorithms can adapt the learning rate based on calculations. The learning rate is important so that we don't overshoot, and so that we don't take too long to find the minimum. Here is an example where we have a parabolic curve and we want to find the minimum. This is a case where the learning rate is too large. Instead of going step by step towards the minimum, it jumps over to the other side of the curve and misses the minimum completely. Like this: it comes here, and instead of moving towards the minimum, it moves back there, and then there. It just keeps zigzagging and missing the minimum. So this is what happens when the learning rate is too large: it overshoots. Now, this is a case where we have a smaller learning rate.
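Both the single update from the example and the two failure modes can be seen numerically (a sketch on the same parabolic error E(w) = (w − 0.5)² from before; the particular learning-rate values are illustrative choices):

```python
# First, the single update from the example: w = 1.2, alpha = 0.05, delta_w = 0.1.
w_new = 1.2 - 0.05 * 0.1
print(round(w_new, 6))   # 1.195

# Now the effect of the learning rate on E(w) = (w - 0.5)**2, minimum at 0.5.
def grad(w):
    return 2 * (w - 0.5)   # derivative dE/dw

def descend(alpha, steps=20, w=0.2):
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w

print(descend(alpha=1.1))    # too large: overshoots and zigzags away from 0.5
print(descend(alpha=0.001))  # too small: after 20 steps w has barely moved from 0.2
print(descend(alpha=0.3))    # a reasonable rate settles close to 0.5
```

With alpha = 1.1 each step lands farther from the minimum than the last, while with alpha = 0.001 convergence would take thousands of iterations.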
The only problem here is that we have to do many iterations until we converge to a local minimum. This one is taking such small steps towards the minimum that it will reach it eventually, but it takes too long to get there. So these are the two problems: a learning rate that is too large, and a learning rate that is too small. 13. Initialization and normalization v2: Now, weight initialization. It seems to be unrelated to training, but it can have an impact on training. Weight initialization sets the weights in the neural network before any training has been done on the network. Researchers have found that initialization can affect the learning of a neural network, so we have to be careful with initialization. If we initialize with zeros, it doesn't work, because the weight values won't change. There are also methods that are suggested for initializing our neural network. If we initialize with random values, it works better than the zeros initialization, but we can have problems with very low and very high weights. So this is better than zeros, but still not good. What does work is an initialization called He initialization, which is a distribution of the weights according to this formula. And we also have the Xavier initialization, which is quite similar to the He initialization. These are methods to initialize our neural network so that we don't encounter problems with very low and very high weights, or with stagnant weights that don't change. One note on normalization: normalization is about the values of the input. In some cases, input values can range greatly, depending on our application. The best practice is to normalize our input. A good rule of thumb is that the input variables should be small values, for example in the range 0 to 1. So, for example, if the input values are between 0 and 255, like image color values between 0 and 255, we can simply normalize them by dividing by 255.
So the inputs are now in the range 0 to 1. This is called normalization, and normalization can affect your neural network training. That's why we keep the inputs in this range. 14. Section 4 - Install and Set up - Anaconda: In this section, we will look at the Python and TensorFlow installation. Here's the plan for this section. First, we start by installing Anaconda. This is a bundle of several Python packages, so we will download and install Anaconda. Then we will create and activate an environment for TensorFlow. Then we will install TensorFlow. Then we will open a Jupyter notebook and use it to test whether TensorFlow has been installed. We will try a couple of Python commands, and lastly, you will create your first neural network. Anaconda is a bundle of some popular Python packages, and it also comes with a package manager called conda. Anaconda includes most packages you will need in one install, so it's a very convenient way to get started with Python and TensorFlow. First go to the website, anaconda.com, and click on Download. Here you will find versions for Windows, macOS, and Linux. Let's say you're using Windows: you download the latest version, which is Python 3.7, with the graphical installer. Click on this and you'll download the file, which is 637 megabytes. Once you have downloaded this file, you can double-click on it, and it will take you through the installation wizard. Once Anaconda is installed, go to Anaconda and open the PowerShell prompt. If you type conda list, it will list all the packages that were installed as part of Anaconda. If you type python, you will see that the prompt is given. This means that Python was already installed as part of the package; it's ready to be used. If I type a = 3, b = 5, print(a + b), we see that Python is working. Ctrl+Z and Enter take us back to the base prompt.
Next, you would normally create an environment for TensorFlow and then use pip to install TensorFlow. You can create a number of environments, so let's create one. The command for creating an environment is as follows: here, using conda, we're creating an environment for TensorFlow. This is the name we're giving it, and we're using Python 3.7; it's an Anaconda environment. After creating this environment, we have to activate it. The way you activate it is by typing activate and the name of the environment. Once this is done, we can proceed to install TensorFlow. Type this to check that pip is working. What you do next is pip install tensorflow with this version number, and this will install the TensorFlow package. 15. Use Jupyter Notebook: Once we have installed Anaconda, created and activated the environment, and installed TensorFlow, we can go ahead and open a Jupyter notebook. You go to Anaconda and click on it; it will open a window in the browser. Here we click on New, then a new notebook with Python 3. This is a Jupyter notebook; here we type our commands and click on Run. If I type a print command and run it, we see that Python is working. Now let's check whether TensorFlow is working. First, we will import TensorFlow: import tensorflow as tf. If you see an asterisk here, it means that it's still working, still executing. As you can see, the asterisk changed into the number two; it means that this line has been executed. Next, I'm going to check the version of TensorFlow with tf.__version__, and I'm going to run the cell. You see that it's using TensorFlow version 1.14. If you don't get any error when importing the module tensorflow and checking the version, it means TensorFlow is working, so you are ready to use the deep learning libraries from TensorFlow. Let's take a few minutes and look around the Jupyter notebook.
If you click on File, you will see a list of commands there, such as File, then Download as .py. That will create a file with all the commands that have been entered in the Jupyter notebook. 16. Basic Pythoning: In this section, we will go over some basic Python commands. We use a Jupyter notebook to enter the commands. If I type a = 3, b = 7, print(a + b), we have the sum here: 3 in variable a, 7 in variable b, so print gives us the sum. If I then change the print statement and run it again, you can see that it executed the addition and then printed. I can assign another variable: c = a + b, then print(c). I can also do print(4 * c). Now let's look at a very important concept in Python: the categories of data types, or what are called data containers. These are the categories. We have variables; we have seen a variable, c, for example. Here a variable is carrying one fixed value. We can also use strings; for example, this is a variable d that has been assigned a string. The next type of data container is a list. For lists, we use square brackets; for example, my_list = and then, inside the square brackets, the values, so we have three elements in the list. I'm going to print my_list. Printing my_list, the index 1 gives us the second element, and index 2 gives us the third element. I can modify the assignment on the line itself and run again. So that was a variable, and this one is a list. Next we have a tuple. A tuple uses normal round brackets, so let's create one. So this is a tuple. As you can see, I have a number here, a number here, a string, and here I have d, which is "hi there". If I print my tuple, it will give us the contents of the tuple: 151, 55, "Apple", and here "hi there", because d was assigned "hi there". So what's the difference between a tuple and a list? A list can store multiple values in an ordered index.
A tuple stores multiple fixed values in a sequence; the values stored in a tuple are fixed in a given sequence. The distinguishing feature of a tuple is that the values are fixed, so there is no way to change the values of a tuple, whereas in a list, the elements can be changed. If you want to check the type of data, you can use the type function, for example type(d). Here we're checking the type of d, the string it was assigned, so it says str; if we reassign it to an integer and do type again, it says it's an integer. Next, let's check my list: if I do type(my_list), it says the type of this data is a list, and type(my_tuple) says it's a tuple. Now, if you use curly brackets to assign a couple of values and we check the type, you see it says set. This is another data type, called a set. So we have seen variables, lists, tuples, and now we have sets. Sets store multiple unique values in an unordered collection. Then we will look at dictionaries. Dictionaries store multiple unordered key and value pairs, so dictionaries also use curly brackets. Let's say we want to make a dictionary called g. We use curly brackets, assigning one key to a value, and another key, speed, to the value 200. If I do print(g), it will show the contents of the dictionary. Now let's check that it's actually a dictionary: we do type(g), and dict means it's a dictionary. The next thing we will look at is conditional statements, using if, elif, and else. If I assign a variable h to four, then I want to check the value of h, and I'll use an if-elif statement to print which range this value is in. So I'll use: if h is more than five, print "h is bigger than five"; elif h is less than three, print "h is smaller than three"; else, print "h is between three and five". We know h is equal to four; we check the value using the if statement and run. So you see that h is between three and five.
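Here is a compact sketch of the containers and the conditional from this lesson (the particular values, such as the dictionary entries, are example values of my own, not from the lecture):

```python
my_list = [1, 2, 3]             # list: ordered, and elements can be changed
my_list[0] = 99                 # lists are mutable
my_tuple = (151, 55, "Apple")   # tuple: ordered, but the values are fixed
my_set = {1, 2, 2, 3}           # set: unique values only, unordered
g = {"make": "Ford", "speed": 200}  # dict: key/value pairs

print(type(my_list).__name__)   # list
print(type(my_tuple).__name__)  # tuple
print(type(my_set).__name__)    # set
print(type(g).__name__)         # dict
print(len(my_set))              # 3 -> the duplicate 2 was dropped

h = 4
if h > 5:
    print("h is bigger than 5")
elif h < 3:
    print("h is smaller than 3")
else:
    print("h is between 3 and 5")  # this branch runs, since h is 4
```

Trying `my_tuple[0] = 99` would raise a TypeError, which is the immutability difference between tuples and lists described above.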
How do we write a function in Python? Let's say we want to write a function called m. It has inputs x, y, and z, and it prints x + y + z. We're simply adding the input arguments x, y, z and printing the value. So we have defined this function; now we have to execute it. To execute it, just use m and pass in some arguments: 3, 4, 5. It should add them and print, and indeed it's printing the value of the addition. This function doesn't have any return. Let's say we want to add a return to this function. I'm going to define it again, here with a return instead of the print. I'm going to use multiplication, x times y times z. Now it should return a value, which is the multiplication of the three numbers. So n = m(...), where m is the function and I'm passing in some values. Now we expect n to contain the returned value of this multiplication. Let's check it: it is doing the multiplication and assigning the result to n. 17. Helloworld Neural Network: The usual hello-world exercise for deep learning is the MNIST dataset, which is a dataset consisting of 60,000 handwritten digits. The objective is to create a neural network model that will be able to classify handwritten digits after it has been trained. In this project, we won't go into too much detail about the syntax; the point is to type the syntax and see whether you are able to use TensorFlow to create a model and see it work in Jupyter. I have just opened the Jupyter notebook again. So: import tensorflow as tf. Then I'm going to get the dataset and assign it to a variable. With this line, we're assigning the MNIST dataset to this variable. Then we're going to load the data into some other variables: I'm going to call mnist.load_data(). The data will go into the variables x_train and y_train, and x_test and y_test. If we want to see the content of a variable, we can look at one index of x_train. This is the data we are dealing with in x_train. After that, we're going to divide the data by 255.
This is called normalization. Then we're going to create a model. This is the model we have defined. Next, I'm going to specify the compilation parameters for this model; we're specifying some parameters for the computation that it will do. Then we will use model.fit. This is the training part. So if you click on Run, you should start seeing training happening. You see that the loss is decreasing and the accuracy is increasing, and here you will see epoch one out of five. When the fifth epoch finishes, it means that the training is done. Just have a look at the numbers: the loss started at this number and dropped down to this smaller number, and the accuracy started at this value and reached a higher value, as expected, at the end of the training. 18. Section 5 - Tensorflow - Design Flow: In this section, we will be talking about TensorFlow. We go through its libraries, and we talk about the classes and functions that are used in TensorFlow for deep learning. TensorFlow was created by the Google Brain team, and TensorFlow is an open-source library for numerical computation and large-scale machine learning. It's also a deep learning framework, and it allows developers to create dataflow graphs: structures that describe how data moves through a graph, or a series of processing nodes. Each node in the graph represents a mathematical operation, and each connection or edge between the nodes is a multidimensional data array, or tensor. TensorFlow uses Python to provide a convenient front-end API for building applications with the framework, while executing those applications in high-performance C++. And, as we said, TensorFlow uses tensors, hence the name. So what's a tensor? A tensor is a mathematical object, analogous to but more general than a vector, represented by an array of components that are functions of the coordinates of a space.
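The whole hello-world flow from lesson 17 can be sketched end to end. To keep the sketch self-contained (no dataset download), random arrays stand in for the MNIST images here, and it trains for one epoch instead of five; the real lesson loads the data with tf.keras.datasets.mnist.load_data() instead:

```python
import numpy as np
import tensorflow as tf

# Stand-in data shaped like MNIST: 100 fake 28x28 "images" with digit labels.
x_train = np.random.rand(100, 28, 28)         # already in the 0-1 range
y_train = np.random.randint(0, 10, size=100)  # labels 0..9

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(x_train, y_train, epochs=1, verbose=0)  # the training step
print(history.history["loss"])  # one loss value per epoch
```

With the real MNIST data and five epochs, the printed loss history drops epoch by epoch, as seen in the lesson.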
Next, we look at the TensorFlow design flow. This is the normal design flow that we use in most projects, the most basic flow that we will use in TensorFlow designs for deep learning. First, we get the dataset, and sometimes we have to do some pre-processing; we load the dataset. This is the first step: get the dataset. Then we create a model. The model will be made up of layers; at this stage we specify the layers, with the details of each layer in this model. Then we do something called compiling. Compile specifies how the model will be optimized. The compile stage sets the configuration that will be used during optimization in the fitting stage. After compilation, we go to the fitting stage, where we take the model that was created with the layers and configured during compilation, and at this stage we train the model for a specific number of epochs. After the training, we do evaluation, which returns the loss and accuracy of the model, and then we can do prediction. Predict is about using new data with the model that was compiled and trained: using the trained model to make a prediction on some new input data. This is basically using the model. So if you have a model that was trained for image classification, at this predict stage, if we have a new image and we want to use this model, we pass the image through the model and predict using the predict function, and this will give a label that will tell us whether that image was, for example, a dog, a cat, or a chicken. And usually we will save the model. Once the model is saved, we can close the notebook, and next time we can come back, load the model again, and train it some more or use it to predict some values.
So this is a basic TensorFlow design flow. Now let's look at each one of those steps in detail. 19. Loading a dataset: The first step in our design flow is the loading of a dataset. Some datasets are available in the Keras libraries as datasets: tf.keras.datasets provides datasets. For example, there is MNIST, a database created by Yann LeCun. This is a dataset of handwritten digits: the MNIST dataset is a dataset of 60,000 handwritten digits together with their labels. To load this dataset, we use tf.keras.datasets.mnist.load_data(), which loads the data, and we can assign it to a variable. Another example is Fashion-MNIST. This is a dataset of pictures of apparel; it contains pictures of shirts, T-shirts, shoes, hats, et cetera. To get this dataset, we use fashion_mnist.load_data(), and this will return tuples of NumPy arrays. The data that is returned from these data-loading functions will be in the form (x_train, y_train), (x_test, y_test), which are tuples of NumPy arrays. Let's open a Jupyter notebook and try it. I start by importing TensorFlow. Next, I'm going to assign mnist = tf.keras.datasets.mnist; this is the MNIST database. Next, I'm going to use the load_data function, mnist.load_data(), and I will assign it to the NumPy tuples. So we're loading the data into these arrays. Run. Let's check what values we have: print x_train, for example x_train with index three. This is the data contained in x_train[3]; it is an image of a handwritten digit. These numbers are the colors of the pixels in the data. We can look at the actual picture using matplotlib. To use matplotlib, we have to import it, so we're importing matplotlib.pyplot as plt. I will use plt.imshow.
It will show the first data item in x_train, which is the zero index. As you can see, it's a handwritten digit five. You can check other indexes; this one is a two. So we have 60,000 of those data items in x_train. Now let's print y_train. So, y_train for 3000: this is the image at x_train[3000], and y_train[3000] is nine. So the dataset consists of the input, which is the x_train image, and the output, nine. This is what we're going to use in our neural network to train it: we're going to input the image and the expected output. Now let's do the same for Fashion-MNIST. If I change this to fashion_mnist, which is a different dataset, I'm going to have a different dataset here instead of MNIST, and I print x_train for this new dataset. Run. So that's different data, from Fashion-MNIST, instead of the normal MNIST with the handwritten digits. Now we want to show the image from this dataset. Run. As you can see, now it's a different picture; it's a picture of a shirt. And let's see what y_train gives: six is the label, and this number is linked to a class of fashion item. So we use class names: we assign class_names for the labels. If we list these, the first class name is T-shirt. So if we use these class names with our y_train, like this, we map the number to the clothing name and we print. Run. You see that this is a shirt. Let's see another data item, say the data at index 1000. This one looks like a pair of trousers. Now let's see the label associated with it: as you can see, the label is Trouser. If you go to tensorflow.org, you will see the Keras datasets. It shows the datasets that are available. We have used MNIST; this is the MNIST dataset that we used initially. And as you see, we can load the data, which is a function.
It loads the MNIST dataset. We have also used Fashion-MNIST; we loaded its data like this, and we have used x_train, y_train, x_test, and y_test. So you can go to this webpage and look at the different datasets that are available. 20. Build a model with layers: After the dataset, we create the model, and the model will consist of multiple layers; we have to specify each one of those layers. First we use Sequential. This creates a stack of layers for a model, and the argument for Sequential is the list of layers to add to the model. After we create the stack of layers, we're going to add the layers to the model, and we're going to look at a few of the layers that we will use in this course. Flatten flattens the input. The layer called Dense is your regular densely connected neural network layer that we looked at at the beginning of this course. The layer called Conv2D is used in convolutional neural networks. All of these are part of the layers class, so we have Flatten, Dense, and Conv2D. We also have Dropout: this is a layer called Dropout, and it has the function of randomly setting a fraction, rate, of the input units to zero at each update during training. This prevents a problem called overfitting. So it will basically set some input values to zero; if the fraction is, say, 0.3, it will set 30% of the inputs to zero. We also have MaxPool2D. This performs a max pooling operation on spatial data; this we will look at in more detail in the section on convolutional neural networks. We will use these two, Conv2D and MaxPool2D, to create those kinds of networks. And there is also GRU, the gated recurrent unit. We will look at the details of these layers in the recurrent neural network section. This information is also available on the TensorFlow website.
If you go to the Keras layers documentation, you can see all the layers that are supported. For example, we have Conv2D; it is shown as a class, the class Conv2D, and it shows the arguments that it takes, for example filters and kernel_size. And here is the example for our Dense layer: it gives information about how you have to use it and what the arguments for it are. Let's create a model with some of these layers. We write: model = tf.keras.models.Sequential, and in the brackets we put the list of layers. We begin with tf.keras.layers.Flatten with the arguments it will take. So we're using a Flatten layer at the start, the second layer will be a Dense layer, the third layer will be Dropout, and the fourth layer will be another Dense layer. These are the four layers that we are adding to this model. Now, Flatten will take the argument input_shape, for example (28, 28) for the case of MNIST. We have 512 densely connected neurons, and we have to enter the argument called activation, which is the activation function, tf.nn.relu. We looked at this activation function earlier; it's called ReLU, and that's how we apply this activation function to this Dense layer. We have a Dropout of 0.2, which means 20% of the inputs will be set to zero. And finally we have a Dense layer again; the output of MNIST is 10 possibilities, 0 to 9. Here also, like the other one, we have to have an activation function, and in this case we're going to use softmax. So here we are creating a model, and we have these layers. We can use model.summary(), which is a function to check our model. You see that the layers are listed: Flatten, Dense, Dropout, and Dense. It gives the shape, which is the size and dimension of the output from each layer, and it also shows how many parameters there are in each one of the layers.
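Written out in full, the four layers and the summary check look like this (a sketch of the model described above):

```python
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),        # 28x28 image -> 784 values
    tf.keras.layers.Dense(512, activation=tf.nn.relu),    # densely connected layer
    tf.keras.layers.Dropout(0.2),                         # drops 20% of inputs in training
    tf.keras.layers.Dense(10, activation=tf.nn.softmax),  # one output per digit 0-9
])
model.summary()   # lists each layer, its output shape, and its parameter count
```

The summary's parameter counts come from the weights: the first Dense layer alone holds 784 × 512 weights plus 512 biases.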
The parameters are basically the weights that define the model; these are the variables that are changed during fitting and optimization of the model. Here we have used the activation functions tf.nn.relu and softmax. Let's go to the documentation and see these in more detail: go to the TensorFlow website and go to tf.nn, then relu. This will give us information about this activation function. And we also used softmax, which is tf.nn.softmax. So at this stage, we have created a model. We have the data and a model; after that, we run compile. 21. Run Compile: We run compile on this model. Compile configures the model for training. This is how we compile the model: model.compile, and normally we will use the arguments optimizer, loss, and metrics. The optimizer will be used during fitting, so we have to choose which optimizer algorithm we want. The loss will be used as the goal of the optimization: we tell the fitting algorithm which loss function to use during the fitting. And for metrics, we can specify which values we want to monitor. This one is the optimizer; we can have a look at the documentation to see what kinds of optimizers are supported. Examples of optimizers are RMSprop, Adam, and SGD, stochastic gradient descent. If you go to the Keras optimizers documentation, we have the list of optimizers, for example Adam, the class Adam, with some details about how it works, the arguments, and the methods. The same goes for stochastic gradient descent. When we use the optimizer in compile, we also say that we're going to use a loss function; we can specify this at this stage too. Here are the loss functions: these are all the loss functions that are supported. For example, we have mean squared error, which we looked at before; this is a loss function. We have mean squared logarithmic error, and we have the cross-entropies. These are the popular loss functions.
And the same for the metrics: these are all the metrics that are supported, which we can specify in compile. Usually we use accuracy, and this is the definition of accuracy: it calculates how often predictions match labels. It gives an example here: if y is this list and y_pred is this one, then the accuracy is 3/4, which is 0.75. That means that three times out of four, the prediction matches the label. So these are the optimizers, loss functions, and metrics. We have created a model here, so let's run compile on it. What we have to do is model.compile, and we have to specify the optimizer equal to something, loss equal to some loss function, and the metrics. That's how the compile is written. For the optimizer, for example, I'll use adam. For the loss, I'll use sparse categorical cross-entropy, and for metrics, I'm going to use accuracy. So we have run compile on the model. Now, since we have the dataset ready, we can use this model, which is ready for fitting. The next step will be fitting the model. 22. Fit - Training the Network: The next step is fitting. Fit trains the model for a fixed number of epochs. We will use model.fit, taking the arguments x, y, and epochs. x is the input data, and the input data should be in one of the following forms: a NumPy array, a TensorFlow tensor, a dictionary, a dataset, or a sequence. The form of the target data y should be similar to that of the input data. And we also specify the number of epochs to train the model. If you go to the documentation for the Keras Model, it is quite a long page that shows the different functions of Model. If we find fit, this will give us more details: we have x, y, and epochs, as we saw, and it gives all the details of the arguments, x and y and epochs and all the other arguments it can take, and so on. The same page has information on compile; these all come under the class Model.
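The accuracy definition from the docs can be reproduced in plain Python. The particular list values here are placeholder labels of my own that give the same three-out-of-four match as the docs example:

```python
# Accuracy: how often predictions match labels.
y_true = [1, 2, 3, 4]
y_pred = [0, 2, 3, 4]   # the first prediction is wrong, the other three match

matches = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = matches / len(y_true)
print(accuracy)   # 0.75 -> three out of four predictions match the labels
```

This is the same number Keras reports per batch when you pass metrics=["accuracy"] to compile.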
So now we run model.fit. The last things we ran in our Jupyter notebook were model.compile, which was run on the model that we created here, and independent of this model, we have the dataset which we loaded here. Now we can run model.fit. It takes the arguments x_train and y_train, which are the x and y we showed on the slide, and we want to train for five epochs, so we simply say epochs=5 and run. So that's one epoch through the 60,000 samples; there's the second epoch, and this is the fifth and last epoch. This is the training that has been going on. You see that the loss is decreasing, which is good, and the accuracy is increasing. So that's our training done, and this was run with model.fit. 23. Evaluate and Predict: After the training is done, we can evaluate the model. model.evaluate returns the loss value and the metrics values for the model in test mode; computation is done in batches. We use model.evaluate(x, y), where x is the input data and y is the target data. On the same page for Model in Keras, we have a section on evaluate. This is evaluate: you see that it takes x and y, and the details of the arguments are explained here. As you can see, it returns the test loss. To run it on our model, we simply write model.evaluate with x_test and y_test; these are the test arrays from the dataset. Run. So it ran on the 10,000 test samples, and it found the loss, this value here, and this is the accuracy. The next thing we will do is predict. This generates output predictions for the input samples. We simply write model.predict(x), where we give x as the input samples, and it returns a NumPy array of predictions. On the Model page, you will see a predict section. It gives the argument x, so you can read about what type of data the input samples should be, and then it returns a NumPy array of predictions. 24.
24. Section 6 - Convolutional Networks - Intro: This section is on convolutional neural networks; we look into the details of this kind of neural network. As we said before, in this course we're going to look at four types of neural networks, the first one being feedforward. This is the one we've already covered since Section 2, where we went into quite a lot of detail about this kind of network. It is sometimes called the vanilla neural network because it's the most basic kind. Next is convolutional — this is what we'll cover in this section. After this section we'll cover recurrent neural networks, and finally we'll look at generative adversarial networks. These four networks are the most popular at the moment, so we start with these four. So let's look at convolutional networks. Suppose we have an image, and we input the image into the network. The first problem with using an image in a feedforward network is that each pixel will be an input. So imagine we have 100 pixels here and 100 there; that makes 10,000 pixels, so you have to feed these 10,000 pixels into 10,000 input nodes, and that will lead to a great number of neurons to train. In an image there are relations between nearby pixels. For example, if we look at this picture, on top we see some orange patches: in this space here we have the same colour, and if we go to this other space we have the same kind of colour, the same orange. So ideally we should take advantage of this kind of relationship between nearby pixels. This is where convolutional neural networks come in handy. If you look at this drawing, this is a convolutional neural network, and it's made up of two parts: one is feature learning and one is classification. The feature-learning part consists of several filters; this is where the image information is extracted and then passed into a normal feedforward neural network for classification.
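To see why 10,000 input nodes is a problem, here is the quick arithmetic for a fully connected first layer. The hidden-layer size of 128 is a hypothetical choice, not from the lecture:

```python
# A 100x100 image flattened gives 10,000 input nodes.
pixels = 100 * 100

# Connecting them densely to a hypothetical hidden layer of 128
# neurons needs one weight per (input, neuron) pair, plus one bias
# per neuron.
hidden = 128
weights = pixels * hidden + hidden
print(weights)  # 1280128
```

Over a million parameters in a single layer, before the network has learned anything — convolutional layers avoid this by sharing small filters across the image.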
This type of neural network was inspired by the visual cortex of the animal and human brain. Our visual cortex evolved to deal with the classification of images — think of the eyes — and this is a very effective way of decoding images and understanding the features in an image. Convolutional neural networks work well with data that has a spatial relationship. A spatial relationship means one pixel is related to the ones nearby; it's a relationship based on space, which is the 2D space of an image. This type of network is able to successfully capture these spatial dependencies, so it captures patterns and dependencies between pixels. The convolutional neural network is able to do that through the application of relevant filters. Usually we deal with 2D images, but it can also work with 1D or 3D data, or even video. So we can view convolutional neural networks as normal feedforward networks plus special filters that exploit and extract spatial information. Where are convolutional neural networks used? In image and video analysis: you can analyse an image or a video and extract what's happening in it. They can be used for image and video classification — we can use this kind of network to classify images or video — and they can also work with natural-language analysis. These types of networks consist of layers called convolutional layers. These layers reduce the number of neurons that are needed: they deal with the image directly and thus reduce the number of neurons required. You will see here this block — convolution plus ReLU — this is the feature-learning part. The convolution filtering plus the nonlinear function makes up the convolutional layer. After that we get pooling.
Pooling will reduce the size of the data, and you will see that these two units are repeated a couple of times: convolution plus ReLU, then pooling, convolution plus ReLU, then pooling. That means you can have four or five stages of these until we reach this part, which is for the classification. This is a normal feedforward, fully connected layer; it is used for classification. So this is what the convolutional neural network consists of. This layer here is called flatten: what it does is take the 2D or 3D information that reaches it and spread it into a one-dimensional array of data. I will explain the two concepts of convolution and pooling in detail now; the rest are quite easy to understand. We have the input; we have ReLU, which we looked at before — it's a nonlinear function; we have flattening, explained here; this is our normal fully connected layer; and at the end we have softmax, which is an activation function we looked at in Section 2. So now we will look at convolution and then pooling. 25. Convolution: Let's take a look at convolution. Convolution consists of using a filter that we call a kernel. For example, here we're using a kernel of size three by three — a filter with numbers in it; I'm not including any numbers here yet. We'll explain how the filter moves along the image to create the convolution. We have a filter, three by three, and we have our image, which is five by five. Convolution will apply this filter and then move the filter along the horizontal axis and also along the vertical axis: basically it will move everywhere it can, from here to here. So imagine this three-by-three window moving along this side and that side. Here is an illustration of sliding the filter over the five-by-five picture. First it starts on the left, three by three, and then it moves step by step in this direction, and finally it reaches the end. It will also move in the vertical direction. So convolution will involve something called a dot product.
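The number of positions the sliding filter can take per axis follows a standard formula, (n − k) ⁄ stride + 1 with no padding, which matches the 5×5 image and 3×3 filter giving a 3×3 output:

```python
def conv_output_size(n, k, stride=1):
    """Number of valid positions of a k-wide filter along an
    n-wide axis, with no padding: (n - k) // stride + 1."""
    return (n - k) // stride + 1

# A 3x3 kernel sliding over a 5x5 image with stride 1
# fits in 3 positions per axis, giving a 3x3 output.
print(conv_output_size(5, 3))            # 3
print(conv_output_size(5, 3, stride=2))  # 2
```

The same formula tells you how much each convolutional layer shrinks the image.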
We'll do a dot product of this filter and the nine pixels on the image, and this will give one value — each one of the sliding positions gives one value. So we have one, two, three values across, and one, two, three down; three by three makes nine. The sliding continues until it has nine values, and you end up with a three-by-three image, which is a convolution of the original image. Now let's try to understand what a dot product is. I'll give you some values to illustrate it; basically, I'm going to explain what these numbers are doing on top of the image x. One more thing: there is something called stride, and the stride is equal to one in this case. The stride is how many pixels the filter moves in each step. In this case we're moving by one pixel, so the stride is one. If we were moving by two — this position jumps to here — it means we have a stride of two. Now the dot product. Let's say we have our image like this; these are the values of the pixels. What we're doing here is the dot product of this kernel, or filter, with this image. Let's take an example: when we place the filter on the top-left corner of this image, this is how the dot product works. Zero is multiplied by 20, one is multiplied by 30, zero by 90 — that's 0 times 20 plus 1 times 30 plus 0 times 90. We're just multiplying each kernel value by the corresponding pixel value. The next row: 1 times 22, minus 4 times 55, 1 times 80; then 0 times 23, 1 times 27, 0 times 50. Then we add all these products together, and that is what the dot product is doing: placing this three-by-three matrix on that patch and getting one value out. The value of the dot product will be minus 61, so this output value is -61.
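The worked example above can be checked in numpy. The kernel values here are the ones implied by the multiplications read out in the lecture ([[0, 1, 0], [1, −4, 1], [0, 1, 0]], a Laplacian-style edge filter), placed on the top-left 3×3 patch of pixel values:

```python
import numpy as np

# Kernel implied by the multiplications in the lecture
# (a Laplacian-style edge-detection filter).
kernel = np.array([[0,  1, 0],
                   [1, -4, 1],
                   [0,  1, 0]])

# Top-left 3x3 patch of the image, with the pixel values read out.
patch = np.array([[20, 30, 90],
                  [22, 55, 80],
                  [23, 27, 50]])

# Element-wise multiply and sum: 30 + 22 - 220 + 80 + 27 = -61
value = int(np.sum(kernel * patch))
print(value)  # -61
```

Each position of the sliding window repeats exactly this multiply-and-sum with a different patch.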
Applying the filter to the next three-by-three patch works the same way — zero multiplied by 30, one by 90, zero by 100, and so on — until we get another value for this position. We continue sliding the filter to the right side and get the value here, then move the filter down by one, do the dot product again, get the value here, and so on until we get the convolved image. So this image will be converted into this image. This method is used to extract important information so that we can recognise the image: instead of looking at the whole picture, we are filtering the pixels according to the most important features. This gives the effectiveness of convolutional neural nets with images. Here is an example where we have an original image and we apply this kernel, which has these values. What happens is we do a convolution of this image with this kernel to get that one. The result is that features such as edges have been extracted: it's clearer where the edges are. So instead of the network working with each pixel, it's now working with the most important features, which are the edges. We have many types of convolution kernels; here is an example of what the different filters do. This kind of filter gives us edge detection, and we also have others, like blurring. 26. Pooling: After convolution we'll do pooling, and in this case we're exploring max pooling with a two-by-two matrix. This is the image with values 23, 45, etc. If we want to apply max pooling on it, two by two, we select two-by-two blocks of pixels and choose the maximum of those four. So what is the maximum of this block? It's 100. We continue doing this for the next two by two: among the four, which one is the maximum? It's 66. Same for these: for these four the maximum is 97, and for the four here the maximum is 117. This gives a new matrix with only these maximum values.
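The max-pooling step can be sketched in numpy. The 4×4 input matrix here is invented, except that its four block maxima match the values quoted in the lecture (100, 66, 97, 117):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D array:
    split into 2x2 blocks and keep each block's maximum."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Hypothetical 4x4 input whose 2x2 block maxima are the
# values quoted in the lecture.
image = np.array([[ 23,  45,  12,  66],
                  [100,   8,   5,   3],
                  [ 97,  10, 117,   2],
                  [  4,   6,   9,   1]])
pooled = max_pool_2x2(image)
print(pooled.tolist())  # [[100, 66], [97, 117]]
```

Sixteen values become four — the four-times reduction described next.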
This is also a kind of processing, a filtering, where we are reducing the amount of information down to the most important information, so that we have less data to handle and fewer parameters to work with. Originally there was a four-by-four matrix with 16 different values, and after the max-pooling filtering we have only a two-by-two matrix, so the amount of data has been reduced by four times. So we're reducing the number of values that we need to work with. As we said before, this is done usually right after a convolution. Convolution can lead to negative numbers, so when we pass the result through a transfer function — a nonlinearity such as ReLU — we get only positive numbers, which are more practical to work with. 27. Convolutional Network Project - Part 1: The next project we will do is design a convolutional neural network using TensorFlow, and we'll use the Fashion-MNIST dataset. We've already used a feedforward neural network for Fashion-MNIST; this time we're going to use a convolutional neural network for the classification. This is the design flow we will use. First we load the dataset, Fashion-MNIST. Then we create a model with these layers: you will see that there is a repetition of convolutional layers, 2D max pooling and dropout. These three will be repeated, then there is a flatten layer, then it goes to a dense layer — which is a feedforward neural network — and then a dropout and another dense layer. The dropout is used for preventing overfitting. The way dropout works is to set a fraction of the inputs to zero; in this case we're using 30%, so it's setting 30% of the inputs to zero here, and 50% there. After that we're going to run compile. We're going to use the loss function sparse categorical cross-entropy, the Adam optimizer and the accuracy metric. Then we're going to fit with batch size 64, and then evaluate and predict. Okay, let's build this project in a Jupyter notebook. First I import tensorflow, I import matplotlib's pyplot, and I import numpy as np.
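The dropout idea from the design flow — set a fraction of values to zero at random — can be sketched in numpy before we wire it into the Keras model. One detail worth noting as an addition here: at training time Keras also rescales the surviving values by 1/(1 − rate) so the expected sum is unchanged, which this sketch includes:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate):
    """Training-time dropout: zero out roughly `rate` of the values
    at random and scale the survivors by 1/(1 - rate)."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

x = np.ones(10)
y = dropout(x, 0.3)
# Each entry is either 0 (dropped) or 1/0.7 (kept and rescaled).
```

At evaluation time dropout is switched off entirely; Keras handles that automatically.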
Then I get the data for Fashion-MNIST. I'm going to call load_data on tf.keras.datasets.fashion_mnist, and this will load the data into these variables: train_images will be the images, train_labels will be the labels, which are numbers; test_images will also be images, and test_labels will be numbers. So this is the training set, and this is the evaluation set. These are the class names for the labels. Say a particular train label is 1: it means the class name is trouser. Let me give you an example. Say we print train_labels[9]; it's 5. What we do is go to class_names and look up index 5 — so train_labels[9] has value 5, and this points to a sandal: 0, 1, 2, 3, 4, 5. And if you check the image that corresponds to this training label, that image will be train_images[9]; train_labels[9] and train_images[9] should correspond, and the train label says sandal. Let's check if the picture looks like a sandal. I'm going to use plt.imshow — this is the picture train_images[9], and train_labels[9] is a sandal. These are the training data that will be used to train the neural network. Next, I'm going to normalise the training data. This is because the values vary between 0 and 255, so we have to scale them to be between 0 and 1. Each one of the values will be divided by 255; that's where we are normalising the data. Next, we have to reshape the data, so here we are reshaping the training images to these values, and the test images the same. After that, I can start building my model. I use model = tf.keras.Sequential(). This creates a stack of layers and lets me add to it one by one. So we do model.add, and then I'm going to add the layer tf.keras.layers.Conv2D.
The pattern will be a convolutional layer followed by max pooling. The Conv2D layer takes some arguments, so I'm just writing the syntax first and adding the arguments after. After the max pooling I will have a dropout layer. This series will be repeated three times. After that we have the flatten layer, then we'll have the dense layer, then a dropout, and after that we have the output layer. The output layer will be a dense layer; this is where we decode the output using the softmax activation function. 28. Convolutional Network Project - Part 2: So what are we going to enter as arguments in the convolutional layer? We use filters=64 and a kernel size — that's the size of the filter — of two by two, with padding 'same'. We're going to use the ReLU activation, and we're going to specify an input shape. This is what's expected by this layer, and that's why we reshaped the images: so that this convolutional layer will accept the image at this size. For the max-pooling layer we're going to set the pool size equal to two, meaning two-by-two max pooling. For the dropout we're going to use 30%, so 0.3 — that's the fraction of inputs we are setting to zero — then 0.3 again, the same, and this last one 0.5. After adding the arguments to each of the layers, that's the adding of the layers done. If I run it, I can check the model summary; this is the model we have created. It has this trio repeated three times, then flatten, dense, dropout and a dense layer for the output. Next we're going to run compile, where we specify a loss function, an optimizer and metrics. We're going to just use accuracy here; for the optimizer we're going to use the Adam optimizer, and for the loss function I'm going to use sparse categorical cross-entropy. So we have a model, and we have our dataset ready.
The next thing we have to do is fitting. For that we're going to use the training images and training labels, with batch size 64 and a number of epochs. You can train it for, say, five epochs; here I'm just going to set it to two. So it has trained for two epochs, and the accuracy is near 80%, which is not too bad. Now let's evaluate the model after the training, with the test data. Remember, we have training data and we have test data. We're going to use test_images and test_labels and run the evaluation. It goes through the 10,000 samples and finds that the loss is 0.4 and the accuracy is 83%. That's good. Next we're going to do model.predict. In this case we're using the test images, and we want the model to predict the labels for these test images. You can go through the whole batch by just doing model.predict(test_images), and this will return the prediction for each one of the test images. I'm going to assign it to a variable called predictions, so each of the 10,000 test images will be predicted. For example, the prediction for the first one: this is the output of the whole neural network, and it's an array of 10 elements — this is the output of softmax. So to find the actual prediction, to find the label — since this array is the output of the neural network after applying softmax — we have to find out which one of the indices has the highest probability. The way we do this is using argmax. This is part of NumPy, so I write np.argmax(predictions[0]). What that does is find the index of the maximum number in this array, and it says 9: that's the highest one. To know which class this corresponds to, we're going to use the class_names list that we defined over there, so we do class_names[9].
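The decoding step just described can be sketched on its own. The probability vector here is made up (a plausible softmax output for one test image, with index 9 the largest); the class names are the standard Fashion-MNIST labels:

```python
import numpy as np

# Standard Fashion-MNIST class names, index 0..9.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# A made-up softmax output for one test image: ten probabilities
# summing to 1, with index 9 the largest.
prediction = np.array([0.01, 0.00, 0.02, 0.00, 0.01,
                       0.03, 0.01, 0.02, 0.05, 0.85])

index = int(np.argmax(prediction))
print(index, class_names[index])  # 9 Ankle boot
```

With the real model you would use `predictions[0]` from `model.predict` in place of the made-up vector.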
So the label is pointing to the class name 'ankle boot'. What we have done here is: using test image zero, we predicted this value, and the prediction of the network is ankle boot. Let's check if the test image corresponding to index zero is actually an ankle boot. From the class names we know the prediction is an ankle boot; now we want to check the image. I'm going to load the data again and name it original_test_images. So if we do plt.imshow on original_test_images[0] — this was the image for test image zero, and it is indeed an ankle boot. That's correct. Now let me create a little function, just for fun. I'm going to make a function; we're going to call it what_is_this_image, and I'm going to pass some test image to it. What it does is predict, using the model that we have, on the test image — but in this case I have to reshape the image so that it can go through the model; this is what we've done before. Then I'm going to calculate an index using np.argmax on the prediction. So this will find the prediction, and this will find the index within the prediction. And then finally we print 'Okay, we think the following picture is' — this is concatenation: I'm concatenating it with class_names[index], so the index found here is used to find the class name — followed by the class name. Then I'm going to show the picture. Since I'm going to use the original images, I have to normalise the image inside the function. So I'm going to take the original test images, pass one through this function, and it should show the picture and tell us what it thinks it is. So: what_is_this_image with original test image 45 — the 45th index in that dataset. There's a mistake here; let me fix it.
We re-passed the original test image 45 through our network, and it's saying 'Okay, we think the following picture is an ankle boot'. Let's try another one, 100. Now it's showing a dress: 'Okay, we think the following picture is a dress'. So it's working nicely. 29. Intro to Section 7 - Recurrent Neural Network: In this section we look at recurrent neural networks. The types of neural networks that we've looked at so far are the feedforward one and the convolutional neural networks. Next in the list we look at recurrent neural networks, and after that we look at generative adversarial networks. Let's try to understand what recurrent neural networks are. Take the case of a video. A video consists of multiple frames in sequence. Now suppose we had to pass this sequence of frames through a neural network we trained: we'd get a label, so this image will be identified — this we can do with a network such as a convolutional neural network. So this frame gives this label, and the next frame in the video will give a label as well, based on the content of the image: whatever is in the image will be identified as its label. The next frame, image 2, will give a label; image 3 will give a label. Say a player is playing football, and we want to identify at which frames the goals happen in the match. Let's say image 1 is identified as simply running, image 2 is identified as the player kicking a ball, and image 3 is labelled as the ball in the net. If we as humans look at the sequence — we see the player running, he kicks the ball, and the ball is in the net — we understand that these three frames are a series of actions taken by the player to score a goal, so we can understand the story of what's happening in the sequence. But a normal neural network just takes a look at each image separately.
It's not taking into consideration what was detected earlier, so when it sees the ball in the net, it doesn't know that the previous label was a player kicking a ball. So with a normal neural network we can't actually detect this kind of sequence. A normal neural network doesn't have a notion of sequence or temporal information — information about time; it's just input and output, each one separate and independent. To solve this problem we use recurrent neural networks. These simply add a feedback loop in the network itself. In the hidden layers we take some information — not the output or the input, but the hidden-layer values — and these are fed back to help make the next output; they are fed back into the next step of the sequence. So, for example, if I feed image 1 into this network, I get label 1, the player running, and part of this internal information is passed back here. So when image 2 arrives at the input layer, the network will have part of the information from label 1, and it will be able to make a decision based on image 1 and label 1. So it has a kind of memory for past sequences, and in this way this kind of network is able to understand what's happening in the sequence. We can use this kind of neural network on video to identify what's happening based on the sequence of frames that was identified. Where are recurrent neural networks used? They're used in speech recognition, because audio and speech have sequence: in speech we have a sequence of different sounds, or phonemes, so this kind of network is perfect for speech recognition. Then language modelling, translation, image captioning, sentiment classification, stock prediction and music generation — these are all uses of recurrent neural networks. They work well because all of these have temporal dependencies and require classifications based on temporal information. So these are sequence-modelling tasks.
So why do we use RNNs? To maintain an order in the sequence: without any feedback in a neural network, we can't capture the order in the sequence. An RNN also shares parameters across the sequence: some of the internal information is fed back in the next cycle, passed on to the next iteration. In this way we are sharing parameters along the sequence and also keeping track of temporal dependencies. We talk about dependencies in a later slide. 30. Structure of RNNs: This is the usual diagram for an RNN. Here we have our recurrent neural network, with an input vector coming in and an output vector going out. And here we're taking a signal that we call the internal state and looping it back as an input to the RNN. Effectively, the input to the RNN is the input vector together with the previous internal state. This is the current internal state. The input vector goes through a matrix of weights, and the previous state goes through another matrix of weights; these two together give a vector, and this vector is passed through a nonlinear function. The function we use for RNNs is the hyperbolic tangent, tanh, and the output of this function will be the current state. This current-state information is also passed through a matrix of weights to generate the output. So we have our output vector here, this is the current state, and the current state loops back to the previous-state input for the next cycle. Let's take a look at the diagram with nodes and interconnections. This diagram shows a neural network that looks like a feedforward neural network, because it has inputs going through weights and an activation function, and we get an output here. The difference we're seeing here is something called the previous state; these are not inputs like the other ones.
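The recurrence just described — combine the input and the previous state through two weight matrices, squash with tanh, then read the output off the new state — can be written as h_t = tanh(W_x·x_t + W_h·h_{t−1}) and y_t = W_y·h_t. A numpy sketch with tiny, arbitrarily chosen sizes and random weights (all of these sizes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: 3-dim input, 4-dim state, 2-dim output.
W_x = rng.normal(size=(4, 3))  # input -> state weights
W_h = rng.normal(size=(4, 4))  # state -> state weights
W_y = rng.normal(size=(2, 4))  # state -> output weights

def rnn_step(x, h_prev):
    """One recurrence: new state from input plus previous state,
    squashed by tanh; output read off the new state."""
    h = np.tanh(W_x @ x + W_h @ h_prev)
    y = W_y @ h
    return h, y

h = np.zeros(4)  # initial state
for x in [np.ones(3), np.zeros(3)]:  # a 2-step input sequence
    h, y = rnn_step(x, h)
```

Note that the same three weight matrices are reused at every step — this is the parameter sharing across the sequence mentioned above.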
They are fed back from the output of the activation function here. So this is what we have in a recurrent neural network: we feed back the output of the activation function into the previous-state inputs. Each one of those activation functions feeds back to the previous state, and in the next cycle the inputs, together with the previous state, go through the network. This is how a basic recurrent neural network looks. These networks are usually unfolded into a sequence. In this case we have one recurrent neural network, and the way we can represent it is as a succession of copies of the same network. We do this unfolding because it's easier to analyse the network; the looping can cause a bit of confusion, so that's why we unfold it. This is just for analysis — it's not really what's happening in the network. We have just one network, but we're showing it as a series of networks. If you look at each network, it has an input going through and an output; there's also the internal-state information that is passed on to the next unit, though in the diagram it's drawn as a different unit. The next input will be x1; it also takes the state information from the previous unit, and the same happens until the last one. The length of the sequence is arbitrary: we can take the full length of the data or split it into smaller lengths. So when they are put together, this is what they look like with the internal structure: the tanh here is the activation function that produces the output as well as the current state. 31. Examples of Recurrent Neural Networks: Let's look at the example of sentiment classification. In sentiment classification we work with text. We have several words, and we want the neural network to say whether the sentiment was happy — whether the writer was happy with something — or whether they were unhappy. So what we do is pass the words in one by one.
We feed the first word, compute the hidden state and pass it to the next step; then we have the second word and its hidden state, the third word and its hidden state, and so on until the maximum length has been reached. Based on that, and based on the weights and activation function, we output a number, and this tells us whether the sentiment was happy or unhappy. So we are feeding a chain of words here, constantly feeding it through this network, and at the end we check the input and determine whether this was a happy or unhappy text. Next, language modelling. This is when we take a body of text — literary articles, a book, Shakespeare's texts or even scriptures — and feed it through the network. We feed one word at a time as input, with the hidden state going to the next step, until we reach the end. What the network is doing here is predicting the next word. In this case we have 'the students open their', so it will check the probabilities: it has a vocabulary of words to work with, and it will decide what the most probable next word is, based on the training it received. The training involves giving the neural network a series of sentences and training the weights so that it can mimic the language in the text. Then language translation: here we're translating, say, from German to English. In this case we're feeding the input with a series of words, and the output will also be a series of words — the translated words. We're using the order of the words in the input to find out what the highest-probability translation is for each word, keeping the sequence of input in mind while deciding each word. This is different from a word-by-word translation: with word-by-word we don't get any sense of the sentence, whereas here we have context.
Some words are translated in a different way based on the phrase they are part of, so with this kind of setup the network is able to find the best translation based on the sequence. 32. Training RNNs: Now let's try to understand what training looks like in recurrent neural networks. This is a training example where we want to model language. This means we're going to take text and pass it to the input layer, and we're going to tell the network that we want these target characters — in this case we're using characters as input. In this example, we're training the model with the word 'hello': h, e, l, l, o. We take each character and pass it to the input. In the first step, we use 'h' as input, and we tell the neural network that the target character should be 'e', because 'e' is the next character after 'h'. So we use input 'h' and target character 'e'. This information is passed through the hidden layers, through the weights and activation, and it goes through an output layer. Say the output we get here is a different character: there is a discrepancy between the target character and the output character, and this is the loss. This is how we're going to use this error to train the model. Information from the hidden state is passed to the next hidden state along with the next input, 'e', and we tell the network that we expect the target character 'l'. Same thing: compute the output, check the loss, pass on to the next, 'l', then 'l' again, and so on. In this way we're training the network with this sequence; we want the network to be able to generate this kind of language. The way we train a recurrent neural network is similar to a normal feedforward neural network: it also uses backpropagation, but in this case we call it backpropagation through time. As we said, each of these steps is not really a different network.
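The input/target pairing for the 'hello' example above can be built in one line: each character's target is simply the next character in the word.

```python
# Build (input, target) character pairs for next-character
# prediction on the word "hello": the target of each character
# is the character that follows it.
text = "hello"
pairs = list(zip(text[:-1], text[1:]))
print(pairs)
# [('h', 'e'), ('e', 'l'), ('l', 'l'), ('l', 'o')]
```

The same shift-by-one trick is how training pairs are built for any text corpus, whether the units are characters or words.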
They are the same neural network, drawn this way to make the analysis easy. Each one of these is part of the sequence; that's why we say 'through time' — we take into consideration all the steps through time when checking the loss. We do a backpropagation through the network, and we do the same updating of the weights as we learned for feedforward neural networks. This diagram shows the network taking all the outputs and calculating the loss in one go, but this is not very practical. The better approach is to truncate the backpropagation through time: instead of using the whole sequence, we use chunks of the sequence. This is one chunk of the sequence, so we use this chunk to calculate a loss value. This value is backpropagated through the network for that chunk, and the weights are updated accordingly. As we've learned before, backpropagation uses partial derivatives and the chain rule to update the weights. Now, there is a problem with backpropagation through time when used in RNNs. These problems are the exploding and vanishing gradient problems, caused by the continuous multiplication involving the same weights — each of these units has the same weights because it's the same neural network; we have only copied the neural network so that we can analyse it better. Since backpropagation goes through the network using the chain rule, there will be a multiplication of gradients through the sequence, backward. If the derivatives are small, for example 0.1, we have 0.1, 0.1, 0.1, and when we multiply these three numbers we get 0.001. It gets smaller and smaller as we go backward like this, so at this end we have a very small gradient. This is what we call the vanishing gradient, and it's a major problem with RNNs. We're going to show what the solutions for this problem are.
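The 0.1 × 0.1 × 0.1 example generalises: multiplying a small local derivative through the chain rule shrinks the gradient exponentially with the number of steps back in time.

```python
# Repeatedly multiplying a small local derivative (0.1 here)
# through the chain rule shrinks the gradient exponentially
# with the number of time steps.
grad = 1.0
history = []
for step in range(5):
    grad *= 0.1
    history.append(grad)
print(round(history[2], 6))  # 0.001 -- the three-step case from the lecture
```

With a derivative larger than 1 the same loop explodes instead of vanishing — that is the exploding-gradient side of the problem.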
This is a diagram that shows a sentence: "phone was broken because of the casing". There's something called dependencies: some words depend on a previous word. This is called a dependency. To understand a word in context, we need the words it is related to. In this case, "phone" and "casing" are related. Say someone has written an email, and part of the message is "phone was broken because of the casing". So there is a dependency between "casing" and "phone". But as you can see, "phone" is not immediately previous to "casing"; it's quite far back in time, and that's what we call a long-term dependency. RNNs have a problem with long-term dependencies because of the vanishing gradients we saw in the last lesson. The solution for this is to modify the RNN, and we have two types of modification. One is long short-term memory (LSTM), and the other is the gated recurrent unit (GRU), which is another version of the RNN. So to solve the problem of vanishing gradients, we use either of these versions of the RNN; they have been designed to tackle the vanishing gradient problem. Here is a comparison between the RNN (the normal RNN), the LSTM, and the GRU; these two solve the problem of vanishing gradients. 33. LSTM and GRU: Here's a comparison between the RNN (the normal RNN), the LSTM, and the GRU. These two solve the problem of vanishing gradients. The first one you have already seen: it has one activation function and is quite simple, taking the input, calculating the state and the output. The current state information is fed back for the next cycle. The LSTM, as you can see, is a bit more complicated: it has a couple more activation functions, it has multiplication and sum operations, and it has more weights. We won't go into too much detail on these two, but we'll look at the basic concepts in these modifications that solve the problem of long-term dependencies. The LSTM uses gates: it has an input gate, a forget gate, and an output gate.
The input gate controls the extent to which a new value flows into the cell. The forget gate controls the extent to which a value remains in the cell. And the output gate controls the extent to which the value in the cell is used to compute the output activation of the LSTM. So it's controlling the information: forget, input, and output. This solves the problem of long-term dependencies. We won't go into the theory of why this helps with long-term dependencies, but this is the modification that was found to work well for solving the problem. We also have the gated recurrent unit. This one has a reset gate and an update gate. The update gate helps the model determine how much of the past information needs to be passed along to the future, so this controls the information that goes through. The reset gate is used by the model to decide how much of the past information to forget. Next, we're going to do a project with the GRU. The project will be on language modeling, where we're going to use Shakespeare text to train a network so that it will be able to generate a similar kind of text to the training text. 34. Project on RNN: This project will use a recurrent neural network to do language modeling. It consists of two phases. Phase one is training: we're going to take a text and train the neural network. In phase two, we're going to take the first word from this text, feed it as input to the RNN, let it predict what comes next, and use the previous prediction as input in each following cycle. So it will create a new text that mimics the one originally used to train the RNN. In our project, we'll use Shakespeare text to train the RNN; in our case, we use "ROMEO" as the first word, feed it through the RNN, let it generate 1000 characters of text, and then we will look at the result.
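The gate mechanics from lesson 33 can be sketched as a minimal scalar GRU cell in plain Python. This is a conceptual sketch, not the project's implementation: the weights in `w` are made-up illustrative values, not trained ones, and a real GRU works on vectors with bias terms.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x, h_prev, w):
    """One GRU step for a scalar input and scalar hidden state."""
    # Update gate: how much past information to carry forward.
    z = sigmoid(w["wz"] * x + w["uz"] * h_prev)
    # Reset gate: how much past information to forget.
    r = sigmoid(w["wr"] * x + w["ur"] * h_prev)
    # Candidate state uses the reset-scaled previous state.
    h_tilde = math.tanh(w["wh"] * x + w["uh"] * (r * h_prev))
    # New state blends candidate and old state via the update gate.
    return (1 - z) * h_tilde + z * h_prev

# Illustrative (untrained) weights.
w = {"wz": 0.5, "uz": 0.5, "wr": 0.5, "ur": 0.5, "wh": 1.0, "uh": 1.0}
h = 0.0
for x in [1.0, -1.0, 0.5]:        # a tiny input sequence
    h = gru_cell(x, h, w)
print(h)  # final hidden state, always within (-1, 1)
```

Because the new state is a gated blend of the old state and the candidate, the update gate can pass information along many steps almost unchanged, which is what helps with long-term dependencies.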
We already looked at this diagram before; it showed how a word is used as the training input. In our case, we will use the Shakespeare text as input: each character will go through the RNN, predicting the next, and finally we get a new text that mimics the original text. This is the full project, so let's look at its different parts. First, it's simply importing some libraries. Then we're going to prepare the dataset. This line is just getting the URL for the text, and here we're opening the text. We find all the unique characters in the text and assign each character a number; this is done here. We do that because we cannot enter a character into a neural network; what we have to do is assign a number to each character, and this includes spaces and paragraph marks. The section about constants is just declaring constants and some variables; these will be used in the lines that follow. After that, we have some functions: a build_model function, a loss function, a generate_text function, and a split_input_target function. You don't need to use functions, but it's good programming practice. build_model will build the model for us; we create the model using tf.keras. Here we're defining a function with some parameters, some arguments, and these arguments will be used in the creation. If we look at these lines, they will look familiar: we have Sequential at the start, the tf.keras Sequential API, and a couple of layers inside the Sequential. We have a layer called Embedding, and a layer called GRU. So this is our recurrent neural network; as we saw before, GRU and LSTM are two kinds of recurrent neural networks. It has an activation function, sigmoid, and it has an initializer called glorot_uniform, which will initialize the RNN layer. The function will return a model.
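The character-to-number mapping described above can be sketched in plain Python. The variable names `char2idx` and `idx2char` are assumptions (the lesson doesn't show its exact names), and the sample text is a stand-in for the Shakespeare corpus.

```python
# Map each unique character in a text to an integer and back,
# because a neural network works on numbers, not characters.
text = "First Citizen: Before we proceed"   # stand-in corpus

vocab = sorted(set(text))                   # unique chars, incl. spaces
char2idx = {ch: i for i, ch in enumerate(vocab)}
idx2char = {i: ch for i, ch in enumerate(vocab)}

encoded = [char2idx[ch] for ch in text]     # the text as integers
decoded = "".join(idx2char[i] for i in encoded)

print(encoded[:5])       # first five characters as numbers
print(decoded == text)   # True: the mapping is reversible
```

Printing `char2idx` in the notebook, as suggested later in the lesson, shows exactly this kind of table.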
So when we call build_model with these parameters, it returns a model. Then there's the loss function, which returns the value of the loss; it uses sparse categorical cross entropy. Then we have generate_text. This is the part where we're going to use the model: we feed a start string to the trained model. It takes as arguments the model and the start string, and we let the model start generating the text. The RNN takes the start string, predicts the next character, uses that next character as input, and the cycle repeats until we have generated 1000 characters. So this is a loop: it builds up the generated text, and it returns the start string plus the generated text. Here we are building a text generator where we use the model to make predictions. There is also a small function used for splitting the text into chunks. After defining the functions, we go to the dataset: these lines prepare a proper dataset. This is the line where we call build_model, the function we talked about; we pass in the parameters and it returns a model. Then we're going to compile and fit the model. For compiling, as usual, we configure the computation with the optimizer and the loss; here we use the loss function in the compile call. As usual, we then go to fit: we pass in the dataset and the epochs. Here I have just set the steps to 10 and the epochs to 1; when you run it yourself, you should use, for example, 4 epochs to get a good result. Then there is steps per epoch: the number you get there comes from a variable computed earlier, which works out to 174 for the Shakespeare text.
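The chunk-splitting step mentioned above can be sketched in plain Python. The `split_input_target` name follows the lesson; the sample text and `seq_length` value are illustrative assumptions.

```python
# Dataset preparation sketch: cut the text into chunks of
# seq_length + 1 characters, then split each chunk into an input
# sequence and a target sequence shifted one character to the right,
# so the target at each position is the "next character" to predict.

def split_input_target(chunk):
    input_text = chunk[:-1]    # all characters except the last
    target_text = chunk[1:]    # all characters except the first
    return input_text, target_text

text = "hello world, hello again"   # stand-in corpus
seq_length = 5
chunks = [text[i:i + seq_length + 1]
          for i in range(0, len(text) - seq_length, seq_length + 1)]

pairs = [split_input_target(c) for c in chunks]
print(pairs[0])   # ('hello', 'ello ')
```

Each chunk of length `seq_length + 1` yields one input/target pair, which is why the real dataset preparation divides the text length by `seq_length + 1` when counting examples.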
The reason I set the steps to 10 and the epochs to 1 is that I want just a short run to check if there are any errors; afterwards, use the variable steps_per_epoch instead of 10. After that, we rebuild the model, and this is the line where we generate the text. We use the model as input, and we also use the start string "ROMEO". To generate text, we call the function, passing in the model and the start string; it will generate 1000 characters, go through the loop, and return the text. To run the project, you can use Jupyter Notebook. First, import the libraries. After that, we prepare the dataset, so feel free to use print to check the variables at each stage, for example printing the path-to-file variable to find out what each line does. If you read it, you will see that it's simply taking this text from the Keras installation data folder; it has the Shakespeare text, and this line is getting the path to it. Here we're opening the file and reading it as text. If you print the text, you will see the whole Shakespeare text printed. If you print the vocabulary, you'll see all the unique characters in the text: paragraph marks, spaces, exclamation marks, capital letters, small letters. If you print the text as numbers, it's all the characters in Shakespeare represented as integers; each letter, each character in Shakespeare, is mapped to a number. The 18 here is pointing to the first letter in the Shakespeare text, 47 to the next letter. We do this because a neural network doesn't read characters; that's why we map each character to a number. After that, we have some constants and variables that will be used in the code. Then we have our functions: build_model, the loss function, generate_text, and split_input_target. After that, we have some more data preparation and the building of the model; here we are using the build_model function. Now this is the fitting.
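The generation loop described in these lessons can be sketched without TensorFlow by putting a stub in place of the trained model. `generate_text` mirrors the lesson's function; `fake_model`, its tiny alphabet, and the shortened length are purely illustrative, since the real project feeds predictions through the trained GRU.

```python
# Text-generation loop: feed in a start string, predict the next
# character, feed that prediction back in as input, and repeat.

def fake_model(last_char):
    """Stand-in for the trained RNN: cycles through a tiny alphabet."""
    alphabet = "abc "
    return alphabet[(alphabet.find(last_char) + 1) % len(alphabet)]

def generate_text(model, start_string, num_generate=20):
    generated = []
    last_char = start_string[-1]
    for _ in range(num_generate):
        last_char = model(last_char)   # prediction becomes next input
        generated.append(last_char)
    return start_string + "".join(generated)

print(generate_text(fake_model, "ROMEO", num_generate=8))
```

The real project does the same thing 1000 times, with the model's sampled character at each step becoming the next input.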
So, as I told you, you have to use steps_per_epoch here, and the number of epochs should be set to 4. For my demo, I'm going to use 1 epoch and set the steps to 10. When I do this, it's going to optimize and train the model with only 10 steps. So we have used one epoch, and the loss is quite bad, 4.9, so we will not get good results, but I'm showing you how to go through this and read the output. After that, we get the model ready for text generation: I use the generate_text function and pass in the model and the start string "ROMEO". When we do that, we use the trained RNN model to generate text, but it won't be that great, because we've trained it very little and the loss is quite high. This is the text we're getting: "ROMEO" was the input string, but you can see that the rest is quite gibberish. It doesn't look like text that would come from a human; it has all these random characters, so we have to train it much longer to get better results. So I'm now going to run the whole thing again, but with the number of epochs set to 4 and the steps set to steps_per_epoch. 35. Section 8 - Intro to Generative Adversarial Networks: In this section, we will learn about generative adversarial networks. Generative adversarial networks were invented by Ian Goodfellow in a paper published in 2014. It's quite a new architecture of neural network, and it has been giving quite impressive results. It's generally used for generating data by mimicking a data distribution: these networks have the ability to imitate the probability distribution of data. If we have a dataset of images, we can feed it through this network, and it will learn the probability distribution. It will also be able to generate data with a similar distribution. Yann LeCun, who is Facebook's AI research director, said that this is the most interesting idea
in the last 10 years in machine learning. GANs have uses in imaging; they can generate music, images, videos, speech, and prose. These are some of the uses of generative adversarial networks. As I said, the results from GANs have been quite impressive. Here are some examples. These are generated faces of celebrities: what they've done is train a network with a database of images of celebrities and let the network generate its own celebrity faces, based on the distribution it has learned from the dataset. This is artwork created by GANs: they were given a dataset of images from different artists, and the GANs are able to generate artworks that have a similar style to the original dataset. This is a category of neural network that is contrasted with discriminative kinds of network. So here is the contrast between discriminative and generative kinds of neural network. Discriminative neural networks have the goal of classifying data into labels: they learn from a dataset and are able to classify new images based on the training they have gone through, while the purpose of the generative type of neural network is to generate data based on the training from a dataset. So one is generating data; the other is classifying data. The discriminative network classifies the data by learning the boundary between classes. Say we have different classes of data and we are distinguishing between cats and dogs: dog is a class and cat is a class, and the network learns the boundary between these two. You can think of a space where parameters are used to classify these two types of pictures, so the network learns the boundary between the two regions created by these parameters. Generative networks are able to generate data by modeling the distribution of the individual classes. From the dataset on which we train the model, the generative network will try to imitate the distribution of a class. Say we have the same setup: we have a dog class.
If you give a dataset of images of dogs to the generative network, it will model the distribution of the class we call "dog", and it will be able to generate images that have a similar distribution to what it was trained with. In terms of probability, we can understand it like this: the discriminative network is predicting the probability of a label based on the features. The features are the data that's coming in, and it's trying to predict the probability of a label, or multiple labels. The generative network is doing the reverse: trying to predict the features based on labels. 36. How GANs work: So how does a generative adversarial network work? We've understood the "generative" part; why is the word "adversarial" in the name? It's because these types of network actually consist of two separate networks that work against each other. One network works as a detective, and the other works as a generator, or a forger. What the forger is doing is trying to create data. In this picture, the forger is an artist: he's trying to create artworks and fool the detective into believing they're real data. The work of the detective is to discriminate whether the data coming in is real or forged, and the work of the forger, or generator, is to create better and better data so that it can fool the detective. There are random numbers that come in, and from them the generator will produce different types of data; the forger will generate a different artwork each time, at each cycle. So we're going to alternate between real data and forged data. With time, the forger and the detective will both get better at their jobs. That's why it's called adversarial: these two parts of the GAN are pitted against each other. The forger will force the detective to get better, and the detective will force the forger to get better. Here is an example where we're trying to generate data similar to MNIST. As we said, we have a generator and we have a discriminator.
In this case, the discriminator is simply a convolutional neural network: it is able to take in an image of a digit, a handwritten digit, and say whether it's real or fake. Initially, we train the discriminator with a training set of digits from MNIST, until it reaches a reasonable level and is able to distinguish between the digits. Using random noise as input, the generator will be able to generate new data; this is also a neural network. So what we're going to do is alternate between the training set and the fake images, and check whether the discriminator knows if the input is real or fake. If the data is coming from the training set and the discriminator says it's fake, we have to train the discriminator: it said that data coming from the training set is fake, so we have to correct it. If we send an image from the generator, the output result from the discriminator is fed back to the generator. So after the generator has created an image, it gets a result telling it whether it's doing well or has to improve; the generator will be trained according to what the discriminator is giving. So the output here is used both to train the discriminator and to train the generator, depending on whether the data came from the training dataset or from a generated image. Here is a different diagram for the same concept. We have real samples coming in here, we have a generator, and we have a discriminator. The switch means we are alternating between taking a real example and taking a generated example. The discriminator gives an output, we check whether it's correct, and based on that we fine-tune the training of the discriminator and also train the generator. And here we have the noise that's coming in to generate different kinds of samples. 37. Project on GAN: For this project, we're going to set up a GAN, a generative adversarial network, which will consist of two separate networks; we will use two separate functions to create these models.
We will import the dataset from MNIST. The training set from MNIST will be used to train the discriminator and also to guide the generator into getting better at generating fake images, based on the discriminator's feedback. This is the code for that. We start with the libraries. We get the dataset with load_data, as we did before; we use only the training images and labels, and the training images are normalized and reshaped. This is done here. Here we have a couple of constants and variables, and we're assigning the dataset to the train dataset. And here we have the functions. We have one function called make_generator_model. This one also starts with Sequential, and we build it up layer by layer: we have a Dense layer, we have LeakyReLU, which is a layer with a kind of activation function, we have Conv2DTranspose, and we have BatchNormalization. Next we have the make_discriminator_model function, which will create the discriminator; it will also return a model. This one will look like a typical convolutional neural network; in our case, it's classifying the handwritten digits. So we have the layers: a couple of convolutional layers with LeakyReLU and Dropout, repeated twice, and then the final layers, Flatten and Dense. It's a simple convolutional neural network. The generator can be seen as the reverse of a convolutional network: instead of classifying, this model is going to generate images. Instead of taking in images and giving us a label, it's going to give us an image. As you can see here, at the end we have an output shape of 28 by 28, which is the size of an MNIST image. It has the Conv2DTranspose layers and it has batch normalization; this will create our model for the generator. We have a function for the generator loss, which calculates the loss for the generator, and we have a discriminator loss function that calculates the loss on the output of the discriminator.
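The two loss functions can be sketched with plain binary cross entropy. This is a pure-Python stand-in written from scratch: the function names mirror the lesson's description, but the project's actual helpers (and the exact Keras loss they wrap) are not shown in the transcript, so treat this as an assumption about their behavior.

```python
import math

def bce(label, prediction):
    """Binary cross entropy for a single prediction in (0, 1)."""
    eps = 1e-12  # avoid log(0)
    p = min(max(prediction, eps), 1 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def discriminator_loss(real_output, fake_output):
    # Real images should be classified as 1, fake images as 0.
    real_loss = sum(bce(1.0, p) for p in real_output)
    fake_loss = sum(bce(0.0, p) for p in fake_output)
    return real_loss + fake_loss

def generator_loss(fake_output):
    # The generator wants the discriminator to output 1 on its fakes.
    return sum(bce(1.0, p) for p in fake_output)

# A correct, confident discriminator gives a low loss; a fooled
# discriminator gives the generator a low loss.
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))  # low: good detective
print(generator_loss([0.1, 0.2]))                  # high: fakes caught
print(generator_loss([0.9, 0.8]))                  # low: fakes fooled it
```

These two losses pull in opposite directions on the same discriminator outputs, which is exactly the adversarial setup described in lesson 36.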
These are the loss functions for the generator and the discriminator. Then come the functions for training and for generating and saving images: we're going to generate images and save them. Then there are the checkpoints; these are checkpoints for the model. And we're going to display the images. To run the project, you can use Jupyter Notebook. First, import the libraries. After importing the libraries, we get the MNIST dataset with load_data. These are the constants and variables; we have a train dataset. This is the make_generator_model function that makes the generator model, and this is the function to create the discriminator model. These are the functions to calculate the loss on the generator model, and this one computes the loss on the discriminator. Then we create the models: these are the functions to train the models, and these functions will handle generating and saving the images. So now we're going to run these functions: make_generator_model, assigned to the generator model, and make_discriminator_model, which gives us the discriminator. Now we have our two models. We will use an Adam optimizer, with its parameters, for the generator; the same thing, an Adam optimizer, for the discriminator. This part handles the checkpoints for the model, and we talked about the noise: random noise for the generator. These lines start the training, and images from the generator will be displayed here. So after each epoch, you will see a few samples of what the generator is able to produce, and you will see that it gets better with the number of epochs, as these functions optimize the generator to get better and better. We're going to train it for 50 epochs, and it will take quite a lot of time, roughly 10 minutes per epoch, but you will get an update after each epoch.