Transcripts
1. Promo vid: Are you interested in being able to create your own neural networks? Do you want to solve real-world problems with cutting-edge technologies? Artificial intelligence, neural networks, and deep learning have been around for decades. What's important now is that we have access to large amounts of data and computational power, so now we can really make a difference in the world using neural networks. My name's John Harper. I'm a machine learning engineer, a Python programmer, and an award-winning entrepreneur. I was trained at Cambridge University and am a recent scholar at the prestigious High School [unclear]. This course leads you through the ins and outs of creating your first neural networks. I take you through all of the foundational concepts in a very accessible and digestible way, and I make sure I keep it practical: you'll create your own neural networks from scratch using cutting-edge technology. This course is for absolutely anyone; you don't need PhD maths. As long as you can add some numbers together and do multiplication, you're well on your way. There are no prerequisites for this course, except for having some experience in Python. If you don't have experience in Python, I recommend checking out my other course on YouTube, which already has thousands of happy students. So if you're interested in learning how to create your own neural networks, apply them to your own data, and use them on examples in the real world, then I highly recommend you enroll in this course and join the other happy students. So if you're ready, I'll see you on the inside.
2. About the class: Welcome to the inside of The Ultimate Neural Nets and Deep Learning Masterclass in Python. I'm very excited for you to get started in learning how to create your own neural networks. The lectures are broken up into different types. We'll be starting with a few presentations and slides, and then we'll be moving on to some real-life classroom whiteboard sessions where I lead you through all the foundational concepts in neural networks. I've made sure this course is accessible to everyone. So, like I said in the promotional video, you don't have to have any sort of foundational understanding of algebra or the mathematics behind it; I'll lead you through all of it in a very accessible way. What is important is that you have some sort of handle on using Python. The first half of this course is all about the foundations and the concepts. When we move into the second half, we're going to be doing some live coding sessions where you'll code along with me, and we'll be using Python and associated frameworks to create neural networks. Now, if you have no experience in Python, it's not a problem; it just may take you a bit longer than most people. If you haven't already checked out my other course on Python, I highly recommend you do, because it covers all the foundations of Python, also in the context of artificial intelligence. So once you've gone through the basic concepts in the first half of this course, like I say, we'll be moving on to using some awesome cutting-edge technology in order to create neural networks that you can apply to real-world data. In every lecture, there's a comments section where you can ask me questions, so please feel free to go ahead. In the next few lectures, we're going to be going over how you can maximize your learning and getting started on some of these foundational concepts. So I hope you're ready.
I know I'm looking forward to hearing what you think of the course, so best of luck, and I'll see you in the next one.
3. How to maximise your learning: Now, before we move on to going through all of the theory on neural networks, I just wanted to touch very briefly on a really important point, and that's how to maximize your learning during this course. Of course, I want you to be the absolute best when you get out of this course, able to code your own neural networks from scratch, and I'm very confident that if you can make it through the first five lectures, then you're on your way. I'm quite experienced as an online teacher, and I've found in the past that once people get through the first five lectures, they tend to get into a good momentum and pick things up a lot quicker. So try to get through the first five lectures in the next few days if you're able to. I try to keep all the lectures relatively short and digestible, so it should be doable. So here are just a few winning strategies for you to keep in mind as you go through this course. First of all, what's really helpful is to have some kind of learning notepad, whether it's an actual notepad or just some paper that you're able to write on. Whatever it is, get a pen, get some paper, and have it next to you when you're watching the lectures, just so you can make some basic shorthand notes. This leads on to trying to create an active memory, where not only can you look through your notes at the end of the day and remember the things you've gone through, but you also get into the really good habit, first thing in the morning when you're waking up and just before bed, of thinking through: right, what have I learned about neural networks so far? What can I bring up from my memory? This will get your active memory going, and you'll be able to recall the facts better. That really will put you ahead of the game. I've found this time after time. It really accelerates
people's learning when they practise this active memory, where they try to recall the things they've learned before looking at their notes. Another really important point is: code with me. In the first few sections of this course we'll be going over the theory, so that will be a lot of information for you to take in and digest. Then we'll be getting our hands dirty and doing lots of coding, using a number of different, very powerful frameworks to create neural networks. So, as I'm coding, I highly recommend that you do so as well, with a split screen: on one half of your screen you have the lecture, and on the other half you have a window with the program we're going to be using, Jupyter notebooks, open so that you can code in it. Jupyter Notebook is just a coding tool that you can use. Another really important part of this whole course is that, as we go through all the concepts, it will be really helpful for you to try and do as you go along. I bring up different opportunities in different lectures where you have the chance to do something, whether that's doing a few calculations or drawing out a neural network for yourself, or at least the parts that you understand so far. Be proactive and always try to do things. Avoid purely passive learning, where you're just listening to the lectures and that's it. Try to take the information and do something with it. And as you build these skills and the ability to create neural networks, I recommend you get creative. This is the fastest way to becoming an expert, or at least becoming very good, at creating neural networks. As you get creative, you'll come up with your own ideas of how these concepts can be implemented. I recommend you just go ahead and try them. You're going to come up against different problems, different bugs and errors in your code,
maybe, but this will increase your problem-solving skills, which are really important. So these are my winning strategies. I reckon that if you follow at least three out of five of these, and you make it through these first five lectures, then I'm very confident that you'll be well on your way to becoming a skilled neural network developer. Coming up in the next lecture, we'll be talking about what deep learning is.
4. What is deep learning: Okay, so before we get too far into the underlying concepts and the maths behind deep learning, let's just take a moment to talk about what deep learning actually is, conceptually, and what the definitions around it are. The first thing I've done is put together a few definitions of my own. These are a lot of the different terms that you'll be hearing from me, and if you read any publications or online resources out there, you'll come across these terms as well. So, data science is a very large area of research, and it's used a lot in industry. My definition is: the process of manipulating and visualising data in order to see patterns in data, and in brackets I've put: and take decisive action based on the output. A lot of the time in research, for example, data science may be used just to see the patterns in the data, and that's it. But especially in industry, let's use Facebook as an example, they may use data science to look at how people are behaving: how certain demographics are behaving on their platform, what products they're most likely to buy, et cetera. So not only will they be looking at the data and seeing the patterns within it, but they'll also be saying: right, what action can we take? Let's say they see that women aged 22 to 24 like a certain product in particular. The action is going to be that they advertise that product more to this demographic. A lot of people would say this is one of the more general definitions, where you can actually include a lot of the other definitions within it. Next up, machine learning. Machine learning is the application of algorithms to detect patterns, usually from large amounts of data, using the patterns to provide an output from new data. Let me just explain that quickly. Essentially, in machine learning you have a great range of different algorithms that are used on different types of data,
depending on the size, the range, what you're looking to get from the data, what patterns you're looking to see, et cetera. Like I say, this is usually used on a large amount of data. With deep learning and neural networks, it's exactly the same thing: usually you want a very large amount of data in order for the algorithms to detect real-life patterns. When I say in the definition "using the patterns to provide an output", let's take the stock market as an example. Someone may use a machine learning algorithm to see patterns in how a certain stock is behaving, based on different environmental conditions, and whether the stock is going to increase or decrease in price. You can then put in new data as more data comes in, in real time, and from this new data it can output its predictions on whether the stock will increase or decrease in price. For deep learning, my definition is that it's an important subset of machine learning, applying neural networks to detect patterns in data and output these patterns, often applied to new inputs. So it's exactly what I was just saying about machine learning. Deep learning uses neural networks, and one example of using neural networks is image classification. Let's say I've got a million images of dogs and a million images of cats. I can train the neural network to detect the patterns in the images that reflect whether it's a picture of a dog or a cat. And when I say "often applied to new inputs", that means that now it's been trained, we can use it in real life and give it new pictures of dogs and new pictures of cats, and it can say whether each one is a dog or a cat. If we think of a more real-life application, we could use this in face detection.
So let's say we trained a deep learning network on images of a certain person, because we want our program to be able to detect this person's face wherever they are. So let's talk about neural networks now; this is really what's within deep learning. My definition is: a model that has learned patterns in data using repeated mathematical functions (we might describe them as nodes, which you'll hear a lot more about later in the course), by applying different numbers in these functions and changing them until it reflects the patterns in the data well. What I'm saying here is that a neural network is really a lot of mathematical functions, and the model itself, or the program, will put in different numbers to try to output a certain number. If the number output isn't correct, then it needs to change the inner numbers. That's kind of the most basic way of explaining how a neural network works, but of course we'll be going into a lot more detail on this. Finally, a lot of people use the term artificial intelligence: the application of machine learning, often deep learning, that can solve a novel problem without further assistance. I mean, there's a great deal of debate around what artificial intelligence is, and this is simply my definition. I think that when you apply a machine learning algorithm, the program should then be able to solve novel problems by itself. What I recommend with this slide, which I'm going to attach to the lecture as well, is that you revisit it often as you learn more throughout this course, so that you start to get more of a context and understand these definitions. And, very importantly, I'd say try to develop your own definitions as you go along. This will really help you to integrate what you've already learned. So where does deep learning come from? Let's go through this really quickly. Deep learning dates back to 1965, when there was the first example
of an algorithm used in a feedforward neural network. It was mainly theoretical at this point; it wasn't exactly like what we can do now, such as dog or cat image classification. Then it got more popular as we went into 1971, when a paper was introduced where a neural network was shown to work using eight layers. So that's eight layers deep. Instead of just having a neural network, we started to see that we could use a number of layers in a neural network to make it more complex and able to see more complex patterns in the data, and that's where we started to get this idea of deep learning from. Then, just as we reached 2000, deep learning was introduced into the machine learning community, and it was starting to be accepted as another application of machine learning to detect patterns in data. Then the legend Geoff Hinton, in 2006, showed that neural networks could continue to learn and improve themselves over time, and that was a really important point. Around this time we started to realize that there was a lot of data online that could be used to detect patterns, and also that there was a lot of computational power in the computers being used in research. It started to turn out that, on top of this great deal of data, we now had the computational power to actually start using these concepts in a useful way. So how does this all connect to the brain? Whenever you hear in the media about artificial intelligence, and you tend to hear that term rather than machine learning, you hear about neural networks. What does that actually mean? A lot of people get confused. How do algorithms, maths, and data connect with this idea of the brain and neurons? I'm going to go through this really quickly, and it's my opinion that it's not actually that helpful to think about neural networks and deep learning in terms of the brain. We'll touch on it a bit.
But really, what we'll be focusing on in this course is the real hard concepts, the things that actually matter, so I'll touch on this quickly. In the brain you have neurons and synapses, and based on stimuli, your brain continues to create these connections, so that when it faces a certain stimulus, the correct neurons and synapses fire off, they get activated, which is indicated by the red color here. In neural nets, you get these layers. Let's say, for example, we're doing image classification of a dog and a cat. In this diagram you can see that we have these nodes, which we're going to see in a lot more detail: all of the circles are nodes, and one vertical line of them is a layer. So between pictures of dogs and cats, if it's a picture of a dog, we want certain nodes and layers to fire, and if it's not, we want them not to fire. In this context, these nodes and layers combine to produce one number, which basically tells the program whether it should fire or not. So that's the connection there. It's all about how the brain has these neurons and synapses that are activated under certain stimuli, and with neural networks, nodes and layers learn to activate at the correct times when there is the correct stimulus, for example when the network is given an image of a dog. Andrew Ng, who was previously the chief scientist at Baidu, has described deep learning as the new electricity. He, like many of us, believes that deep learning and artificial intelligence will have a huge impact on all of our lives in the next few decades, so it's going to be a very useful skill to have. In the next lecture, we're going to be talking about how there are already some incredible applications out there,
and start to think about what new applications could be made with deep learning.
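The idea described in this lecture, that a network puts numbers into its functions, checks the output, and changes its inner numbers when the output is wrong, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's own code: the single weight, the learning rate, the input value, and the target value are all made up for the example, and a real neural network has many such numbers and nodes.

```python
# Hypothetical toy example: one "node" with one inner number (a weight).
# The node computes output = weight * x, and we nudge the weight
# a little every time the output does not match the target.

def train_single_weight(x, target, weight=0.0, learning_rate=0.1, steps=100):
    """Repeatedly adjust one weight so that weight * x approaches target."""
    for _ in range(steps):
        output = weight * x                    # the node's current guess
        error = output - target                # how far off the guess is
        weight -= learning_rate * error * x    # change the inner number
    return weight

# Assumed example values: for input 2.0 we want the node to output 6.0.
learned_weight = train_single_weight(x=2.0, target=6.0)
print(round(learned_weight, 3))
```

With these assumed values the weight settles close to 3, since 3 times 2 gives the target of 6. The same loop of guess, compare, and adjust is the basic picture to keep in mind for the rest of the course.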
5. Real world applications: So we've talked about the definitions behind a lot of the keywords in deep learning and neural networks. Now we're going to get into some of the exciting stuff and talk about the real-world applications of deep learning and neural nets. I think this is a really interesting lecture, and I hope you enjoy it. First of all, just to really hammer this in, I want you to realize that deep learning is all about patterns in data, no matter what the information might be. It might be an image. It might be audio. It might be data about the humidity in the air in a certain place. It might be information about someone's behavior online. Whatever it is, all of this information is converted into simply numbers and put into data structures, and then we use deep learning to find patterns. So what I'm talking about the whole time is patterns. For example, here's one application in marketing. Let's say we have a retail store, and they're able to collect a lot of data on all the people who purchase things in their online shop. What they do is look at all the data and see which demographics are buying which types of things. Let's say they find that, I don't know, middle-aged guys tend to buy a certain pair of trousers or certain shirts, and that middle-aged men usually come and shop on a Thursday. So they've gone through maybe tens of thousands or hundreds of thousands of shoppers, looked through the data, and seen that on their online store, middle-aged men tend to buy shirts on a Thursday. What they could do with this information, once they've extracted one of these basic patterns, is say: right, on Thursdays, when a user visits our site and we don't know anything about them, it's quite likely that they're a middle-aged man. So let's put the shirts as the first thing they see.
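To make the retail example concrete, here is a minimal Python sketch of pulling one basic pattern out of purchase data. It uses plain counting rather than deep learning, and every record, field, and function name in it is invented for illustration; a real store would have far more data and a far richer model.

```python
from collections import Counter

# Made-up purchase records: (demographic, weekday, item).
purchases = [
    ("middle-aged man", "Thursday", "shirt"),
    ("middle-aged man", "Thursday", "shirt"),
    ("middle-aged man", "Thursday", "trousers"),
    ("young woman", "Saturday", "dress"),
    ("middle-aged man", "Monday", "shirt"),
    ("young woman", "Thursday", "shoes"),
]

def most_common_item(records, weekday):
    """Return the item bought most often on the given weekday."""
    counts = Counter(item for _, day, item in records if day == weekday)
    item, _ = counts.most_common(1)[0]
    return item

# Decide what an anonymous Thursday visitor should see first.
featured = most_common_item(purchases, "Thursday")
print(featured)
```

The point of the sketch is only the shape of the process: turn behavior into data, find a pattern, and take an action based on it.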
That's one application of using patterns in data for marketing. Next, behavior patterns. These are really interesting, and I think they're an incredible application of deep learning. Let's think about this for a second. Let's think about a normal person online. Let's say they go onto their bank account, and you can see that they usually withdraw certain amounts of money within a limit. They tend to spend a certain amount each day and in certain areas. So you can see there are patterns in the way they spend their money: where they spend it, how they spend it, et cetera. Then suppose there's a new behavior that's completely different. If you think about, say, a large bank, they would have a great deal of data not only on what normal spending behavior looks like, but also on a great many cases of fraud. They can put all those together and start to get behavioral patterns for when it's fraud. So when behavior comes up that is out of the normal, in how much is being spent or withdrawn, or in the different places it's being withdrawn from, deep learning can look at these patterns and say with confidence that this is highly likely to be fraud. Deep learning is now also being used in medicine. In our DNA, for example, we have patterns of what we call base pairs, different molecules that are paired together. One very interesting startup in Cambridge is looking at cancer. They're looking at the DNA of their test subjects, and in particular at the DNA within cancer cells. When someone's being given a treatment, they can look at how the DNA in the cancer cell itself is being affected, and they can much more quickly detect patterns in whether this treatment is likely to work for the individual or not.
So with neural networks, it's all about patterns, and it's all about learning. Their objective is to extract patterns from data, and the way this works in practice is that neural networks have to learn these patterns from the data. Neural networks tend to learn from two things. The first is that they're being given data. The second is that they're being given some kind of direction. For example, with the marketing, the direction given is to cluster the different bits of data, to show us as humans the different clusters of people, the demographics that buy different things at different times. If we're looking at something like image classification, for example identifying whether there's a dog in an image, the direction given is: right, in this data here are loads of pictures of dogs; I want you to detect the patterns that separate the pictures that have dogs from the ones that don't. There are three main types of learning. The first one is supervised, and most of the applications out there at the moment tend to be supervised deep learning. Supervised learning is, for example, the image classification with the dogs, where you give your model labeled data. So if you wanted to create a program that could classify whether there's a dog in an image or not, you'd give it loads of pictures with dogs in them, and you'd label them as having a dog. Then you'd provide it with a lot of other data as well, maybe 50/50: 50% of the images have dogs and 50% have no dogs, and you have to label those images "not dogs". And you give it the task of differentiating between the two. With unsupervised learning, you're still providing your model with data, but it's unlabeled. For example, in marketing, you're giving it data saying this is a middle-aged man, they bought this, this, and this on this day, et cetera.
But the model doesn't need to learn a task from it. It just needs to output clusters, or just to see the patterns in the data; it doesn't necessarily need to apply this to new data. Finally, reinforcement learning. This is where you give it a kind of labeled data, but you also give it the rules of the environment it's in. I'll give you one example. You may have heard that, with chess, Google were able to create a deep learning algorithm, or model, that was able to beat the world champion. Essentially, this is reinforcement learning being used. It was given loads of data on games of chess. It was given rules, saying: this is how the pieces can move in chess, and this is how you win. And the task was to look at this data of previous games and play against itself many millions of times, and each time it loses, it basically needs to learn that whatever it had just done is not a good tactic. So it kept on learning until it could detect patterns in game behavior to make sure that it always won games, and so far that's doing very well. Now we're going to go over three different examples, and you're going to have a chance to try to work out for yourself whether each is supervised, unsupervised, or reinforcement learning, based on what was described here. First of all, language identification. Let's say we're creating a model where a speaker says something into a microphone, and the program is able to say what language it is. So for "bonjour", it would say it's French. I'll give you a few seconds now to think. Do you think this program would require supervised, unsupervised, or reinforcement learning? It's supervised learning. Why? Because in supervised learning you're given labeled data. Some people may have thought unsupervised, but it's not.
It's not just win or lose with certain rules. What you need, just like with the example of classifying dogs in an image, is to give it data that says: this is French, this is English, this is German. And the task is to detect patterns in what the person is saying, to say whether it's a certain language or not. So for language identification, it's supervised learning in this case. Next up, robotic movement. Let's say we have a task where we've created a robot, and we want the robot to successfully, let's say, pick up an apple. So we give the robot rules. We say: there is an apple on this table. You win if you have the apple in your hand; you lose if you don't after, let's say, 10 seconds. And we say to the robot: okay, this is how you move your right arm; you can move it up, down, left, right, forwards, and backwards. So that's the environment you're in. Do you think this is going to be supervised, unsupervised, or reinforcement learning? This one is reinforcement learning. Why? Because with the robot, we're giving it a certain type of labeled data, where we're saying it's win or lose. The win or lose here is: if you have the apple in your hand, you win; if not, you lose. The rules are in the environment: you can move your arm this way, that way, forwards, backwards, left, right, et cetera. Third one: we're looking at voting records. In this one, we want to be able to look at some voting that happened, maybe last year. We want to see which demographics are voting for which candidate: what types of people are voting for the different types of candidates. So do you think it should be unsupervised, supervised, or reinforcement learning? This one, as you may have guessed, given that with three examples I wanted to include every type, is unsupervised learning. Why?
Because we're giving it unlabeled data on the people who are voting, and what we want to do is show the clustering. We want to show what types of people are voting in what pattern, and we don't need any other output apart from that. In this course, the main focus is going to be on supervised learning, mainly because there are a great many really interesting applications of supervised learning, and I believe there's going to be a huge push on this in the future as well. Supervised learning also allows us to cover all of the really important concepts needed to understand deep learning and neural networks. One more thing I'll leave you with to think about in this lecture: let's say NASA wanted to explore a new planet. They wanted to program a spaceship that's able to do this journey by itself and land successfully on the planet. What sort of learning do you think would be required for the spaceship to do this successfully? And do you see any issues? I mean, how could they get this data? How could they improve the deep learning network in order to do this successfully? This is just something for you to think about, to add a bit more to the lecture, so you can think about the type of learning and also the difficulties that might arise in trying to do it. Next up, we're going to be getting right into the concepts of actual neural networks by talking about the basics of linear regression. So I'll see you in the next lecture.
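The difference between supervised and unsupervised learning discussed in this lecture can be sketched in Python. The example below is a deliberately tiny illustration, not a neural network: the numbers (imagine them as voters' ages), the labels "A" and "B", and all function names are hypothetical. The supervised part uses the labels to learn one average per label and then predicts a label for a new value; the unsupervised part sees only the numbers and splits them into two clusters with a very simple two-centre procedure.

```python
# Hypothetical one-dimensional data, for example voters' ages.
values = [21.0, 23.0, 25.0, 61.0, 64.0, 66.0]
labels = ["A", "A", "A", "B", "B", "B"]   # only used in the supervised case

def fit_supervised(xs, ys):
    """Supervised: use the labels to compute one average value per label."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: sum(v) / len(v) for y, v in groups.items()}

def predict(centres, x):
    """Return the label whose average value is closest to x."""
    return min(centres, key=lambda label: abs(centres[label] - x))

def cluster_unsupervised(xs, rounds=10):
    """Unsupervised: split the values into two clusters without any labels.

    Assumes the values are not all identical, so neither cluster is empty.
    """
    low, high = min(xs), max(xs)          # starting guesses for two centres
    for _ in range(rounds):
        near_low = [x for x in xs if abs(x - low) <= abs(x - high)]
        near_high = [x for x in xs if abs(x - low) > abs(x - high)]
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return near_low, near_high

centres = fit_supervised(values, labels)
prediction = predict(centres, 30.0)        # label for a new, unseen value
clusters = cluster_unsupervised(values)    # groups found without labels
print(prediction, clusters)
```

In the supervised case the output is a prediction for new data; in the unsupervised case the output is just the grouping itself, which matches the voting-records example above.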
6. Linear regression: Welcome to the whiteboard. We're going to be going over a lot of the main concepts on this kind of screen, so I can draw out all the things that I'm explaining to you and make it as clear as possible. Before we get started, let's talk about what we were discussing in the last lecture, using a spaceship to reach another planet. The best application there would be reinforcement learning: giving it the rules of the environment for getting there, and obviously the win is to land successfully. Now, the difficulty would be: how would you actually get enough data for the neural network to learn sufficiently that it would be likely to succeed in landing? Obviously, you couldn't just send loads and loads of rockets to the moon. So what you'd do is modeling. You would model all of the different variables that are involved in the travelling and basically create a setup where your program could essentially try it lots of times in this model. Then, when it seems confident that it's going properly, you could implement it. So that's the answer to our last question. Moving on to linear regression now. At the heart of supervised and reinforcement learning using neural networks is linear regression. Linear regression is a simple way of detecting patterns in data, so in this section we're going to start with the absolute basics. If you're already familiar with linear regression and you don't feel like you need any kind of refresher, you can probably skip at least half or the whole of this lecture, but it might be useful for you to go along with us just as a refresher. So let's start off with two different variables, x and y. When we talk about x, this is usually the input, and y is usually the output.
When we're talking about the input, this is like what we were talking about with images in image classification, and y is the output, which in image classification, for example, is the prediction. If we use the example of cats and dogs, the input is lots of images that are either pictures of dogs or not pictures of dogs, and y, the output, would be either "it is a dog" or "it isn't a dog". When we talk about linear regression, we're going to use an even more basic example, and we're going to look at a graph, so hopefully you're already familiar with graphs. We'll have x on the horizontal axis and y on the vertical. So we're no longer talking about such a large thing as image classification; we're going to talk about something much, much simpler. Let's use an example like Facebook ads. Let's say we are a small company, we have a certain product, and we want to market it using Facebook ads. Let's say x is the amount of money we put into a Facebook ad campaign, and we want to see, based on how much money we put in, how much we get out. So we put sales here on the y axis. We're looking at how many sales we get out depending on how much money we put into our Facebook ad campaign. And let's say each one of these squares is just $1: one, two, three, four, and so on. So let's say in our first Facebook ad campaign we put in $2. It turns out that if we spend just $2, we actually get $20 in sales. That's looking pretty good. Okay, now let's look at what happens if we put $5 into a Facebook ad campaign. We do that, we put $5 into the Facebook ad campaign, and we see that we get an increase again. This time, let's say we get $50 in sales. So for the $5 we put in, we get $50 out. Here we're plotting our inputs against our outputs.
So our input is how much money we put into the ad campaign, and our output is how many sales we get. With these data points, we can start to see a basic pattern, right? We could assume, based on just this data, that if we were to put in $10, way over here, then we might expect to get $100 in sales, because previously, when we put in $2, we got 10 times that amount back in sales, and when we put in $5, we got $50, again 10 times as much back. So what we're doing here is extrapolating: we're making an assumption based on the previous pattern in these two bits of data. Obviously you'd want a lot more data than this, but this is just an example. So if we put in $10, we believe that we might get $100 out, and we can assume that this could be a good pattern in our data. What we could do to make this even easier is say that if we put in $0, we get $0, right? So we could try to create a line of best fit. Excuse my wonky handwriting, but what I'm doing here is trying to draw a line through all of the points. Basically, this line represents the pattern that we're seeing: if we put in a certain amount, 10 times that is going to come out. With this data, we just drew a line through it. So now I can say, right, I want to be able to predict what would happen if I put in $8. I'd go to $8, which, let's say, is about here, and just keep going up until I hit the line, which will be at 80. This is actually a prediction now. So we've created a very basic model here that has detected a pattern; the pattern is represented as a line of best fit, and it's able to output a prediction. So now we've formalized it and put it into a line of best fit. In the next lecture, we'll be looking at how linear regression is used more in the real world, and what it's like when we add more data.
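The reasoning in this lecture can be sketched in a few lines of Python. This is only an illustration of the "10 times back" pattern, using the two campaigns from the whiteboard; the names `campaigns`, `multiplier` and `predict_sales` are made up for this sketch and are not from the course.

```python
# Ad spend and sales, in dollars, from the two campaigns in the lecture.
campaigns = [(2, 20), (5, 50)]

# How many dollars of sales came back per dollar spent in each campaign.
ratios = [sales / spend for spend, sales in campaigns]

# Both ratios are 10, so use their average as the pattern ("10 times back").
multiplier = sum(ratios) / len(ratios)


def predict_sales(spend):
    """Predict sales for a given spend using the detected pattern."""
    return multiplier * spend


print(predict_sales(8))   # 80.0
print(predict_sales(10))  # 100.0
```

With only two data points this is an extrapolation rather than a reliable model, which is the same caveat the lecture gives.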
7. Line of best fit: In the previous lecture, we looked at drawing a line of best fit. We can now look at this in a more formal way to make it clear how we can create basic models to predict new values. If you're familiar with linear regression, you can probably skip this part, but just make sure you make a note of the notation y = wx + b, which we'll be using a lot in future lectures. And just to make a point: linear regression is one of the backbones of neural networks, so this is why we're focusing on it. At the moment, we're looking at the real basics so that we can scale this later into a neural network. So we basically want to say, when we have a value of some input x, what's y going to look like? In one of the previous lectures, we were saying that with Facebook ads, if we put in $5 as x, then what we get back in sales is $50. But let's say, for example, that our Facebook ads actually aren't doing as well as that. Every time we put $5 into Facebook ads, we get $5 out. I'm just putting a comma here. The next time, we use $6, and the amount of sales we get back is $6. Now let's say we try $10. We're getting a bit frustrated, saying, right, let's put in a bit more money and get this going. Fortunately, we're not losing any money, but we're also not making any money, so we get $10 back in sales. If we put this on the graph we were looking at last time, where x is on the horizontal axis and y is on the vertical axis, we can plot these: 5 against 5, 6 against 6, and then 10 against 10. In this case, it's very easy to draw a line of best fit because the points all line up.
They all follow exactly the same pattern, so that's very easy, and obviously zero gives zero. What we want to look at is, when we put in an x, what do we have to do to this number, the input, to get the output, 5? Well, in this example, it's very easy, and we can literally just write it this way. This is our prediction, this is our pattern: whatever x we put in, y will be equal to x. Simple as that: y = x. Let's try that with another example. If x is 12, we're paying $12 into an ad campaign, and y is going to be equal to x, which is 12. That goes along with our pattern, and that is a prediction. Let's say we want to try again, but this time we're not going to be using Facebook ads, because they haven't been working for us. Let's say we want to move on to Twitter. So we try again. We say, right, let's put in $2 this time. And once we put $2 into the ad campaign, we find that we don't get just $2 back, we get $4 back. Awesome. So what happens now? It's going so well, let's put in a bit more. We put in $5 and find that we get $7 back in sales. We try a few more of these and find that there is again a very obvious pattern coming up: we put in $100, we get back $102. So this time we're going to have something that looks a little bit like this. Obviously, if we put in zero we'd expect zero, but the rest of it is going to look like this, because the pattern is that when we put in x, y is no longer just equal to x. Can you see the pattern? Basically, you just need to add two, so our expression is y = x + 2. So the equation for this line of best fit is y = x + 2. And there's one more that I want to talk about really quickly. This time, if we put in an x of 1, we get 3.
We put in 3, we get 7. When we put in 100, we get 201. So this time it's a bit more complicated. The pattern is that each time we're multiplying x by two and adding one to get the result. So this line would again sit a bit higher, but it would also be a bit steeper, because to get y you don't just add to x: you need to multiply x by two, and then add one. This number is going to be the gradient of the line. The higher the number that we multiply x by, the higher the gradient will be, or the steeper the line will be. So the number that you multiply x by affects the steepness, or the gradient, of the line. This other number here affects the y-intercept, that is, how high or low the line is. If it's zero, the line will go through the origin like this. If it's minus one, it will sit underneath, down here. If it's plus 10, for example, it may be up here. So these are some of the more formal points in linear regression, and they can be written in this generalized equation: y = wx + b. You will be seeing this equation a lot in these lectures because it is one of the really basic building blocks for linear regression. The w stands for weight and b stands for bias. So the weight affects the steepness of the line of best fit, and the bias affects the y-intercept. Let's look now at what happens when we don't have data that's so kind to us. What do I mean by that? Well, sometimes it's not going to be a beautiful straight line that fits all of the data perfectly. I'm not going to go through the x and y values this time; I'm just going to plot them so that it's more visual. But it's still the same thing we're doing. Let's just say that we're again doing an ad campaign.
Okay, so this time, as you can see, we can't draw a straight line of best fit that goes through all the points. So one thing we could do is say to ourselves, okay, let's just try our best to draw a line that looks like it's as close as possible to all of the different data points. For example, you may draw a straight line like this, so it suits all of the data points and none of them is really being left out too much. There's a bit of distance between the points and the line, but this is the line of best fit; it's not called the line of perfect fit. There's one other thing we could do instead of just drawing a straight line, and we'll touch on this in a future lecture: we can draw a curved line like this, which has its pros and cons in real life. Now, when the data doesn't have a perfect line of best fit going through it: if you have a straight line that fits all the data well, we can say that the data is highly correlated. If we have something like this, then we can say it has a low correlation. If we have data points just all over the place, where you can't draw any line of best fit that would make any kind of sense, then the data isn't correlated at all. When the data isn't correlated at all, it will be very hard to detect patterns in the data itself. So that's it for talking about linear regression and the line of best fit in a more formal way. The most important thing to take away from this whole lecture is y = wx + b. This is a formal representation of linear regression, where w stands for weight and b stands for bias. It helps us to formalize a line of best fit, which, as we've been saying, is a representation of the patterns in the data that allows us to create predictions. Our predictions are y.
So in the next lecture, we'll be looking at real-world applications of linear regression.
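As a quick check of the three patterns from this lecture, here is a small Python sketch of y = wx + b. The data points are the ones read out in the lecture; the function name `predict` and the dictionary layout are just choices made for this illustration.

```python
def predict(x, w, b):
    """Linear regression prediction: weight times input, plus bias."""
    return w * x + b


# (weight, bias, data points) for each pattern discussed in the lecture.
examples = {
    "Facebook ads (y = x)": (1, 0, [(5, 5), (6, 6), (10, 10)]),
    "Twitter ads (y = x + 2)": (1, 2, [(2, 4), (5, 7), (100, 102)]),
    "Third example (y = 2x + 1)": (2, 1, [(1, 3), (3, 7), (100, 201)]),
}

for name, (w, b, points) in examples.items():
    # The line fits perfectly if every prediction equals the observed y.
    fits = all(predict(x, w, b) == y for x, y in points)
    print(name, "fits every point:", fits)

# A new prediction with the first pattern: $12 in, $12 out.
print(predict(12, w=1, b=0))  # 12
```

Changing `w` changes how steep the line is, and changing `b` moves it up or down, which matches the description of weight and bias above.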
8. Linear regression with big data: In the last few lectures, we've looked into linear regression, how it works, how we can formalize it into a line of best fit, and how to make predictions. But so far we've been using data that's been quite nice, quite easy to draw lines of best fit through and make predictions from. In the real world, data isn't so nice, and there usually aren't such clear patterns at all. It usually looks a lot messier. There may be correlations in the data, but it's not so easy to just look at some numbers and get such a clear indication as y = x or something like that. So let's say, for example, we have some real-world data that looks a bit more like this. Obviously, if we were looking at proper data, there would be maybe a million of these points, and it wouldn't be so easy to draw by hand, but that's why we're starting with the basics here, with just a few data points, and then scaling up to neural networks. If we were to try to create a line of best fit here, from the looks of it we could create a line like this. It doesn't even cross any of the data points, but it minimizes the distance between the data points and the line of best fit. That's OK, and you could play around with it, but it's quite limiting to stick to having a straight line going through. As a result, for any prediction we were to make from this, for example, if x were equal to this point here, what would y be, we wouldn't have all that much confidence in the prediction, seeing as the points are so far from the line. We could also say that possibly this line of best fit isn't very accurate. So, as we discussed in a previous section, there's another thing we could do.
Instead of having a straight line, we could make a curved line that fits all the points a lot better, curved like this. At first look, that would be a perfect fit, so why wouldn't we do that? Well, in the real world, that's actually not so good, because we're doing something called overfitting the data. So how can we get a curved line when we start from just y = wx + b? Using just this equation, it couldn't output anything curved, because it's linear. One thing you could do is add an exponent here. If, say, that number were two, then this would be x squared. That's one thing you could do, but in neural networks we don't do that, and I'm just going to give you a bit of a teaser here of what we actually do. Essentially, once we've got y, we then put it through something called an activation function. That's all I'm going to say for now, but what it does is introduce a non-linearity into the equation, which means that we can get these nice curved lines. Now obviously, when we've got a line like this, creating just one equation that describes the whole thing would give a very complex equation indeed. So we wouldn't try to do this by hand or by looking at it. This is where we start to rely on computation and on using more than just one of these linear regressions with an activation function. So this is just a taste, to show you that in the real world, the data isn't as correlated as we'd like. If we have millions of data points, it's going to be very hard for us to calculate the line of best fit by hand. And actually, we don't necessarily want a straight line; we'd like a curved line, but we also don't want it to overfit. So what might be better is a line something like this, where it fits the data quite well, but not too well.
If we introduced new data, the very curvy line wouldn't work; it would overfit. So that's what we've gone over in this lecture. Essentially, in the real world, we'll be having curved lines, and we won't be using exponents like I showed here; we'll be using activation functions. So that's it for now. In the next lecture, we're going to be discussing this issue of overfitting the data. We've gone over a lot so far, and I hope you're feeling like you understand the concepts. If you're feeling restless to get started with actual neural networks, don't worry. That is definitely coming, I can really assure you. One of the big points of this course is that if you follow along the way I'm doing it so far, giving you the building blocks, you're going to learn the larger, more difficult concepts a lot quicker. So stick with it, and when you're ready, I'll see you in the next lecture.
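To make the teaser about activation functions a little more concrete, here is a minimal Python sketch. The lecture does not say which activation function is used, so `tanh` is purely an assumption for illustration, and the helper names `linear`, `activated` and `steps` are made up for this sketch. It shows that equal steps in x give equal steps in the output of y = wx + b, but not once the result is passed through an activation.

```python
import math


def linear(x, w=1.0, b=0.0):
    """Plain linear regression output: a straight line."""
    return w * x + b


def activated(x, w=1.0, b=0.0):
    """Linear output passed through an activation (tanh is an assumed choice)."""
    return math.tanh(linear(x, w, b))


xs = [0.0, 1.0, 2.0, 3.0]


def steps(f):
    """How much the output changes between neighbouring x values."""
    values = [f(x) for x in xs]
    return [later - earlier for earlier, later in zip(values, values[1:])]


linear_steps = steps(linear)
curved_steps = steps(activated)

print(linear_steps)                         # [1.0, 1.0, 1.0]: a straight line
print([round(s, 3) for s in curved_steps])  # shrinking steps: a curve
```

A single activated line like this only bends one way; the curvier shapes drawn in the lecture come from combining several of them, which later lectures cover.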
9. Overfitting: So, looking again at these examples where you have a low correlation in the data, that is, it's not just a lovely straight line where you draw one line and it fits all of the data points really well. Sure, we can draw a nice curved line like this that fits all of the data points really nicely. With something that looks like this, great: on this data, the accuracy is 100%, which makes us feel good. But what happens when we add new data? Let's say again we're looking at Facebook ads. We're putting money into Facebook ads here on our x axis, and on our y axis we're seeing what our sales are. As you can see, there isn't much of a pattern to be seen here, but there is something. So it may appear that, with the data we've got, we fit it 100% well. Okay, that's fantastic. Now let's try to make a prediction. This is still our line of best fit, even though it's very curvy and weird. Let's say we put in this amount of dollars, say $500 here. This would be our prediction. But what if we added even more data? It turns out that there's one bit of data here, another bit here, and so on: the small amount of data we started with just hadn't picked up that there were other possibilities. What happens is that when we try our line of best fit out with real-world data points, it falls apart, and the accuracy of our predictions is actually very low. Our prediction using our line of best fit was this; in real life, however, it should have been up around here. So this is just how incorrect we were, and of course we don't want to be too incorrect with these points. As we've said in the past, the line of best fit is simply showing the patterns underneath the data. So the more data we have, the more confident we can be in our pattern, because we're fitting more data.
But unless we have a lot of the possible data in the world, it's likely that if we fit the points that we have perfectly like this, we're overfitting the data. When we add new data points, it may turn out that not all the nuances of the data, not all the patterns, were represented in the data that we had. So at this point, I would like to introduce you to two different things, training and evaluation data sets, and explain a bit about why we use them. Let me just get a clean page here. With training, we have a load of data points just like this. Training is the time where we run our neural networks and try to create a model. For now, we can just say we're running linear regression. It looks at all the data points and creates a line of best fit for our training data. The aim of the training is to get a model that represents our data relatively well. When I say it represents the data well, I mean that it minimizes the overall distance of all these data points from the line of best fit. Because if you think about it, if the line goes through all the data points, then it's representing the pattern in the training data really well. If there's a lot of distance between the data points and the line, let's say I draw a line here and all the distances are very high, then I would say that it's not representing the data very well. So when we're talking about representing the patterns in the data, we're basically saying we want the data points to be not too far away from our line of best fit. That's training. And then we have evaluation data, and essentially this is us saying: right, we've trained our model on some training data and that seemed to go well. Now let's test our model and see if it works on completely new data.
So, for example, if we were using image classification for pictures of dogs, we'd give it 100 pictures of dogs and train our model on those, and then we'd put in another, I don't know, 10 or 20 new dogs and see if it's still able to identify dogs in new pictures, and hasn't just learned to identify those particular dogs in the initial training data set. So to recap: overfitting is when you have data points in your training set (I'm going to draw four for now, just to make it easy) and your model, or your line of best fit, fits them too well. When you come to put in your evaluation data set and the new points come in, the patterns aren't actually being represented well enough, and you're not making great predictions for the new x values that come in. Underfitting is when we create something like this, or even worse, like this, where we're just not fitting the training data well enough. There's a great art to this, and there are a lot of good strategies that we'll go on to at a later date, where we talk about how you get that good mix of fitting the training data well enough, but not fitting it so well that when you add new data, it just doesn't work. So I'm just introducing the concepts of overfitting and underfitting here. They'll be very useful concepts when we start to scale up to neural networks. For now, as long as you appreciate that overfitting is when your model fits each of the training data points perfectly, but has been fit so well to the training data that it can't make good predictions on evaluation data or new data, then we have accomplished the task for this lecture. Next, we want to talk a bit more about accuracy and confidence in our lines of best fit, and to do that in a bit more of a formal way, which we will discuss in the next lecture. So I look forward to seeing you then.
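The training versus evaluation idea can be sketched in plain Python. Everything here is an assumption made for illustration: the numbers are made up (they are not from the lecture), the straight line is fitted with ordinary least squares, and the "very curvy" line is built with Lagrange interpolation so that it passes through every training point. The lecture itself only draws these by hand.

```python
# Made-up illustrative numbers (not from the lecture): noisy points that
# roughly follow an upward trend.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2.5, 3.5, 6.8, 7.4, 10.6]
eval_x = [1.5, 2.5, 3.5, 4.5]
eval_y = [3.1, 5.2, 6.9, 9.1]


def fit_straight_line(xs, ys):
    """Least-squares straight line; returns a function x -> prediction."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    b = mean_y - w * mean_x
    return lambda x: w * x + b


def fit_curvy_line(xs, ys):
    """Curve forced through every training point (Lagrange interpolation)."""
    def curve(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return curve


def average_distance(model, xs, ys):
    """Average vertical distance between the model's line and the points."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)


straight = fit_straight_line(train_x, train_y)
curvy = fit_curvy_line(train_x, train_y)

for name, model in [("straight", straight), ("curvy", curvy)]:
    print(
        name,
        "training distance:", round(average_distance(model, train_x, train_y), 3),
        "evaluation distance:", round(average_distance(model, eval_x, eval_y), 3),
    )
```

On this made-up data the curvy line has zero distance on the training points but sits further from the evaluation points than the straight line does, which is the overfitting pattern described above.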
10. Cost and loss: So we're going to go into a bit more detail in this lecture on how we describe how well a model is predicting data, based on the training data and on the evaluation data. In real life, we may have thousands and thousands or millions of pieces of data. First off, we need to train a model to find a line of best fit that fits this data, the training data, well. To calculate how well the line is fitting, we can look at how far all of our data points are from our line of best fit. So I've got some data points here, and let's say that in training our model we come up with this line of best fit. It's OK. It may not be the best one, but it's just for an example. First of all, we want to say how confident we are that our line of best fit represents the data. One way we can say it is that the closer the data points are to this line of best fit, the more confident we are that the line is representing the data. We're just talking about the training data here, not the evaluation data. As we can see, there are a lot of data points that are quite far away, so I would say that we might have a low confidence in this line of best fit. We might say the accuracy is low. We also might say, introducing something new, that the cost of a data point is high, or the loss over all the data points is high. Let me go into that in a bit more detail. The distance of a data point from our line of best fit, let's say this distance here, is what we call the cost. It's literally just the distance of the data point from the line of best fit. So this point, for example, has a high cost. If we stick with the Facebook ads again, and let's say these are increments of 10 here, 10, 20, we could say roughly that the cost for this one data point is about 10.
It's about 10 from here, from 18 to 28. What we really want to do, whenever we create a model or line of best fit, is give just one number that sums up how well our line of best fit is doing in relation to all of the data. One thing we could do is simply add up the distances of all of the data points from the line of best fit, and say that adding all of these together gives the overall cost for this model. When we're talking about all the data points, though, we no longer talk about cost; we talk about loss. So how do we write down what the loss is equal to? For now, let's just describe these distances between the data points and the line of best fit. One way we can do it is to say that this is y hat, and that's our prediction. When I say y hat, let me just draw this a bit better, it's y with a little upward-pointing hat on top, like that (ŷ). That's how we denote a prediction. And the actual value is just y. So, 1, 2, 3, 4: this is our fourth data point, so let's call this y4 for now. This number here just denotes which data point it is. So we could say that the cost of any of these data points would be y minus y hat, or y hat minus y; it doesn't matter which way round, because all that matters is that we're taking the distance. For the loss, we take y minus y hat for each of these data points. So, for example, y1 minus y hat 1 is this distance here, y2 minus y hat 2 is this distance here, and we're adding them all up together. In mathematical notation, the way that we say we're adding everything up is using this symbol here, a capital sigma (Σ). So I'm just showing you two things: y hat, and this, which means "the sum of". So we could say the loss is the sum of all of the y's minus the predictions. This might be a bit intimidating at first.
If you haven't done maths before, don't worry: I'm really not going to get too deep into the maths. I may create one lecture where, if you're very interested in the maths, you can get into the notation, but this is as far as I'm really going to take it. So what I'm saying here is that for each of the data points, we take the distance of the data point from the line, we add all those distances together, and we call this the loss. Now, in reality, we actually do one more thing, which is that we square each distance. What this does is make the further-away points count for more. Basically, we want to, in quotation marks, "punish" the ones where the distance is large, because obviously we don't want that to happen. We want to say that if the data points are really far away, then the loss is even higher. When we calculate the loss this way, it's called the mean squared error. Let me break this down. We're taking the distances, we're squaring the distance for each point, and then we're taking an average. So let's say, for example, the distance for data point 1 here is 1. The next distance here is 10. The next one is 2, the next one is 6 (excuse my handwriting), the next one is 3, and the next one is 1. What we do first of all is square all these numbers. So we've got 1, 100, 4, 36, 9, and 1. Then we add all of these together and take an average. So we're squaring the distances, adding them all together, and dividing by n, and that's honestly the last bit of mathematical notation, where n is the number of data points. So we're taking the average. We add these all together: that's 105, 141, 151 in total, now that we've squared them all.
Now that we've added them together, we just divide by six, which gives about 25. So we can say our loss is around 25. I hope you're able to keep up with what we're doing here, and that this hasn't been too complex. If you're confused at all, please feel free to leave a comment on this lecture. I try to keep the maths very basic. What's important to take away is the idea of the loss function. What we're describing with the loss function is how correct our line of best fit is in relation to the data. How well does it fit the data? If it fit every data point perfectly, the loss would be zero, because there would be no distance between the line of best fit and the data points. What's really helpful here is that now we're talking about the evaluation data set as well. We want the loss to be minimal for both of them, and most importantly, we want it to be minimal for the evaluation data, which is completely new data. If we just looked at the loss on our training data set and used that as our only metric, then we would always overfit the data, making sure the line of best fit fit all our training data. But actually, we want to minimize the loss on both our training data and our evaluation data. So I hope that makes sense. In the next lecture, we're going to be looking at some really exciting stuff and talking about how neural networks actually learn, so I'll see you in the next lecture.
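The arithmetic from the whiteboard can be checked with a short Python sketch. The six distances are the ones read out in the lecture; the helper `mean_squared_error` is written here for illustration, taking actual values y and predictions ŷ, and is not a library function.

```python
# Distances of the six data points from the line of best fit (from the lecture).
distances = [1, 10, 2, 6, 3, 1]

# Square each distance so that far-away points are "punished" more.
squared = [d ** 2 for d in distances]
total = sum(squared)

# Mean squared error: the average of the squared distances.
loss = total / len(distances)

print(squared)         # [1, 100, 4, 36, 9, 1]
print(total)           # 151
print(f"{loss:.2f}")   # 25.17


def mean_squared_error(actual, predicted):
    """Average of (y - y_hat) squared over all data points."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


# The single point from earlier: actual 28, predicted 18, so a distance of 10.
print(mean_squared_error([28], [18]))  # 100.0
```

Note how the one point with a distance of 10 contributes 100 of the 151 total, which is what squaring is meant to achieve.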
11. How do NNs learn: In this lecture, we will take linear regression a step further to show you how neural networks learn. We'll basically be going through the basics of it, so we won't be getting too deep just yet. I know we haven't actually talked about neural nets yet, but trust me, we're almost ready to scale these concepts up, and it will be so much easier to understand and grasp the concepts when we do. So imagine the practicality of creating your first line of best fit as a computer. If you have millions of data points, it's actually a lot more difficult than you might imagine. If you have a lot of data with a lot of complexity, you can't just set up the line of best fit in your network. So usually what we do with neural networks is initialize the weights and the biases. What I'm talking about here is w and b, because we have y = wx + b. On the first go round, the neural network will initialize w and b to be completely random, so you might have something like this in your first go. Now, using the loss function, we would calculate the distances of all the data points from the line of best fit. We talked about this before: the mean squared error. You might remember the notation: the loss is equal to the sum of y minus the prediction, squared, averaged over the data points. And honestly, if you're not a maths person, please don't get overwhelmed by this, because we can talk about it in very normal, regular terms. Basically, what we're calculating is how far away, overall, all of our data is from our line of best fit. Once we've worked out what the loss is, the computer can say: OK, so that's the loss when I do it like this. Now I'm going to decrease the value of w, which, if you remember, would decrease the steepness of the line.
Now let's try it like this, a bit less steep this time, and what you'll find, as you might be able to tell, is that overall this line is closer to all of the data points. Then the computer might go again and decrease w again. Now you can see that it's representing the data a lot better, and the loss is most likely a lot lower. So we just moved the slope in the direction that reduces the loss, and this is done multiple times in a neural network: it calculates the loss of the model it initialized at random, and then it changes the values of w and b until the loss is at a minimal value. And when we're using lots of linear regressions with activation functions, we'll be doing the exact same thing, except that we'll have a lot more w's and a lot more b's to play around with, and that's where it becomes quite computationally expensive. But that's the whole point of the neural network: to change these values of w and b. The more linear regressions you're using, the more complexity you're able to abstract from the data. So each time, basically, the neural network will change its w's and b's in a way that seems likely to minimize the loss. It's then not a simple matter of shifting the line up or down or increasing the gradient, as we talked about earlier; we have so many different modules of linear regression that we're going to have these curvy lines like this. Up until now, we've been using linear regression and lines of best fit as training wheels for talking about neural networks. Now is the exciting point where we make the transition from linear regression and lines of best fit to talking about neural nets, models, and nodes as well: models that contain many, many modules of linear regression, plus the activation functions and a few extra things, which we'll talk about in the next section.
So now we're going to be scaling from this linear regression to neural networks. Finally, let's solidify what you've learned in this section with a few practical exercises and a brief recap, and then we'll move on to actual neural networks. So I look forward to sharing this with you in future lectures.
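The learning loop described in this lecture (start with random w and b, measure the loss, nudge w and b in the direction that lowers it, repeat) can be sketched in Python. The lecture does not name the update rule, so plain gradient descent on the mean squared error is an assumption here, as are the learning rate, the number of steps, and the four data points, which were chosen to follow the earlier y = 2x + 1 pattern.

```python
import random

# Illustrative points that follow y = 2x + 1 (an assumed data set).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]


def loss(w, b):
    """Mean squared error of the line y = wx + b on the data."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Start from a completely random line, as described in the lecture.
random.seed(0)
w = random.uniform(-1.0, 1.0)
b = random.uniform(-1.0, 1.0)
initial_loss = loss(w, b)

learning_rate = 0.05  # assumed step size
for step in range(5000):
    # How far each prediction is from the actual value.
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    # Slope of the loss with respect to w and b.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2 * sum(errors) / len(xs)
    # Move w and b a small step in the direction that reduces the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = loss(w, b)
print(round(w, 3), round(b, 3))  # 2.0 1.0
print(final_loss < initial_loss)  # True
```

A neural network does the same kind of update, only with many more w's and b's at once, which is where the computational cost mentioned in the lecture comes from.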
12. Neural networks recap: okay, we're now going to do a whirlwind recap of everything we've been going over so far, just to make sure we're solidifying everything in your mind and you're feeling confident about all of the basic concepts. We've basically got the most basic building blocks ready to go, so we can start to scale these concepts to neural networks, but it's very important that you understand all the concepts that have come before, so let's go through them one at a time. Some of the first things we talked about were linear regression and the line of best fit. We began by discussing a graph where you have x on the horizontal axis and y on the vertical axis. The x represents an input. The example we used was Facebook ads: the input is that you put in $5 of advertising for Facebook ads, and the output y is how many sales you got back in return. But in reality it could be absolutely anything where you're putting some kind of information in and getting some information out. So, for example, the input could be how long you leave a kettle on, and y could be the temperature of the water, anything like that. So we first looked at x and y like this, and then we looked at how you can plot data on a graph, with x being the input. Let's use this example of a kettle. So if the kettle's left on, after three minutes it's 66 degrees, for example. Let's say it's some weird temperature unit that makes sense in this situation. Okay, let's say it's not minutes but seconds: after five seconds it's 10, after 60 seconds it's 120. Of course, usually you would use these increments here to measure. I'm just doing an approximation. 
And then once you put this in, what I'm always saying is: when you put in x, you get out y. And then you want to be able to say, what do you have to do to this x, this number, or each of these numbers, in order to get a good approximation of y? One way of representing this pattern is by doing what we call a line of best fit. That's where we simply draw a line (in linear regression it's a straight line) where you try and get it to pass through the data points. Excuse me, that's supposed to be a straight line, but it looks more like a lightning bolt. So, yeah, you have a line of best fit, and that's supposed to reflect the data. Here we have a nice one, because we can formalize the relationship between x and y as y equals two times x. So whenever you have an x, an input, you just times that by two and you get the output y. So it's quite a simple relationship here. Once you have a line of best fit, and once you have this equation you've created, you can start to make predictions. So you can then say, well, if I were to leave the kettle on for 30 seconds, what would the temperature be? You go along the x axis, find where 30 is, and then you simply go directly upwards until you reach the line of best fit, and go across to the y axis. And that would be your prediction. We use this term y hat, so y with the little thing above it. You could also use the equation in the exact same way. So if the kettle is left on for 30 seconds, that means x equals 30. What would y be? y would equal two times 30, which equals 60. So we're able to use the line of best fit for working out predictions. And of course, in the real world, the data is going to be all over the place. The line of best fit doesn't necessarily go through all the points. 
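As a minimal sketch of the prediction step described above, the kettle line of best fit (y equals two times x) can be written in a few lines of Python. The names `SLOPE` and `predict` are illustrative assumptions, not something taken from the lecture.

```python
# Kettle example from the lecture: x is the number of seconds the kettle
# has been on, y is the temperature in the lecture's made-up unit.
SLOPE = 2.0  # the line of best fit is y = 2 * x


def predict(x):
    """Return y hat, the predicted temperature after x seconds."""
    return SLOPE * x


y_hat = predict(30)
print(y_hat)  # 60.0, matching the worked example in the lecture
```

Reading the prediction off the graph and plugging x into the equation are the same operation; the function just does the second one.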
It wants to be an approximation that minimizes the distance of all the data points from the line, which we'll talk about in a moment. So those are the basic concepts there. Next, we're going to move on to logistic regression and activation functions. Sometimes it just doesn't make sense to use a straight line, because a lot of the time the distance between all the data points and the line is going to be too high. So what else we could do is use curved lines. For example, something a bit like this does a much better job of going through all the data points. Now, to formalize this as an equation would be very, very difficult. It's not something most people could do by hand. So there is something called logistic regression, where for some lines, for example, y equals x squared. This is where we start to use something called exponents. Just in case you don't know what x squared means, it means x times x. And if it were x cubed, so if that was the number three, x with a three above it, it would be x times x times x. The curve for x squared is something like this. So you start to use curved lines. For this one it would be a very complicated equation that I wouldn't be able to tell you off the top of my head, but that's one way to get a curved line like this. But in neural nets, we don't actually use logistic regression. We use a lot of linear regressions plus an activation function. I'm going to continue to tease you on the activation function and not actually tell you what it is yet, because we'll be going over this in a lot more detail in the next section. But essentially, what's important here is that the way we work out and define this graph is through a lot of linear regressions. We can call them modules, or we will be calling them nodes. And there's something we do each time after the linear regression. 
We add an activation function. So it could be formalized like this. First of all, we have the basic equation we've been looking at for linear regression: y equals w times x plus b. When you're formalizing a graph or your line of best fit, w defines the gradient. If you have a high gradient, your line might look something like this. If it has a low gradient, it might be something like this. The w stands for weight and b stands for bias. The bias dictates how high or how low the line is, basically. So if we use this example here, where we have a constant value of w, if we don't change w but increase b, the line will look like this: exactly the same, just higher. If we use a low b but keep the w the same, again it will look exactly the same, but just lower. Okay, so essentially that's what we do for the linear regression. And then we could do something like this: z equals, and let's say just for now that we define h as the activation function (I was just going to call it 'active' for now, but either way it stands for activation function), so we could just write z equals h of y. So we're not going to be looking specifically at the output y. We're going to be looking at this output z, and this z will then be put through another linear regression, and the output of that will be put through another activation, and so on and so forth. So that's the basic building block for what's going on in a neural net. We'll be going over this in a lot more detail in the next section. So, cost and loss. This is really, really important. Now we're starting to talk about how the neural network learns. 
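The building block described above (a linear regression followed by an activation function, with z fed into the next one) can be sketched roughly as follows. At this point the lecture has not said what h actually is, so the sigmoid used here is only a stand-in assumption, and the weights and biases are arbitrary illustrative numbers.

```python
import math


def sigmoid(y):
    # Stand-in activation function (an assumption for this sketch);
    # it squashes any number into the range between 0 and 1.
    return 1.0 / (1.0 + math.exp(-y))


def node(x, w, b, h=sigmoid):
    """One building block: linear regression, then activation."""
    y = w * x + b  # linear regression part: y = w * x + b
    z = h(y)       # activation part: z = h(y)
    return z


# The output z of one block becomes the input x of the next block.
z1 = node(30, w=0.1, b=-2.0)
z2 = node(z1, w=1.5, b=0.3)
print(z1, z2)
```

Passing `h` as a parameter keeps the sketch neutral about which activation function is eventually chosen.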
So one thing that's very important to mention is that when you have millions of data points, it's not like your computer or program can simply say, oh, okay, here is the perfect line of best fit that describes the patterns in the data. Especially because we could talk about something called overfitting, which is basically when the line goes through all the data points perfectly. When it goes through all of them perfectly like this, essentially it's doing something called overfitting. This is an important thing to note, because with overfitting the line describes the data we have perfectly, which is great. But the thing is, we will never have enough data to perfectly describe the patterns of something. So when we add new data, for example, we might see that our line really doesn't describe the patterns of real-world data that well. Our line of best fit is very limited to what is described in the data we had at the beginning. So what we want to do is have two different things here: training and evaluation datasets. First of all, let's say we use the idea of this kettle temperature thing from before. We get together 100 different kettles and we run them all just once, so we have 100 data points. Let's say here we have those 100 data points and we fit them perfectly. Then we bring in an evaluation dataset, where we do it on another 20 different kettles, and we find that these have completely different outputs. Essentially, we run the model on the training dataset, the one we train on. It's a balancing act, because we want our model, or our line of best fit, to reflect the patterns in the data relatively well, but we don't want it to read so much into that data that it goes perfectly through the points and overfits. So we go once over the training data with our model. 
We will also look at it and say, right, how is it performing with real-world data? And don't worry about really understanding this, because we will be going over it in a future section, but I wanted to introduce it to you early on just to get it in your mind. So you run on the training data, and if the loss, which we'll be going over in a second, is low, then that's good. But if it's high on the evaluation dataset, it means that we're overfitting. So let's talk a little bit about cost and loss. Let's say our model creates a line something like this, just for argument's sake. Okay, now when we talk about cost, we're referring to just one data point. Say, for example, this here is one data point. The cost is the distance between the data point and our line of best fit. So it's this here. Basically, we can describe this point here as our y hat, right? It's a prediction. So we have a y hat here, and this is just for one given value of x, right? And here is our actual data point, we could say our true label, y. We can say the cost is the distance from y to y hat, or the distance between the data point and our line of best fit. So we could describe this as y minus y hat, and that will give us this distance here. So that's our cost. Now, the loss basically takes into account all the data points in this way. We use a tiny bit of mathematical notation here, because what we're saying is we want to look at the average distance between all these data points and the line of best fit. Or we could say the average of this for every data point, right? I'm sure you know how to find the average of something: you add all of these together and then divide by the number of data points. So in maths, this symbol describes the sum of. 
Basically, everything added together. So this, y minus y hat, is the cost of every data point. Okay, so it's the sum over all of the data points of all of these distances, divided by the number of data points we have. This is called the mean error. But usually in our models, we actually calculate something called the mean squared error. Because what we really want to avoid is having data points that are all the way out here, you know, really big distances, because then our model is really not doing very well at all. So you want to minimize these cases where we have data points that are far away. As a result, what we want to do is, in quotations, 'punish' the data points that are far away from our line of best fit. So what do we do? Well, one thing we know is what happens when we square a number as it increases. If we do two squared, the number increases just slightly, right? But if we do something like 10 squared, it becomes much larger. So two only increases by two, to four, but 10 increases by 90, to 100. The larger the number, the bigger it becomes when you square it. And the longer the distance, the larger the number. So if, every time we do y minus y hat, we square it, then that will punish the ones that are further out. So the mean squared error is represented in this equation: the sum of the distances between the data points and the line of best fit, each squared to punish the outliers, divided by the number of data points. We're not going to go too deep into the maths; we're just touching the surface here. You don't really have to memorize the mathematical notation. I mean, it would help, but what's really important here is that you get why we're doing this. 
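To make the cost, mean error and mean squared error above concrete, here is a small sketch. The three data points are made-up illustrative numbers (not from the lecture), and the line y = 2x is reused from the kettle example.

```python
# Made-up kettle measurements: seconds on, and measured temperature.
xs = [5, 30, 60]
ys = [12, 58, 121]


def predict(x):
    # Line of best fit from the kettle example: y hat = 2 * x
    return 2 * x


# Cost of each data point: y minus y hat.
costs = [y - predict(x) for x, y in zip(xs, ys)]

# Mean error: add all the costs together, divide by the number of points.
mean_error = sum(costs) / len(costs)

# Mean squared error: square each cost first, then average.
mean_squared_error = sum(c ** 2 for c in costs) / len(costs)

print(costs)               # [2, -2, 1]
print(mean_squared_error)  # 3.0
```

Besides punishing far-away points, squaring has a side benefit visible here: the costs 2 and -2 cancel each other out in the plain mean error, but not once they are squared.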
Overall, the loss represents how accurate our model is, how well it seems to be making predictions, whether it's on the training data or the evaluation data. We work out this loss for both of these datasets. So finally, we're just going to touch on learning. This is why I'm saying the loss is so important. If we draw a graph here of our loss: learning is where our neural network is trying to minimize the loss. One important thing to note here is that with millions of data points, and I think I mentioned this earlier, your computer can't just find a line of best fit right away. And even if it did, you wouldn't want it to be perfect, because it has to balance having a low loss on both the training and evaluation data. So essentially what the model does is initialize all of those linear regressions that we're doing in the model, as we described. For every single one of those modules or nodes, which is how we're going to be describing them from now on, so for every one of these nodes of linear regression with the activation function afterwards, we would set the model to make w and b completely random. So the line might look like anything to begin with. You have just no idea. It's completely random, and so the loss might be quite high. Okay? And then it uses something called gradient descent, which we'll be talking a lot about. And actually, one thing I do want to talk about right now is that when the program is doing these calculations with the linear regressions and the activation functions, that's called forward propagation. And then there's the step where the program is learning and changing w and b before it does another forward propagation. 
That learning step is called backward propagation. So essentially, once the program has done all these calculations and worked out the loss, that is forward propagation. At that point it has a loss, and then it can go back, using gradient descent, and work out how to change w and b slightly in order to decrease that loss. So the loss curve might look something like this. It doesn't have overhangs, but the model is going to go whichever way it can to reduce the loss. So every time, it changes the weights until it finds this global minimum. Sometimes it can get stuck in these small dips; we can talk about that later. But generally what the program is trying to do is find where the loss is minimal for the training and evaluation datasets. So that's really important. Every time it decides to change the weights and biases in a certain way, it will come back here, and now the line might look a little bit more like this, carrying on from this part here. It might slowly start to look like it's going through more data points, thus decreasing the loss. So that's just a recap of everything we've been through. Don't worry if you don't understand everything, especially this part about learning, because this is going to be a lot of what we're talking about in the next section. But if you understand most of the concepts here, that's really great, and you're going to do well coming into the next section, where we're going to be scaling these concepts up to neural networks. So thank you so much for taking part in these lectures. If you have any questions, feel free to leave a comment. If you have any feedback, of course, it's always really welcome. So good luck with going over this lecture, and I'll see you in the next one.
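The learning loop recapped above (random w and b, a forward pass to get the loss, then a small change to w and b that reduces it) can be sketched for a single linear regression. This is only an illustration under stated assumptions: the data points, the learning rate of 0.05, the 2000 steps and the seed are all made-up choices, and the gradient formulas are the standard ones for mean squared error rather than anything shown in the lecture.

```python
import random

# Made-up data that follows y = 2 * x exactly.
xs = [0.5, 1.0, 2.0, 3.0]
ys = [2.0 * x for x in xs]


def loss(w, b):
    """Mean squared error of the line y hat = w * x + b."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


random.seed(0)       # fixed seed so the sketch is repeatable
w = random.random()  # w and b start out completely random
b = random.random()
initial_loss = loss(w, b)

learning_rate = 0.05  # how far to move w and b on each step (assumed)
for step in range(2000):
    # Forward pass: how far off is each prediction?
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    # Backward pass: slope of the loss with respect to w and to b.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2 * sum(errors) / len(xs)
    # Move w and b slightly in the direction that decreases the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = loss(w, b)
print(w, b, initial_loss, final_loss)
```

With these settings w should end up very close to 2 and b very close to 0, which is the line the data was generated from.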
13. Training wheels off neural networks: okay, I think it's about time we take off those training wheels and get started with neural networks. So here is an image of a neural network. Of course, everyone who joins this course is at a different level, but usually people have seen some kind of diagram like this before, and they think neural networks are the most complex things in the world, up there with rocket science and brain surgery. In my opinion, and the reason why I made this course in the first place, the concepts are actually a lot more accessible than most people think, as long as they're presented in the right way. And now you have the basic building blocks, I think, to comprehend how neural networks work. So in this picture here we'll just be going over the start of how it works. Let's take a deeper look into how just one of these nodes works, connecting with a few other nodes. Hopefully you remember the word 'nodes' that we were talking about before. Let's zoom in and see what's going on. So here we have an example of just one node. Before we talk about it, I just want to make sure it's clear that all of these different circles here are different nodes within the neural network. I think it's important just to go over this. We have the input here, so the input layer is the information we're providing. We were talking earlier, when we were doing the neural networks recap, about a kettle boiling. For that you'd have literally one circle, one node, on the input layer, which would be the amount of time the kettle's left on. However, most of the time you want to have numerous bits of information coming in. For example, with an image you'd be giving in all the different pixel values of the image, so you could have a lot of inputs. You might have 50. 
That's if it's a small, pixelated image, or you might have thousands and thousands. Let's say, for example, we're again using the example of what the temperature will be after a certain amount of time. But this time we're not only putting in the amount of time the kettle's left on, we're putting in how cold the water was when it was first put into the kettle, how old the kettle is, what brand made the kettle, all these sorts of things. So we have different data inputs to start with, and then we go into something called the hidden layer. We'll be talking more about what the hidden layer actually is later on. At the end here we have one final node before we get the output, and that is part of the output layer, which is where we get the prediction, the final number. I just want to make sure we're clear on that. So now we're looking at just one of these circles, one of the nodes, but we're pretending that we have more than one hidden layer. So instead of looking at this, we're going to be looking at one node here, connecting to some more nodes in another hidden layer. This is basically what's happening within a neural network, in any one of the nodes that's not the input data or the output data: everything in the middle. And I don't know how you feel about this, but I was just amazed that it's this simple: in a neural network, every node contains a linear regression, just like we've done before, y equals w x plus b, and then, like we talked about before, this activation function. So I'll introduce a word to you now. It's called sigmoid: S-I-G-M-O-I-D. Okay. I'm not going to give you all the information on activation functions yet, but I'm going to give you a little bit more. So it's something like a sigmoid function. What they do is introduce something called a non-linearity. 
So we're talking about linear regression here, and now we're talking about introducing a non-linearity. Because if each of these nodes only had a linear regression, if you really think about it, and you could try writing it out as well, having many linear regressions linked together is going to be the same as just having one linear regression. There's absolutely no point in having multiple linear regressions one after the other, because you might as well just have one. So what we do is add an activation function, which introduces a non-linearity. Basically, it allows us to have these curves in our graphs, and it allows the neural network to become more complex than just having one linear regression equation like this to describe the patterns. So it's super important that we start to introduce these non-linearities. There are numerous activation functions. We're probably going to go over three or four different ones in this course. Sigmoid is the most basic one, and we'll be talking more about it in a future lecture. But for now, what's important to understand is that in every node, that's all there is. You have a linear regression part, and then you run it through an activation function. So let's say we're receiving information from an input, okay? Let's say we're just going to have one data point, and that's the kettle example, the amount of time it's been on. So we have an x which, let's say, is 30. This will be fed into this node. First of all, y will be calculated. Like I said earlier, the model will usually initialize a random weight and a random bias, so it calculates y equals whatever this weight might be (it might be a random number between zero and one, for example) times x, plus a random bias.
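The claim earlier in this part of the lecture, that chaining linear regressions without an activation function is the same as having just one linear regression, can be checked with a short sketch. The weights and biases below are arbitrary illustrative numbers.

```python
# Two linear regressions chained together, with no activation between them.
w1, b1 = 3.0, 1.0
w2, b2 = -2.0, 0.5


def two_linear(x):
    y1 = w1 * x + b1     # first linear regression
    return w2 * y1 + b2  # second linear regression, fed by the first


# Multiplying out w2 * (w1 * x + b1) + b2 gives a single line
# with weight w2 * w1 and bias w2 * b1 + b2.
w = w2 * w1
b = w2 * b1 + b2


def one_linear(x):
    return w * x + b


for x in [-1.0, 0.0, 2.5, 30.0]:
    print(x, two_linear(x), one_linear(x))  # the two columns agree
```

Putting an activation function between the two steps is what stops them from collapsing into a single straight line.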
The output y will then be put through an activation function, which we'll talk about later. So we're left with one final number, and that will be z. z is what we get out: x is what we put in, z is what we get out. And then we connect another node, and we can look at it in the exact same way: y equals w x plus b, and then we have another activation function, h of y. And we do the exact same thing again up here. Just keep in mind that in every single one of these nodes the weights will be different. All of the weights are set to be different. That allows us to add so many levels of complexity. That's what neural networks are all about: we have all these different weights and different biases that process their inputs in different ways, and that allows us to get such a high level of complexity that we can describe complex patterns in data. So it's the exact same thing above and below, just with different weights. And this is something important for you to note down. If you've got paper to hand and you're taking notes, fantastic; otherwise make a mental note. In every layer, the weights are going to be different, but you always have the same bias for all the nodes in one layer. And a layer is just the vertical set of nodes. So, for example, all these yellow ones here, that's just one layer. You have multiple layers, but just make sure you understand that every one of these yellow circles, or nodes, has a different weight, but they all have the exact same bias. That's really important to appreciate. Okay, maybe you haven't thought about it yet, but what's important to think about here is this: here it was obvious what x is. We have an input, the number 30. We've got x defined. So what do we do here? Do we just take this x and put it in here again? 
No. What we do is take the output of this node, and that will be the input for this node. So this thing we have here, the z that comes out, that's what we're feeding in to be x. So the output of one node will be the input for the next node. Okay, so that makes sense. Just to touch on another point, and we will be going over this more with hidden layers in a future lecture: essentially, what we've done is gone from the input into a single node here (there would usually be multiple nodes), and then we're going through to another layer. Let's say this layer here is the last layer. Obviously, we need to get some kind of output. And one thing I was very curious about when I was first learning about this is: right, okay, you get to this one here, but you don't just have one z coming out of here, right? You have numerous different z's. So how do you deal with all these inputs into one of these nodes? Hopefully you remember this mathematical notation, the sum. You simply add all three of these together, and whatever that is will be your input. You run it through the linear regression and the activation. Sometimes you might not have an activation on your last node; it depends on what model you're doing. And then finally you have your output. If it's, for example, what we talked about with the kettle, it will be just a continuous number. We put in 30 seconds, and I think we were saying we double it, right? So it would be 60. If we're doing a classification problem, for example, is this an image of a dog or not, the output would be either a one or a zero. One would be yes, it is a picture of a dog; zero would be no. So that's what we use for classification problems. Okay, so I'm going to finish this lecture here. I hope you're excited. 
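Putting the pieces of this lecture together, a forward pass through one hidden layer of three nodes might be sketched like this. It is only an illustration under assumptions: the layer size of three matches the "all three of these" in the lecture, the random values and seed are arbitrary, the variable names are hypothetical, and the output node is left without an activation because the kettle prediction is a continuous number.

```python
import math
import random


def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))


random.seed(1)  # fixed seed so the sketch is repeatable

# Hidden layer: a different random weight for each of the three nodes.
hidden_weights = [random.random() for _ in range(3)]
# The lecture describes one bias shared by every node in a layer, so that
# is what this sketch does. Many libraries keep one bias per node instead.
hidden_bias = random.random()

# Output node: its own random weight and bias.
output_weight = random.random()
output_bias = random.random()


def forward(x):
    # Each hidden node: linear regression, then activation.
    zs = [sigmoid(w * x + hidden_bias) for w in hidden_weights]
    # Add the three z values together to get the output node's input.
    total = sum(zs)
    # Output node: linear regression only (no activation on the last node).
    return output_weight * total + output_bias


prediction = forward(30)  # kettle left on for 30 seconds
print(prediction)
```

Because the weights are still random, the prediction will not be anywhere near 60 yet; closing that gap is what the learning step is for.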
I mean, with what we've just done here, we have taken a huge leap. We've gone from talking about linear regression to talking about how this fits into neural networks, which is a really huge jump. So hopefully you understood and grasped most of the concepts in here. The particular concepts to understand are, first, that every node is simply composed of a linear regression and an activation function. The next point is that an activation function introduces a non-linearity, which allows our lines of best fit, if you will (and I'll be trying to use the phrase 'line of best fit' less at this point, but I'm using it just a few more times), to be a bit more curved. It basically allows us to add levels of complexity to how our model notices patterns in the data. So the first point is linear regression, the second point is this activation function, and that's what composes a node. Next, the weights and biases are created randomly at first and then improved over time, which we'll be talking about; that's how the neural network learns. Then, the outputs of nodes will be the inputs of a node in the next layer. So this will be the input for all three of these nodes, for example. And then, finally, we sum them all together, put the sum through one more linear regression and, sometimes, an activation function, and we have our output. So it's a huge step we've just taken. If you understand the concepts, that is absolutely fantastic, and I'm really happy you're keeping up. If you do have any questions, feel free to ask. I've spent a long time trying to work out the best way to lead you through the concepts, so give me any feedback you like on that, if you feel like it's going well or you feel like I could have introduced a few more concepts earlier on. And so now, as I've said, I've left a few gaps. 
So I haven't talked about the activation function much, and a few other things. That's what we're going to be building up now in this section. But now you have an understanding of the basics of how a neural network works. So really, congratulations on getting to this point, and I look forward to showing you more in the next lecture.
14. Adding an activation function: we're now going to take a moment to look at activation functions. I've talked about them quite a few times in previous lectures, and I said I'd go into more detail about them, so we're going to do that now. Essentially, it's relatively easy to grasp what we're trying to do here. After we've done some linear regression, let's say we're in one particular node, so y equals w times x plus b. We then put y through an activation function, and we can call the output z: z equals h of y. We're going to use the letter h to describe the activation function. If we're using sigmoid, for example, which we are, some people use this notation. That's just for those of you who are curious. It doesn't really matter too much. What's important here is that after we've done the linear regression, we're running it through an activation function to introduce a non-linearity. With some activation functions the output will be between minus one and one. With the sigmoid function, the activation we're using, it gives us an output between zero and one. Okay, and the curve for the sigmoid function, and don't get too intimidated by this if you haven't done too much maths before, looks something like this. What happens is that our y value is somewhere in this area, right? All the program does is take that y value. Let's say our y value is here. The program comes up here and sees what the number is: 0.92 or something like that, and that will be our z value. So if we have a relatively high value of y, it will come out as a high value of z. What's interesting is this middle area here. If you have y here, you draw a direct line upwards and it will output this number, et cetera. So most low values of y will give a low value for z. Most high values of y will return a high value for z. 
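As a small sketch of the curve described above: the lecture does not write out the sigmoid formula, so the standard definition, 1 / (1 + e^(-y)), is assumed here. The sample y values are arbitrary.

```python
import math


def sigmoid(y):
    # Standard sigmoid formula (assumed; the lecture only shows the curve).
    return 1.0 / (1.0 + math.exp(-y))


# Low y values give a z near 0, high y values give a z near 1,
# and y = 0 sits exactly in the middle at 0.5.
for y in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    z = sigmoid(y)
    print(y, round(z, 3))
```

Whatever y goes in, the z that comes out stays between zero and one, which is the property the lecture relies on.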
It's this middle bit where the output can change dramatically with changes in y. So there are a few points that are important for you to understand here. First of all, the reason why we use an activation function is that we want to introduce a non-linearity to our neural network after all the linear regressions, in order to allow us to add increasing levels of complexity to our neural networks. That's the important point. The sigmoid function in particular goes between zero and one, so the output will always be between zero and one. Some other activation functions could be between minus one and one. In practice, for most of our neural nets, at least until we start to talk about other activation functions, we'll be using sigmoid the whole time. There are a few pros and cons to using sigmoid. First of all, the pro of using sigmoid is that the number is always between zero and one. So for our final layer, for example, if we're looking at probabilities, this is really useful, because in maths we always describe probability as being between zero and one. What might feel a bit more normal is to say between zero and 100%. For example, if you flip a coin, you have a 50-50 chance, right? So you could say the chance of guessing heads or tails is 50%, or you could say 0.5. So with sigmoid, the output can easily be translated into probabilities. For example, back to the image of a dog: the output might be 0.6, and that could mean there's a 60% chance of it being a dog, or your network is 60% confident that the image is of a dog. One of the cons of using sigmoid is these areas here. This won't make too much sense right now, but we'll be going into it much more when we're talking about gradient descent and the neural network learning. Sometimes the values can get stuck around these areas, and learning gets very hard. 
Thio Thio update its weights and its biases ingredient descent because there's there's not much of a change in the gradient, the further away so the higher the value of while the lowest, the further away you get from the middle, the harder it makes it for the new or network to learn, So we'll be discussing that a lecture lecture as well. So that's the basics of the activation function. Now that you have a grasp on how the activation function works as well, we could start looking into how a full on your network works and all the ins and outs of it . So look forward to seeing you in the next lecture.
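To make the shape of the curve concrete, here is a minimal sketch of the sigmoid activation in plain Python. The function names `sigmoid` and `node_output` and the sample y values are illustrative assumptions, not something taken from the lecture; the formula is the standard logistic function, written in two branches only to avoid overflow for very negative inputs.

```python
import math


def sigmoid(y):
    """Squash any real number y into the open interval (0, 1)."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    # For negative y, use the equivalent form that avoids a huge exp(-y).
    e = math.exp(y)
    return e / (1.0 + e)


def node_output(x, w, b):
    """One node: linear regression y = w*x + b, then activation z = h(y)."""
    y = w * x + b
    return sigmoid(y)


# Low y gives z near 0, high y gives z near 1; the middle changes fastest.
for y in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(f"y = {y:5.1f}  ->  z = {sigmoid(y):.4f}")
```

Far from the middle the curve is almost flat, which is the "stuck" region mentioned above: a large change in y barely moves z there.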
15. First neural network: Now, after going through all of these concepts and so many building blocks of neural nets, we're at the point where we are able to go through one full pass of a neural network and look at it in play. So congratulations for making it to this point. I want to emphasize just how many different concepts you've been able to pick up and learn so far to get to this point. It's really fantastic that you've made it this far, so well done. In front of you now, you can see the basic neural network. Usually, when you see diagrams, they tend to be just circles. I made a few of these squares just to make sure it wasn't too confusing, because I think it's important to differentiate the nodes within your network from the other things at play. In this lecture, we're going to be looking at one forward pass of the neural network. In just a few lectures' time, we'll be talking about what happens when you do the learning side. As you might remember, when you go this way through a neural network, this is called forward propagation. So this is essentially where, in the first case, the input is here and it's run through some nodes. This is a very simple neural network, but once we scale it to more nodes, it's the exact same thing, so don't worry about that. It's fed through here, and all of these, as I said in a previous lecture, have different weights, and they are randomly initialized. So in the forward pass, essentially, you're just doing lots of calculations with the linear regression and the activation functions, and you get your first loss. You work out, basically, how accurate your neural network is. Then comes the method of going back along this way, and that's called backward propagation.
So I'll just put back prop, so back propagation, which we'll be going over in two or three lectures' time. This is where, essentially using a certain mathematical method, the program looks at the model and asks how we need to change the weights and the biases in order to reduce this loss. So, now that we've gone through that, I just want you to keep in mind these ideas of forward propagation and back propagation. Let's look at an example. In this neural network, what we're going to be looking at is an example for a bank. Banks like to give out loans, but when they give out loans, they want to make sure that the person is going to be able to pay it back. If someone isn't able to pay back a loan, this is called, for example, defaulting. So they want to create a neural network where we can put in information about the person and it will output the likelihood that they're going to default. We're going to make it simple at the start and just have one input, and that's going to be the person's salary per year. So, salary. Now, we've talked before about training and evaluation data sets. In order for us to train, and actually to get the loss of our model, what's really important here is that we have existing data to train the model on. So in our data, we would already have loads of x's and y's. Our x's would be existing people, so the exact x will be their salary, and y would be whether they defaulted on the loan. Not the best handwriting ever. That says salary; it doesn't say surgery here. Okay, so this is the salary, and it doesn't say scare either. This is the salary, and this is whether they've defaulted or not. What we can do with defaulted or not is have that as a zero or a one, because our model works only with numbers. So zero can stand for if they didn't default and one could be for if they did default, or vice versa, but usually we would do it that way.
One would be the positive, which in this case is that they did default. So that's how we train it. Let's look at what happens when we run it on the training data. We put in one person's salary here. Let's say it's per year and in thousands; this person makes 24,000 a year. Then what happens is this number is put through three different nodes in the layer. Within each of these nodes, we have the linear regression, so y equals wx plus b, and then we have h, because that would be the activation function. We have that for each of these different nodes. What's important is that each of these nodes is different in one important way: they all have different weights. That's the really important thing here. In one given layer, and this is a layer, so the vertical nodes are in one layer, they have different weights, randomly initialized, but the same bias. There's just one bias being used for all of these different nodes. When we do back propagation, we update the bias, so they're all still the same, but we update the weights and they'll still be different. So with this number 24, we basically do y equals wx plus b, and often w will be a random number between zero and one, or between negative one and one. So let's say, for example, w is randomly initialized as 0.5, times 24, plus this bias that's been initialized. It's not important for us to work this out by hand; I do think it's important for you to understand the concept behind it. So the output is this y, and let's say, for argument's sake, the number is 18. Then this number 18, that's y, is put through this activation function, which for now is the sigmoid function we were talking about earlier. It always comes out as a number between zero and one, so let's say it comes out as 0.6. The exact same thing happens here, except with a different weight.
Let's say that number comes out as 0.4, and then here another one comes out as 0.7. Okay, so now these are what will be fed into this, and again, this is just exactly the same. This is just another node, with a different weight and a different bias because it's in a different layer. Now we're getting to the output layer. We have three different inputs, and as you see, we just want one input to go into a node. So we sum these together, which will be 1.7, and then we multiply that 1.7 by a weight and add a bias. Let's say, for argument's sake, the number is 2.9. Then this number is put through an activation function, and let's say the output is 0.6. As I was saying in the previous lecture, sigmoid functions are very useful to output probabilities, because the output is a number between zero and one. So we could translate this to being a 60% chance, where one, as we said at the start, is when the person defaults and zero is when the person doesn't. So we could say the probability of this person defaulting is 60%. Now, to calculate the loss. This is interesting. Essentially, what we do to calculate the loss is use our true labels. So y is a true label and y hat is our prediction. The output here is our prediction: our prediction is that this person is 60% likely to default. Now, in a perfect world, what would we like it to say? Let's say this person did default, so that is a one. We could say it was 100% likely that this person did. So when we come to our true labels, it's either 100% or 0%, right? They either did or they didn't. So we could say that the loss here is the difference between these two, which is 0.4. We should probably call this the cost, actually, because it's just one number at the moment. But now we're going to start talking about a few other things.
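The single forward pass just described can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the weights and biases below are made-up values rather than the numbers drawn on the whiteboard, `forward` is a hypothetical helper name, and the structure follows the lecture's description (one shared bias per layer, hidden outputs summed before the output node).

```python
import math


def sigmoid(y):
    """Activation function h: maps y to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-y))


def forward(salary, hidden_weights, hidden_bias, out_weight, out_bias):
    """One forward pass: one input, one hidden layer, one output node."""
    # Hidden layer: each node has its own weight; the bias is shared in the layer.
    hidden = [sigmoid(w * salary + hidden_bias) for w in hidden_weights]
    # Output node: sum the hidden outputs, then linear regression and activation.
    total = sum(hidden)
    return sigmoid(out_weight * total + out_bias)


# Assumed example values: salary in thousands, three hidden nodes.
y_hat = forward(24.0, [0.05, -0.02, 0.10], 0.1, 0.5, -0.6)
y_true = 1.0                      # this person did default
cost = abs(y_true - y_hat)        # difference between label and prediction

print(f"prediction y_hat = {y_hat:.3f}, cost = {cost:.3f}")
```

The prediction is always strictly between zero and one, so it can be read as the network's estimated probability of default for that one person.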
So this is an example of a very basic neural network here, with just one hidden layer. We describe a hidden layer as anything that's in between the input layer and the output layer. So one hidden layer with three nodes, very basic. We're just using one data point, one x, one variable, to describe our input. If we were to use more, it might be things like their age, what their profession is, or how long they have had their job. There are different data points that we could add. We'll go into that in a second, but we're talking about one other thing first, and that's having multiple examples. You can't really train a network based on just one person, so we need to give this network hundreds of thousands, if not millions, of examples of people who've either defaulted or not. It's very computationally expensive to put a million different people through a network in one go, so we do it in something called batches. That's an important word for us today. Let's say we want to save on computation so our computer doesn't explode and our program doesn't just fail with an out-of-memory error. Let's say we put in a hundred different people in one go. It would do this whole thing for each person, and we would have a number of different costs, and we would then take an average of all of the costs. So now let's use an example where we're just feeding in a batch of two different people. We have a cost of 0.4 for the first person. Let's say the output for the other person was 0.2, so they're unlikely to default, and they didn't default, so there's a difference of 0.2. So we have costs of 0.4 and 0.2. For our overall loss we would use the mean squared error, which we talked about before, which is the average of each of these squared. Now, I'm not very good at working out squares of decimals.
However, working it through, 0.2 squared is 0.04 and 0.4 squared is 0.16. We add those two together, and our overall loss would be 0.20 divided by two, so our loss will be 0.1. That's for a batch of two different examples. In reality, we would do a batch of hundreds at a time, and then we would calculate the loss of just one batch. Then we might run it using the evaluation set as well: once we've done forward propagation on the training set, we might do it on the evaluation set as well. I'm going to stop here so I don't get too far into this just yet. The only other thing I want to add is what happens if we have numerous different inputs. Let's just use an example with two inputs, because the concept can be scaled to many more. The first one was salary. Let's say this one is age, and the person's age is, let's say, 30. So now what we would be doing is running this through all of these nodes, and we would also be running this one through all of the nodes as well. It's important to realize that each of the inputs is going to be added in like this. So each of these nodes is going to have numerous different inputs. We'll discuss this again in a later lecture, but this is what we describe as a fully connected neural network, where everything is connected to everything between layers. So the input here is connected to all the nodes in the hidden layer, and each of these is connected to the output layer. So these are all the points for our basic neural network. We've talked about having an input and how we run it through each of these nodes: linear regression and an activation function.
In each of these nodes, the w's, the weights, are different values, initialized randomly for the first forward propagation, and they have the same bias. These outputs are then summed together into the output layer, where there's again a linear regression and often an activation function that comes with it. Then we can calculate the loss and the cost. Once we've been able to do this, we can go on to back propagation, where we go back through the neural network and the model looks at how it can change the weights and the biases in order to reduce this loss, using a method called gradient descent. We also talked about what happens if we have multiple different types of input and how they are also connected to the nodes. So that's it for now. Go over this lecture, feel free to ask any questions you might have in the comments, and I'll see you in the next lecture.
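The batch loss from this lecture can be checked with a short sketch. It assumes the two people in the batch have true labels 1 and 0 and predictions 0.6 and 0.2, as in the example; `mean_squared_error` is an illustrative helper name. Computing the squares exactly gives 0.16 and 0.04, so the mean squared error for the batch is 0.10.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between labels and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared) / len(squared)


labels = [1.0, 0.0]        # first person defaulted, second did not
predictions = [0.6, 0.2]   # network outputs (y hat) for the same two people

loss = mean_squared_error(labels, predictions)
print(f"batch loss = {loss:.2f}")   # prints: batch loss = 0.10
```

With a batch of a hundred people the same function applies unchanged; only the lists get longer.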
16. Multiple inputs: I just wanted to touch on an important part of what we were talking about in the last lecture, which was when you have multiple inputs and how that works. This is an important point to grasp, so we're just going to go through it really quickly now. In the previous lecture, we were talking about a bank that wants to create a neural network in order to predict if someone's likely or not to default on their bank loan. We were saying one data point you could use would be salary. When it feeds into the network, it is fully connected, so it connects to every single node in the next layer, which is the hidden layer. Each of these will go through linear regression, so it will be multiplied by a weight, and then you'll be adding a bias before it goes through the activation function, which we can just write as a little sigma, or we've been using h in the past; either is fine. Now, obviously, when you're looking at someone's likelihood to default on their bank loan, you can't just look at their salary. You need more information, so you could look at the age of the person or the number of years employed, for example. You could have up to 100 or 200 different data points like this. The more the better, really, if they're relevant to the potential for someone to default on that bank loan. So what we can do here is, for every input, it is also fully connected to every single node. That's why, when you look at diagrams of neural networks, you see there are a lot of lines, which makes it look very complicated. But that's all it is: every single input connecting to each of the nodes in the next layer. So we'll do the same thing for number of years employed. Let's just break this down to make sure you're completely confident in what's happening here. With just one input, what happens is it goes through linear regression here, so we could just put y equals wx plus b.
Good thing I read that out loud, otherwise that would not make any sense. So you're going through linear regression and then you're going through the activation function. I hope you're very happy with that now, because I think we've gone through it a few times. We do that in every one of these four nodes. Now, when we add another input to this, what's going to happen mathematically is that each one of these is going to undergo linear regression using different weights that are initialized, and the same bias will be added. So you're doing a linear regression for every single one of the inputs. Then what happens is, before it goes through the activation function, these are all summed together. That's the mathematical representation of summing. This is really important: we're not taking an average, we're not doing anything except just adding them all together. So the important point is, we've been saying x for an input, but this will be x1, input number one, this will be input two, and this will be input three. When we're generalizing for just any x, we could write xn. So basically, what we're doing here is y1 equals w1 times x1 plus, and this is an important point, b, because the bias is going to be the same for every single input here. If we do input two, that would be y2 equals w2, because it's going to be a different weight, times x2, because we have a different input, plus b. Let's say we're working with just two inputs for now. Then we'd run this through the activation function: we would do h, the activation function, of the sum of y1 and y2. And if we had a hundred different inputs, so x1, x2, x3, all the way up to x100, we would be summing them all together up to y100. That's all I really wanted to say about this, and this would count here as well.
Let's say we had another hidden layer here, which we'll be talking about in a minute. We had another layer here, and here we had another four circles. I don't want to give you all the information now, because then the next lecture will have nothing of interest. But each one of these nodes will connect to every single one of the nodes in the next layer. So these two nodes will both connect up, and let's say all four of these nodes will connect to this bottom node here. All of these different nodes will go through linear regression with different weights, and they will be summed together and then put through an activation function. So the main point for you to take away from this lecture is that there are going to be multiple connections to any given node. All of the inputs will be going through their own linear regression with their own weights and the same bias. They'll be summed together and then put through an activation function. So now we're going to talk a bit more about hidden layers.
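The multiple-input calculation described in this lecture can be sketched as follows. The helper names `node` and `layer` and all weight values are illustrative assumptions. The code follows the lecture's formulation literally: each input gets its own y = w*x + b with a shared bias, and the y values are summed before the activation. Many libraries instead add the bias once per node after summing the weighted inputs; that differs only in how the bias is counted.

```python
import math


def sigmoid(y):
    """Activation function h: maps y to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-y))


def node(inputs, weights, bias):
    """One node with several inputs: h(y1 + y2 + ... + yn), yi = wi*xi + b."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    ys = [w * x + bias for x, w in zip(inputs, weights)]
    return sigmoid(sum(ys))      # summed, not averaged


def layer(inputs, weight_rows, bias):
    """Fully connected layer: every input feeds every node in the layer."""
    return [node(inputs, weights, bias) for weights in weight_rows]


# Assumed example: two inputs (salary in thousands, age), four hidden nodes.
inputs = [24.0, 30.0]
weight_rows = [[0.02, -0.01], [-0.03, 0.02], [0.01, 0.01], [0.00, -0.02]]
hidden = layer(inputs, weight_rows, bias=0.1)
print([round(z, 3) for z in hidden])
```

Each inner list in `weight_rows` holds one weight per input line arriving at that node, which is why fully connected diagrams have so many lines.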
17. Hidden layers: Quickly now, just to go through hidden layers. The purpose of having hidden layers is to enable us to add complexity to our network, so it can find complex, maybe hidden, patterns within our data. If we have some quite simple data, we probably wouldn't need to use a neural network; we could just use linear regression or something simple like that. But the reason why we use neural networks often is because the data have some complex patterns that need unearthing, which we can't find through simple maths ourselves. So within networks, hidden layers add a great deal of dimensionality, but we'll describe it as complexity. The increase in complexity allows us to solve more complex problems and unearth more complex patterns. In neural networks, in fully connected networks, which is what we're talking about here, everything is connected between the layers. Just to make sure this is clear: this here is our input, and this here is our output. So, in a neural network, we have three different layers here. This is our input layer, then we have our hidden layers, and then we have our output layer here. With our input, if we're looking at this banking example again, it could be the age. And remember, we have multiple inputs, so just to make that clear, I'm going to add another quick rectangle underneath. We have as many inputs as we like. This is our input layer, then we have our hidden layer and our output layer. The output layer can have many points as well. It depends if we want just one output or not. In this example, we literally just want to know the likelihood of the person defaulting.
But if, for example, we were looking at an image, we might want outputs saying the likelihood of there being a dog in the image and of there being a cat in the image as well, so it's whether you're doing binary classification or multi-class classification. Don't worry yourself too much about that sort of stuff; we won't go into that detail. We have an input layer here, a hidden layer and an output layer. In both the input and the output, there can be multiple points. On to the main important thing here: all of the inputs are connected to all of the nodes in the next layer along, and so on and so forth. As you can imagine, each of these connections has a unique weight, and they're all going to be doing the linear regression and activation functions afterwards, once they've been summed like we talked about in the previous lecture. So that's a lot of computation, if you think about it, and the more hidden layers you add, and the more nodes you have in each layer, the more computationally expensive this is going to be. So there are two things that you can change up here. I'm going to stop drawing these lines now. You can add as many layers as you like, and you can also add as many nodes as you like. Basically, doing either of these will increase the complexity of the problem it can solve, or it can extract more complex patterns from the data that we have. That increases how much computation is required, and another thing it does is that we run the risk of overfitting, like we talked about before. If you extract too many really complex patterns from just the data you have, you might be overfitting, and it might not work on new data. And that's it for the hidden layers. Essentially, the main point to take away is that between the layers, everything is connected.
We've already talked about how all of these connections, even if there are three nodes here connecting to this one node here, are all going to have separate weights but the same biases. And we can have multiple inputs and outputs that also map to, for example, all the nodes before and after them. I hope that makes sense, and I'll see you in the next lecture.
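One way to see why extra layers and nodes cost more computation is to count the connections, since each connection carries its own weight. The sketch below is an illustration only; `count_connections` is a hypothetical helper and the layer sizes are made-up examples.

```python
def count_connections(layer_sizes):
    """Number of weights in a fully connected network.

    layer_sizes lists the number of nodes per layer, input layer first.
    Every node connects to every node in the next layer, so each pair of
    adjacent layers contributes (nodes in one) * (nodes in the next).
    """
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


print(count_connections([2, 3, 1]))          # 2*3 + 3*1 = 9
print(count_connections([2, 4, 4, 1]))       # 2*4 + 4*4 + 4*1 = 28
print(count_connections([100, 64, 64, 1]))   # 6400 + 4096 + 64 = 10560
```

Every one of those weights takes part in each forward pass and has to be updated during learning, so the count is a rough proxy for the work per example.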
18. Back propagation: So now we're moving on to a really important subject, and that's back propagation: essentially, how the neural network learns. When we're talking about learning, all we're talking about is the neural network changing its weights and its biases. Behind back propagation, there are a number of methods we could use. For this course, we use the most commonly used one, something called gradient descent. All we're talking about is the loss function, as we discussed before, and finding the downward gradients of the loss function so we can find the lowest possible loss. Now, involved in gradient descent is a lot of maths, and I was talking with my peers about whether or not to include the maths in this lecture. We decided, for the purposes of this course, that to be able to understand the fundamental concepts of neural networks and implement them, you need to have a grasp of how it works. For the maths, what I'm doing is adding a supplementary sheet to this lecture that you can download and take a look at if you're curious. But within this lecture, what we'll be doing is providing you with a strong intuition about how back propagation and gradient descent work, so that you're able to comprehend how it works and implement it yourself. So, as I mentioned, there are two main points to appreciate in this lecture. For every single link, every single connection, there is a different weight. So all three here: for every single one of these lines, there is a different weight. Everywhere in these, there are different weights. None of them need to be the same. They could be, but usually they're not. And for every layer we have, we have a different bias. So every link within this layer, for example these links here, has the same bias. All of these ones here have the same bias. That's important for you to comprehend: there are
so many different weights and a number of different biases. So, in forward propagation, if you imagine this with me, let's go back to this idea of the salary. We have a number here; we can just say it's 20. The number doesn't matter. What we would do is carry out linear regression on this number three times, and for each one of these we would then carry out an activation function. Now, as we go from hidden layer one to hidden layer two, we'll be doing linear regression again, except this time each node will be getting numerous different inputs. What's important here is that we understand that with the linear regression, we sum the outputs of the linear regressions all together and then run the sum through an activation function. We keep going until we get to the outputs. When we get to the outputs, we can then measure the output, which we can refer to as the prediction, or y hat. We look at how different this is compared to y, our true label, which is given to us from the data, so we can work out the loss using the mean squared error, which we talked about before. Once we have our loss, this is what gradient descent is all about, and this is the method of gradient descent distilled just for you. Essentially, the program asks itself: if I change the w's, if I change the b's, which way will I go? Let's say we're at this point here. This is where the loss is. The program can't work out where the bottom is; it's essentially blind. What it can do is say, if I go this way, will the loss go up or go down? If I go this way, will the loss go up or go down? So if I change these weights and biases, will I go upwards in the loss or will I go downwards? Now, we tell the program that we want it to minimize the loss, so it's always going to be searching for a downward gradient. Okay, so, gradient descent.
All it is doing is saying, how do I change w and b in order to minimize the loss, to find this downward slope? Now, there is a danger to this that we'll talk about in one moment, and that's finding a local minimum, where it says, okay, I'm at the bottom now; if I go this way or this way, I'm going to be increasing the loss again, so I'm going to stop here, even though there's a better loss down here. Let me just refresh this page and then we can look at the learning rate. I'll draw a bit of a bigger graph now of this idea of our loss function. Here is how our loss works. Let's say the program randomly initializes our weights and biases, so we're at this point of the loss. It will basically say, right, if I change the weights this way, I'm going to have a negative gradient, i.e. I'm going downhill. So it will continue to move in this way, changing the weights and biases until it gets to this part here. When it gets down here, it may say to itself, right, if I go either this way or this way, no matter how I change the weights, the loss is going to increase, so stop here, and this will be the minimum loss. This is what we call a local minimum. We want to find the global minimum, which is down here. One way to avoid this is to start using something called the learning rate. This is what we call, and this is another word for you to note down, a hyperparameter. There are a number of hyperparameters, and these are basically different parameters we can tell our model to use, so they're instructions for our program. What the learning rate means is how big a step you take each time. If you have a low learning rate, it's going to take small steps like this. If you have a huge learning rate, it says, okay, if I change the weights and the biases in this way, then I'm going to go down the loss.
So I'm going to really change those weights and biases. You might go all the way over here, and then again it will update itself, and it would just keep going like this, so it will start taking gigantic steps like this. The good thing about this is that it looks beyond these local minimums. The problem is that sometimes it can take absolutely ages for it to find this bottom part, because it's constantly jumping these long distances back and forth, and it takes a very long time to actually find its way down. So again, with every hyperparameter there is, and hyperparameters include things like the number of hidden layers you have and the number of nodes, there is a definite art to finding the right ones, but we'll be talking about hyperparameters in a later lecture. What's really important here, when we're talking about gradient descent and back propagation, is that you understand that when you do forward propagation in your network, it's going through all those weights, working out the linear regression, and at the end of forward propagation it gets to your loss. It works out the loss at that point. If the loss is high, then it's not doing so well yet, and you can get your program to output the loss after every time it's done forward propagation. As soon as it works out the loss, it can then look at how it can change the weights and biases to reduce the loss. And by stating the learning rate: if you have a high learning rate, you'll be saying take big steps; if you have a lower learning rate, you're saying take small steps. In back propagation, all it's doing is saying, right, okay, I'm going to look through all of the weights and biases at how I can change them in order to reduce the loss.
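The local minimum and learning rate ideas can be seen on a toy curve. The sketch below is only an illustration: the loss curve `toy_loss` is an invented one-variable function with a shallow valley near w = 1.13 and a deeper one near w = -1.30, and it stands in for a real network's loss, which depends on many weights and biases at once. The update rule is plain gradient descent: step against the gradient, scaled by the learning rate.

```python
def toy_loss(w):
    """Invented loss curve with one local and one global minimum."""
    return w**4 - 3 * w**2 + w


def toy_gradient(w):
    """Slope of toy_loss at w (its derivative)."""
    return 4 * w**3 - 6 * w + 1


def gradient_descent(start, learning_rate, steps=500):
    """Repeatedly step downhill; the learning rate sets the step size."""
    w = start
    for _ in range(steps):
        w -= learning_rate * toy_gradient(w)
    return w


small = gradient_descent(start=2.0, learning_rate=0.01)
large = gradient_descent(start=2.0, learning_rate=0.1)

# Small steps settle in the nearest valley, near w = 1.13 (local minimum).
print(f"learning rate 0.01: w = {small:.2f}, loss = {toy_loss(small):.2f}")
# Bigger steps jump past it and settle near w = -1.30 (global minimum).
print(f"learning rate 0.1:  w = {large:.2f}, loss = {toy_loss(large):.2f}")
```

On this particular curve the larger rate happens to land in the deeper valley. That is not guaranteed in general, and a rate that is too large can overshoot back and forth instead of settling, which is the trade-off described above.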
So when you work out the loss, that's forward propagation, and back propagation is updating the weights; it's changing this w and b. Once the program has done this, it repeats again and says, okay, now that I've updated the weights, let's calculate all of that and see where our loss is. Hopefully the loss has reduced somewhat, so we have a lower loss now, but it's probably not at its best, so it is going to continue on to back propagation again. And this, ladies and gentlemen, is how a neural network runs the whole way through: it's just forward propagation, back propagation, forward propagation, back propagation. So what's happening here, if we use this diagram as an example: in forward propagation, it's working out what the loss is, and then in back propagation it's updating all the weights and biases involved in this. All of this together, when you do forward propagation once and back propagation once, you're doing what we call one epoch, and what you hope is going to happen is that after every epoch the loss is going to go down. In practice, what you'd be doing is using your training data set and your evaluation set. You do forward propagation on your training data set and work out the loss for your training, and then you do the exact same forward propagation on your evaluation set. Then you do back propagation, updating the weights based on the loss from the training data, and you keep doing that. So you're always doing back propagation, updating the weights, based on the information from the training data. Every time the model does forward propagation in an epoch, you want it to calculate the loss but not learn at all from the evaluation data set; it should be blind to the evaluation data set. It's only learning from the training data set. So that's how a neural network runs from start to finish.
Just epochs of forward propagation and back propagation, constantly working out what the loss is, then updating the weights and biases to reduce the loss, and then going through forward propagation again to see if the loss has indeed gone down or not. So in general, the more epochs you run your model for, the more it's going to learn, and the further it's going to reduce its loss and increase its accuracy. So congratulations, you have just gone through a fully running neural network, conceptually. Now I want to start taking you through how we're going to do this practically, which is again very exciting, and once you can do that, you'll have a very powerful tool in your hands. So let me know if you have any questions about this. We've gone over a lot of different concepts, so hopefully you have been noting down a few of the things and you're able to recall from memory some of the most basic, important concepts. What I would recommend you do at this point is try to draw out a neural network by yourself and describe in your own words, to yourself or, what would be great, to another person, what's happening at every single stage. Then you can notice what the gaps in your knowledge are, and you can go back through the lectures to work out, right, where is my missing information? And then I will see you in the next lecture.
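The epoch loop described in this lecture can be sketched end to end on a tiny model. Everything here is an assumption made for illustration: the data set is invented (salary in units of 100,000, label 1 for defaulted), the model is a single sigmoid node, and the weight update uses a simple "nudge and see" numerical estimate of the gradient rather than true back propagation, because it mirrors the intuition of asking whether the loss goes up or down. Real frameworks compute these gradients with back propagation instead.

```python
import math


def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))


def predict(x, w, b):
    """Forward propagation for a single node."""
    return sigmoid(w * x + b)


def mse(data, w, b):
    """Mean squared error of the model over a list of (x, label) pairs."""
    return sum((label - predict(x, w, b)) ** 2 for x, label in data) / len(data)


# Invented data: (salary in units of 100k, 1 = defaulted, 0 = did not).
train = [(0.15, 1), (0.20, 1), (0.24, 1), (0.30, 1),
         (0.55, 0), (0.60, 0), (0.75, 0), (0.90, 0)]
evaluation = [(0.18, 1), (0.35, 1), (0.65, 0), (0.80, 0)]

w, b = 0.5, 0.0            # fixed starting values so the run is repeatable
learning_rate = 1.0
eps = 1e-6                 # size of the nudge used to estimate the slope
history = []

for epoch in range(200):
    train_loss = mse(train, w, b)        # forward propagation, training set
    eval_loss = mse(evaluation, w, b)    # forward only; never used to update
    history.append((train_loss, eval_loss))
    # "If I nudge w or b, does the training loss go up or down?"
    grad_w = (mse(train, w + eps, b) - mse(train, w - eps, b)) / (2 * eps)
    grad_b = (mse(train, w, b + eps) - mse(train, w, b - eps)) / (2 * eps)
    w -= learning_rate * grad_w          # update step, training data only
    b -= learning_rate * grad_b

print(f"first epoch: train {history[0][0]:.4f}, eval {history[0][1]:.4f}")
print(f"last epoch:  train {history[-1][0]:.4f}, eval {history[-1][1]:.4f}")
```

The evaluation loss is recorded every epoch but never feeds into `grad_w` or `grad_b`, so the model stays blind to the evaluation data while still letting you watch how it performs on it.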
19. Installing python: Now, if you've decided to go down the Anaconda route, we're going to go over that now. I think one of the really big pros of using Anaconda is that it makes it much easier to install everything. So first of all, go to anaconda.com/download, and then you can select your operating system here: Windows, Mac or Linux. I usually work from a Linux operating system, but for this project I am using Windows. We're going to be using Python 3.6. If you've used Python before, you probably understand the difference between Python 2 and Python 3. There are lots of differences in syntax, which basically just means writing code in a different way, but we're going to be focusing on Python 3. Currently the latest version is 3.6, but if it's already 3.7 or higher when you're watching this, don't worry about it. One thing to say is, if you already have Python installed on your computer and you're now installing Anaconda, and you come up against any errors, it may be because both of them are installed. You may want to uninstall Python before installing Anaconda, and then let Anaconda install Python itself, if that makes sense. Hopefully you've got a 64-bit computer. If you're not sure, you can always check your system, but usually it will be 64-bit, so just click on the 64-bit graphical installer and it will start downloading. You should then get the setup appearing, so you click Next. Feel free to read the license agreement, and if you're happy with it, click I Agree. Then install for "Just Me". The space required is 2.4 gigs, so you click Next here. We keep this first option unselected, so that I just open Anaconda from the Start menu. The second one registers Anaconda as my default Python. If you already have Python installed and you don't want Anaconda to become the default Python, then unselect this. But for now, just select that.
Keep it selected, then click Install. Feel free to leave those checked if you like, then Finish. Okay, now that Anaconda has been installed successfully, there are a few main points to go over. First of all, type Anaconda Prompt into your search. This is just like your command line or, if you're on a Mac, the terminal, and from here you can do lots of useful things. First, let me just increase the font size so that we can all see what's going on. That's better. So Anaconda is installed, and we've now opened our Anaconda Prompt, which you can find by searching on your computer, and after a few seconds something like this will pop up. You type in conda, which stands for Anaconda, then install tensorflow. The good thing about Anaconda is that it already has the other frameworks pre-installed. The only one we need to add as an extra in order to do deep learning is TensorFlow, so we install that now by typing conda install tensorflow, pressing Enter, and then typing y for yes to proceed. Okay, so now that TensorFlow has been installed, we should have everything we need in order to do deep learning and create neural nets with Python. If at any point in the future lectures you try to import one of the libraries or frameworks I'm talking about, and it says, for example when importing NumPy, that numpy was not found, you can just come back to the Anaconda Prompt and type in conda install and then whatever you want to add. So it might be numpy or tensorflow. We'll be going through all of this, and if at any point you find there's an issue, you just come back to this point here. Now that we have all the frameworks and libraries installed, we're fully prepared to get going with creating our neural networks from scratch. We're going to be doing this in Jupyter Notebook.
There are two or three main ways to open it. You can go to your Start menu, and at the top you might find Jupyter Notebook. Or you can just type Jupyter Notebook into the search, click on it, and it will run. The way I always tend to do it is to go to the Anaconda Prompt and type in jupyter notebook. You'll see something like this come up, and this is basically the directory that you're in from your Anaconda Prompt. Then you just click New, select Python 3, and here you are, ready to go. In the next lecture, we're going to be going over how you can use Jupyter Notebook to maximize your effectiveness in creating neural networks.
20. Jupyter notebook: So now we're going to do a whirlwind tour of how to use Jupyter Notebook: the basics, and a few little tips and tricks to help you get the most out of this tool. You can open Jupyter Notebook by searching for it. Here I'm going to go to the Anaconda Prompt and type in jupyter notebook. So either you can open it from the Start menu by clicking it, or you can go to the Anaconda command prompt, and either way it will take you to this page. As you can see at the top, it opens in your browser, but it's not running on the internet. It's running through localhost, so it's running directly from your computer. Next you go here, click New, and go to Python 3. Just a word about this: this is the home page, and it's showing me all of the folders that are in this directory. This is where Anaconda is running from, and we'll talk about how you can change which directory it's working from in the next lecture, when we talk about the command line itself. But for now, this is Jupyter Notebook. What's really useful about Jupyter Notebook is that it's helpful for prototyping, it's helpful for reporting things in your code or in your results, and it's also very helpful for teaching. It can be used in very large projects. However, what I recommend you do, and what we'll be doing in the future when we're actually running neural networks, is this: sometimes we'll be using Jupyter Notebook, but if it's a large neural network, we'll be running it straight from the command line as a Python file, which you can easily do. So let's go over a few basics. Hopefully you already have a good understanding of Python. You write a bit of code into one of these cells (we call this thing here a cell), and if you hold down Shift and press Enter, it's going to run your code for you.
Okay, so this is your input, where it says In [1], and the output is hello. Sometimes, for example if I want to import the TensorFlow framework, it's going to take Jupyter a little bit of time to import it, and what you're going to see is a little asterisk here. That's basically saying that it's waiting: it hasn't finished yet. Then a number appears once it's done. If I were to run multiple cells, they'd all have asterisks, because they'd all be waiting in a queue, and then one after another the numbers would appear. For most of your needs, you can use the buttons up here. So instead of doing Shift+Enter, you could just tap on this arrow here, which is Run Cell, to do the exact same thing. I prefer to use the shortcut keys. To find out all the different shortcuts, you can just tap H, and these will appear, and you can look at them in your own time. Here are a few of the main ones that make things easier in general. If you've clicked on the inside of a cell, it will be green. If you click here, it will turn blue, and this means you can move between cells and use the keyboard shortcuts. So, for example, I can add a cell above by tapping A, and I can add another cell below by tapping B. If I want to delete a cell entirely, I double tap D. Now for one really helpful thing that I use a lot. I've just imported TensorFlow, and this is one of the reasons why I like Jupyter notebooks so much. We always import tensorflow as tf, because it just makes it a lot easier to use. It feels quite strange, actually, trying it without that. So, for example, tf.layers is one of the modules within TensorFlow. After the dot there are loads of different functions that can come out of this layers module.
If I tap Tab on my keyboard, you're going to see all the potential functions I can use from tf.layers. I'm going to choose, for example, tf.layers.conv2d, which is for a specific type of neural network. Then, inside the parentheses, let's say I've forgotten what inputs I need to put in. I can hold down Shift and press Tab, and it's going to come up with all the things I could put into the parentheses, all the different parameters for this function: inputs, filters, et cetera. If I tap this plus icon here, it gives us the docstring, which explains what this is all about. So it's really useful to understand that Tab can be used for autocomplete and Shift+Tab can be used to get more information. Next, if you click right here, where it says Untitled, this is where you can give a title, and that's going to be the name of the file. If you want to save, it's just Ctrl+S, and you'll see that it was saved a few seconds ago. Now, like I say, if at any time an asterisk appears here and it seems to be going on for way too long, sometimes it just means that the kernel has died, or essentially that your code has crashed. What you do is click on Kernel and then click Restart, or Restart & Clear Output. I always click Restart & Clear Output. That means it's going to restart all of these cells so you can start again, and it's going to remove the outputs at the same time, which clears the memory a little bit. As you can see up here, next to where it says Trusted, it now says Kernel Ready, which means it's restarted properly. Finally, one important thing: when you save the Jupyter notebook, it's going to be saved as an .ipynb file. If you want to run it as a Python file, a .py file, you go to Download As, and then you have the Python file option just here. That's an important point to note.
Next up, we're going to be talking about directories. From here I can load anything that's in this directory, and I can go into all of these folders as well. So in the next section we're going to be talking about how to open Jupyter notebooks in a specific directory and how to navigate everything through the command line. Thank you for watching, and I'll see you in the next lecture.
21. Command line needs text for linux and mac commands: Okay, so now that I've shown you the basics of how to use Jupyter Notebook, I want to go over a few ways of navigating through the command line. As we went over in the previous lecture, whatever directory you open Jupyter Notebook in, those are going to be the files and folders that are accessible while using Jupyter Notebook. So it's important to learn how to navigate through your command line, so that you can open Jupyter Notebook in different directories, in other words in different folders on your computer. Today I'm going to be showing you how to navigate through the command line and how to run Python files through the command line. Now, this is going to be different between operating systems. I'm working on Windows, and you may be working on a Mac or on a Linux machine, so there are going to be slightly different things you type in. For anything that's different, I will make sure that text appears down here to show you what the equivalent is for your operating system. We're going to go over three basic things for navigation. The first is cd. On its own, this shows us what directory we're currently in. Then we have dir, which shows us a listing of all the folders that are in this directory. So let's look at one of these folders we have showing here. Let's go to Documents or Desktop. Let's go to Desktop. What we do is type in cd and then the folder we want to go to. Let's say, for example, we know the folder inside Desktop that we want to go to, so we want to go into Desktop and then into another folder after that. We can't just type in that folder name on its own; we have to go one step at a time. So we're now working out of Desktop.
If I type in dir now, it shows me all of the folders that are available within this area. So now, within Desktop, I could go into my Udemy folder. If I want to go back a folder, I can type in cd, space, dot dot, and that's going to take me back one. So those are the three main commands. You have cd on its own to see what your current working directory is. You can type cd and the name of a folder you want to go into, and if you type cd, space, dot dot, you'll go back one. And finally you have dir, which will show you all of the folders in this directory. Okay, so now for running Python files. Let's type in jupyter notebook. As you can see, there are a lot fewer folders now compared to the previous lecture, because I'm now working from a different folder, the Desktop folder. I'm going to open a new Python file here, which is just going to print "Hello, command line". That's all I want to happen, so if I were to run it, all it's going to do is print that out. What I can do is go to File, Download As, Python file. Just to show you in the folder: it's actually been downloaded into my Downloads folder, so I'm going to paste it across. Yes, it's already here. Just to make sure, we now have the Untitled Python file just here. Back at the command line, I'm going to stop running Jupyter Notebook. I just press Ctrl+C, and that's going to shut Jupyter Notebook down. As you can see, Untitled.py is now on my desktop, so I can type in python Untitled.py, and as you can see, it has run that file from the command line. So those are the basics of the command line. Next up, we're going to go very quickly over AWS and how it can be useful for larger machine learning projects, and then we'll be doing your first hello world program using TensorFlow.
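The same three navigation steps can also be tried from inside Python, which works the same way on Windows, Mac and Linux. This is only a sketch using the standard os module, not something shown in the lecture. The comments note the shell equivalents as I understand them: cd with no arguments prints the current directory on Windows, while Mac and Linux use pwd, and dir on Windows corresponds to ls on Mac and Linux.

```python
import os

# Python equivalents of the three navigation steps (a sketch; the folder
# you start in and the names you see will differ on your machine).
start = os.getcwd()        # like "cd" with no arguments on Windows, "pwd" on Mac/Linux
print("Current directory:", start)

print(os.listdir(start))   # like "dir" on Windows, "ls" on Mac/Linux

os.chdir("..")             # like "cd ..": go up one folder
print("Now in:", os.getcwd())

os.chdir(start)            # return to where we started
```

Running a downloaded notebook export is the same on every system: python followed by the file name, for example python Untitled.py.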
22. AWS: So I just wanted to introduce you to the tool AWS. This is offered by Amazon, and they don't sponsor this in any way; I just find them very useful. AWS offers the use of powerful computer processors, either CPUs, which stands for central processing units, or GPUs, which are graphics processing units. Most likely, unless you have purposely bought a laptop or computer that has a GPU, your computer is going to have a CPU, which can most likely handle the neural networks that we're going to be working with in this course. But if you're looking to create neural networks that are more complex and require more computation, you'll tend to want access to some GPUs. What I think is really cool about using Amazon is that it's really easy to connect your computer to a much more powerful computer, if you consider it like that. You get your code to run on the other computer, remotely, but you can see it all from your own desktop through your command line, which I think is quite cool. So, like I say, most likely you're going to be using a CPU. GPUs were originally created mainly for highly detailed games, where the program has to be constantly computing all of the changes in the finely detailed pixels within any one screen. It turns out that's very useful in machine learning and deep learning, so GPUs are becoming hugely popular now for deep learning. The rates for AWS are relatively cheap, less than a dollar an hour for their services. If you're interested, keep watching. I am considering doing another lecture going more in depth into using AWS, so let me know if you'd be interested in me doing that. You won't need this for the course, but it's important for you to know about: in interviews I've done, I've been asked numerous times whether I have used EC2 or S3 before. So I'll cover those really quickly.
S3 is what we use on AWS for storage. It's very cheap storage: we can store hundreds of gigs for cheap, so you don't have to have anything stored on your computer, and you can connect your storage here to the GPU that you're using through AWS as well. EC2 is the other part of it. This is where you actually use the powerful GPUs or CPUs to run your models. So you have S3 and EC2, and I thought I'd just introduce those to you. If you'd be interested in me providing a further lecture going more in depth into how to set this up and how to run it, do comment in this lecture, and I can create a lecture for this in the future. But for now, I just wanted to introduce these things. In the next lecture, we're going to be getting stuck in with using TensorFlow for a first basic hello world, so I'll see you in the next one.
23. Hello world tensorflow: In this lecture we're going to be going over TensorFlow, which is a very, very important framework for creating deep learning models using Python. For a lot of the scripts I'm going to be showing you in the tutorials, I've decided to create a folder called python_script. I want to show you how I'm going to navigate there in order to open it with Jupyter Notebook, just so we can revise what we were talking about in the last lecture about the command line. I'm opening up my command line, which for me is the Anaconda Prompt, but for most of you it could be the command line or the terminal. I get here and I want to get to my desktop, so I'm typing cd Desktop, and then on the desktop is this Python script folder. I can't remember if it's script or scripts, so I'm going to type in dir, and it's python_scripts. So I type cd python, or I can just tap Tab and it will autocomplete for me. So now I'm here, and I'm going to open my Jupyter notebooks by typing in jupyter notebook. As you'll see in a second, it opens up Jupyter Notebook in this directory, so that I have access to the Python files. Here we are: we have all of the different Python files, and we're going to go to the hello world for TensorFlow. Here's some code I prepared earlier. Just so you understand what TensorFlow is for: there are a number of different frameworks out there which provide all of the functions you're probably ever going to need in order to create your neural networks, including very, very complex models. These frameworks do pretty much the same thing. TensorFlow is probably one of the most popular frameworks out there. It has a huge community online and there's a lot of support for it. Here's how you can think of TensorFlow.
This is really important: TensorFlow describes computations as graphs that are executed using something called sessions. Now, when I say graphs, don't think of the x and y graphs that we've talked about in the past. Think more of this: imagine there's a mountain with a big lake at the top, and you want all of the water to run down this mountain. So what you do is carve a trench into the mountain, from near the lake down to the bottom of the mountain. That's you creating your graph. And you can imagine running a session as releasing that water down through the trenches. So basically, you're setting up a whole model where you're saying, this is how I want the data to run when it goes in, and then right at the end you say, right, now feed in the data. Within the graph, every single thing that we're putting in can be described as an operation. These could be simple additions, or they can be full layers in your network. Operations can be done on constants, variables, and more complex data structures, which we'll talk about in future lectures. But mainly there are three parts to your networks: setting up how the data is going to feed into the network, then the architecture of the neural network itself, and then finally running it and feeding the data in. So those are the basics. In our hello world, first we're importing tensorflow, and it's pretty much standard to import tensorflow as tf, because it just makes it a lot easier to write out. Then we're creating this variable called hello, using tf.constant. A constant could be a string or it could be a number, but for now we're just going to have a basic string. So for now, we can say this is our "graph". I'm saying that in quotation marks because I think the word graph is kind of counterintuitive.
I think most people get confused because they think of x and y graphs. You can pretty much just think of this as the architecture of what you want to create, and then you're feeding it when you run a session. In order to run a session, what you do is this: you write with tf.Session(), making sure you've got the capital S and the parentheses, as sess, and then print sess.run of hello. So basically, what you're doing here is saying, run this session, and this is what I want to come out. If we run it, we get hello world. We can change this to numbers, and it's just going to print those out. So this is really the most basic TensorFlow program you can write, where you're creating a variable and then saying, print out the result of this session, and the session is basically just running this. That may not make complete sense to you right now. But what I highly, highly recommend you do at this point, if you don't already have Jupyter Notebook up on your laptop or computer, is get it up now and run through this at least three times. What I recommend is that you do it the first time while you can see the code that I've just written, and then try doing it blind at least three or four times, because it's really important to build an intuition for running these sessions, and for thinking of it as two separate things: first you're setting up whatever you want here, and then you're running it with this session run. I am very confident that if you do this at least five times, you'll be a lot more confident going into the next lecture. So take some time now to consider this, and when you're ready, I'll see you in the next lecture.
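For reference, the two-step pattern described in this lecture (set up the graph, then run it in a session) can be sketched as below. This is a reconstruction from the spoken description rather than the lecturer's exact file, and it makes one assumption worth flagging: the lecture uses the TensorFlow 1.x API, so on a TensorFlow 2.x install the session calls are reached through tf.compat.v1 with eager execution switched off.

```python
import tensorflow as tf

# Assumption: on TensorFlow 2.x the session API lives under tf.compat.v1.
# On TensorFlow 1.x, tf.Session can be used directly as in the lecture.
tf1 = tf.compat.v1
tf1.disable_eager_execution()

# Step 1: set up the "graph". Nothing is computed yet.
hello = tf.constant("Hello world")

# Step 2: run the graph inside a session.
with tf1.Session() as sess:
    result = sess.run(hello)
    print(result)
```

On Python 3 the string comes back as a bytes object, so the printed value appears with a leading b.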
24. Feeding data and running sessions: Now, to step things up a notch after having done the hello world of TensorFlow, we're going to build up your intuition and understanding of how to run a TensorFlow session with a few slightly more complex examples. First of all, running a multiplication. Before, we just put in one variable and printed it out. Now we're actually going to do an operation. To begin with, we're importing tensorflow as tf, and hopefully you're coding along with me at this point. We're creating a variable and setting it as a constant. This time the variable is a matrix, which we'll be going over in the next lecture on data structures. It's an array of numbers, so you can think of it as lists within lists, or lists within lists within lists. You can always tell the dimension of the data structure by the number of square brackets at the start. This is a two-dimensional matrix right now, so it's basically a list within a list. What we're going to do is create our first matrix with tf.constant, and the constant just says it's some kind of constant number or a string. So let's just put in some numbers as before, and then we're going to do the exact same thing for another variable. Then we're creating a variable called product, which equals tf.matmul. Matmul just means matrix multiplication. If you hold down Shift and press Tab inside the parentheses, it will explain to you that it's used for multiplying matrices. Now we run this, and you might expect it to do absolutely nothing whatsoever, but we actually get an error here. So let's see what the error is, and we can work through it together. We scroll right down to the bottom of the error, and it says dimensions must be equal, but are 1 and 2 for MatMul, with input shapes [2,1], [2,1].
Okay, so this has actually touched on a really important point, which you don't need to fully grasp for now, but it is very helpful to note. I'm going to print out two things, because I think I know what the problem is: print the shape of matrix one, and the same for the other. What I want to show you is the shape of the data. These are two different matrices, and as you can see, they're both the exact same shape: they are 2 by 1. The shape of a tensor is essentially the shape of the array. What's important here is that when you multiply two matrices, the shapes have to be something like 2 by 1 multiplied by 1 by 2. It can't be 2 by 1 times 2 by 1. It could be 2 by 1 times 1 by 2, or 1 by 2 times 2 by 1. So I'm going to change the shape of one of these. This one is now 2 by 1 and this one is 1 by 2, and as you can see, it is now happy with the shapes. With matrix multiplication, there is this very important rule to know: the dimensions of the two bits of data must be complementary in this way. Not the same, but complementary. Okay, so now that we've done this, we've set up our data correctly. This is our graph, you could call it, or we've created the architecture, and now we just need to feed it through and run the session. Hopefully you've done this a few times yourself now after the previous lecture. So: with tf.Session(), making sure it's a capital S with empty parentheses, as sess. You could call that whatever you want, but sess is pretty much standard. Then result equals sess.run, and we're putting product in here, so we're feeding in this variable, and then we want to print out the result. As you can see, we get the result, which is 870. We can even look at the shape of this now, so we can print result.shape.
This is a really handy thing to use: .shape shows you what the shape of the data is, and here it's 1 by 1. This is an important thing to note. It's like a rule we have for multiplying matrices: if it's 1 by 2 times 2 by 1, the result is going to be 1 by 1. So we could say that if we have one matrix that is a by b and another that is b by a, the resulting shape is always going to be a by a. That's one rule to make a note of. Okay, so that's our first one. Next, we're going to be looking at placeholders. We use placeholders in all of our neural networks, so this is a very important thing to understand, and I'll run through it with you. We're creating two variables here with tf.placeholder, and what we're saying is that each is a variable that holds a float. I'm sure you know what a float is: an integer is a whole number, while a float can have any number of digits after the decimal point. So we're creating two placeholders that hold floats. It's important to use floats, so we use tf.float32. Then we're saying c equals a times b. Then we're doing the exact same thing again with the session, except that we're using this feed_dict. In our first example, we stated the values right away in the variable. This time we're saying we have a placeholder, and when we run this in the session, we'll feed something into the placeholder. That's where we use feed_dict. We run the session on c, and we need to feed values into a and b. So we write feed_dict equals, and then in curly brackets we say which variable we're feeding into, followed by a colon and the value. Let's say a is 100, and then b is 3. We run it and print final, and we want final to be 300. Let's run this, and there we have 300, which we get as a float.
So what we've gone over in this lecture is how to run a multiplication using sess.run. We've also gone over the shapes of data, the importance of having complementary shapes, and what the shape of the output is going to be based on the rule we were talking about: a by b times b by a always gives an output that is a by a in shape. So we talked about data shapes for multiplications, and then we talked about how you can feed in data using feed_dict, feeding it into placeholders, for which we use the float data type. I hope you can make a note of these things. And again, just like in the last lecture, what I highly recommend is that you go over these: do a matrix multiplication in a similar way to here, and have a few goes at feeding in data using placeholders and the feed_dict method. If you do that a few times, I'm confident you'll be building a very strong foundation for yourself. So have a go, take some time, and I'll see you in the next lecture, when we talk about data structures in a bit more detail.
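The two examples from this lecture can be sketched together as follows. The matrix values here are made up for illustration (the lecture's own numbers, which gave 870, are not recoverable from the recording), and as before the code assumes the session API is reached through tf.compat.v1 on a TensorFlow 2.x install.

```python
import tensorflow as tf

# Assumption: TensorFlow 2.x with the 1.x session API under tf.compat.v1.
tf1 = tf.compat.v1
tf1.disable_eager_execution()

# Example values are made up for illustration.
matrix1 = tf.constant([[3.0, 3.0]])      # shape (1, 2)
matrix2 = tf.constant([[2.0], [2.0]])    # shape (2, 1)
product = tf.matmul(matrix1, matrix2)    # (1, 2) x (2, 1) -> (1, 1)

# Placeholders hold no value until the session feeds them.
a = tf1.placeholder(tf.float32)
b = tf1.placeholder(tf.float32)
c = a * b

with tf1.Session() as sess:
    result = sess.run(product)
    final = sess.run(c, feed_dict={a: 100, b: 3})

print(matrix1.shape, matrix2.shape)
print(result, result.shape)
print(final)
```

Swapping matrix1 for a second (2, 1) matrix should reproduce the "dimensions must be equal" error discussed above, because the inner dimensions would then be 1 and 2.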
25. Data structures: So now we're going to very briefly go over data structures, in terms of how to feed them into neural networks. We're going to be using images to convey the basics of data structures. Let's say we have an image here. Each one of these squares represents a pixel, so this image is four by four pixels. It's an incredibly basic image, so it would be very, very pixelated, but we're just using it as an example. So the height and the width of our image are four by four. Now, what's important to realize is that in color images, you have RGB values for each and every pixel. Each pixel has a value for red, a value for green and a value for blue. That's how we define the color of any pixel, so each pixel has three values of its own. In order to describe all of the different numbers in an image, you'd have the height times the width times the depth. The depth is three because you have red, green and blue values. So the overall data structure is height times width times depth. Now, there's one more thing to consider when using TensorFlow, and actually all other frameworks: you need to take into account not only the height, the width and the depth, but also the number of different images you're going to be feeding in. Let's say, for example, you have four different images. Then the batch is going to be four. Or, just to keep the numbers different, let's say we have 10 images, so the batch is 10, and we have a height of four, a width of four and a depth of three. Tensors can be n-dimensional arrays, which means they can have any number of dimensions, but for our neural networks the data needs to be four-dimensional. So now we're talking about the reality of actually using a neural network, where we're using placeholders and something called tf.reshape.
When we're setting up our placeholders for feeding neural networks in TensorFlow, like we've done in the past, we set them up with these four different parameters. We always put None first, and the reason we put None as the first value is that we haven't specified yet how many images we're going to put through at any one time. We're saying, be ready for any number. Then come the height of the image, the width of the image and the depth, which is RGB. Just before we put the data into a network, we do a tf.reshape. This is making sure we're turning it into the correctly shaped tensor. It's exactly the same as what we do with the placeholder, except that we put minus one instead of None, which is simply the syntax TensorFlow uses, and then the height, the width and the depth. The most important thing to keep in mind is that when feeding tensors into a neural network, you want to keep them four-dimensional. No matter what the data is, we always want to arrange it as the batch, the height, the width and the depth. So when you're ready, let's move on to the next lecture.
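The batch, height, width, depth layout can be tried out without a network at all. The sketch below uses NumPy as a stand-in, which is my choice rather than the lecture's; tf.reshape accepts -1 in the same position and with the same meaning, and the matching placeholder shape would be [None, 4, 4, 3]. The sizes are the ones used in the example above: 10 images of 4 by 4 pixels with 3 colour values each.

```python
import numpy as np

# Made-up batch: 10 images, each 4 pixels high, 4 wide, with 3 colour channels.
batch, height, width, depth = 10, 4, 4, 3
flat = np.zeros(batch * height * width * depth, dtype=np.float32)

# -1 asks reshape to work out the batch dimension from the amount of data.
images = flat.reshape(-1, height, width, depth)
print(images.shape)

# One pixel of one image holds three values: red, green and blue.
print(images[0, 0, 0].shape)
```

Because the first dimension is inferred, the same reshape call works unchanged whether 10 images or 1,000 are passed in.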
26. Loading data into tensorflow: Okay, we're now going to step things up a bit and increase the speed, put the pedal to the metal a little bit. We're going to get our hands dirty. Now we're going to start looking at how we can actually load in the data in a really good way so that we can feed it into a neural network without any problems. Now, like I say, we are going to be ramping it up a bit, so prepare yourself. What I recommend you do is have a pen and paper next to you and code along with me. The first thing you should do is, from whatever directory you are working from in Jupyter notebook, create yourself a folder. So what I have done here, as you can see, in Python scripts, is I've created a folder called loading data. I created another folder called images, and then what I'm going to be doing is loading in images of drones and gnomes. So for the purposes of this lecture, what I recommend you do is exactly that: set up some folders, adding a few images to two different folders. But you could load in whatever images you want. What I recommend is that you try to use two sets of images that are very, very different, like night and day, something that's really easy to differentiate between. So once you've done that, and it's definitely in the correct directory that you're working from, let's come here and we're going to go through this one step at a time. I'm not sure if this is intimidating, looking at this block of code; hopefully not too much. It's going to be really simple and understandable once you have gone through it a few times. So let's just start with the basics: import os. You might have used this before; os stands for operating system. Essentially, it's a library that's really useful for doing things like dealing with folders and finding the right paths on your computer. We use this just once here, in this function we create, where we're telling the program to find the images on our computer.
That's it. TensorFlow, of course, is the framework we're using for deep learning. NumPy is really important for dealing with the numbers and putting them into the correct data structures for us. PIL is the imaging library that comes with the Pillow package, and hopefully it was also installed with Anaconda. Essentially, what this can be used for is loading images, and later on what we'll be doing is also changing the images, changing the size. sklearn is a very powerful machine learning library. Essentially, it stands for scikit-learn, and it has been used a lot in data science and machine learning. We're using train_test_split simply to split our data into training data and evaluation data. I think we've been through training and evaluation quite a lot, so hopefully you understand what that means. Finally, we're using something called matplotlib, which is used for visualising data, and what we're going to use it for is simply so I can show you that you have successfully loaded in the images not only as numbers in matrices, in the data structures, but also as the actual images. So let's get started. The first thing to think of is that we're creating these empty lists, X and Y. X is going to be just like in our linear regression model: X is essentially what we're using for the inputs. This is where we're going to be putting all of our images. Y is the labels, and that means, in my case for example, we're using examples of gnome and drone, right? Those are the two different kinds of image that we're looking at. We can ignore this line for now; I'm going to just comment it out. First of all, I need to specify what the two folders are. So I'm creating two variables here: one's called gnome folder, the other is called drone folder. I'm basically giving the path, starting from the directory from which we're working, so it might look a bit different for your operating system. You may need to put a slash here in order for this to work.
But remember, don't just put in this text here; create yourself a folder, and then write it in here for yourself. So now we're creating a function in order to load in all of the data in one go. Hopefully you've been through Python and you know your stuff. If not, I do recommend you take a look at a Python course; I offer one that goes through all the basics and prepares you exactly for this in machine learning. So def means you're defining a function. The function name is create data, and the two parameters I'm putting in are the folder and the name. The folder is going to be one of these two folders here, and the name I'm putting in is what we're going to be using for the label to put into the Y list here, which is empty. So I'm creating a for loop here. I'm saying, basically, for i in os.listdir, and what goes in here is the folder name. Basically, i here stands for image. So for every image in this folder; os.listdir just means, basically, put this text into the context of being an actual folder. So for every image in the folder provided, I then create a variable. What I'm doing here is I'm saying this is from Pillow, which we imported at the top here. I'm saying Image.open, so open an image, and what I'm doing here is I'm saying open an image at this os.path.join. This joins together the folder and the image name. So let me give you an example. If we're using the drone folder, in our for loop it starts with this one. This is just called one here; it's called 1.jpg, right? So it's going to be the folder name, which would be loading data, slash, images, slash, drone, slash, because we're using os.path.join, 1.jpg. So it's saying open this image, and then it's going to go on. The next thing I'm doing is I'm taking this variable, this opened image, and I'm saying turn it into a NumPy array. We imported NumPy at the top.
And as I mentioned, it's very important for creating data structures, and an array you can just think of as a matrix. All this is doing is saying: right, we've opened the image, now turn that into numbers, turn that into a matrix. That's all it's saying. Now what I'm doing here is I'm saying add this small x into this big list. Dot append just means add to the end of the list, so it's going to add the image, as numbers, into this list. At the same time, what's really important for you to realize is that when we're importing data, we have to import the correct image with the correct label every time, right? So what we have to do is just put in append again, so every time you add an image, you add in the word gnome or the word drone. Actually, TensorFlow doesn't like it when you put strings in like that, so what we do is we just convert the name to a number. It's a zero if it's a gnome or a one if it's a drone. Because at the top right here we're either only importing gnomes or only importing drones, we can basically say: we've got this dictionary now, and whatever name we used at the top, I want you to import that as a number. So if we're using gnomes, just put a zero in the Y. So at the corresponding index there's going to be the image data, and it corresponds to either a zero or a one, depending on whether it's a gnome or a drone. Okay, I hope that makes sense. If you are confused about anything in this part, please feel free to leave a comment and I'll do my best to respond. Now you're calling this function. You're putting in which folder it is, which we defined here, and then you're simply putting in gnome or drone so that we can then get the one or the zero when we're adding into the Y. So once we run this for both; let me make sure I import all of these.
So once those have been imported, what I can do then is run this, and essentially what I've done now is I've filled up this X and Y. I've filled them with all of the different images. So if I put in X, what we're going to see is this beautiful, very long array. The dot dot dots mean there are a lot more numbers. Essentially, all of these are different images written out in numbers, which is really cool, I think. And then with the Y we're going to see ones and zeros, because those are all either the gnomes or the drones. Let's just make sure; yep, there are the gnomes and the drones. So just to show you what I've done here: this is plt, which we imported. matplotlib.pyplot is plt, so we use plt, and this is what we do in order to show any image. We have to say plt.imshow and then say what image we want to show, and then we just call this by saying plt.show. That's it. What I'm doing here with X zero is I'm saying, basically, each one of these, X zero, X one, X two, if you think about it, especially if you're well versed in Python, the number stands for the index, right? In a Python list the first index is actually zero. So you're saying show the first index of X, which will be the first image that was added in our function. X zero, X one, X two, X three, et cetera, are going to show different images in X. So let's give it a try. X zero: as you see, it shows one of the gnomes, and underneath it's going to have printed off the numbers of this image. So this is this image in numbers. And when I first saw this, I thought, wow, that's really, really cool. If I change the number here, it's going to show the next gnome, et cetera, et cetera, and it's going to show the different numbers. Now let's see. So if this is the index, 0, 1, 2, 3, 4, 5, 6, 7, 8, then if I put in eight it should show me an image of a drone. There we go.
I can print it here and it'll show me the corresponding numbers. So I'd say that's part one of loading in the data. We've been through a lot, and really, don't feel overwhelmed. I can absolutely assure you, if you go through what we've just done here and you go over it a few times, you'll become confident with using these functions and importing these different libraries and frameworks, and you'll get your head around what we've just done. Try it a few times by yourself with different folders. I absolutely assure you that you have just taken a huge step in being able to create neural networks, because one of the biggest headaches in machine learning and artificial intelligence to begin with is importing the data. So give this your best shot, go over it a few times, and when you're ready we'll move on to the second part, which basically involves splitting the data into training and evaluation, and then how we actually put all this data into the correct format in order to feed it into neural networks. Once you've done all this, honestly, setting up the neural network is actually the easy part. This is probably one of the most difficult parts for now. So go over it a few times, do it with your own images, and when you're ready, I'll see you in the next lecture.
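The loading function walked through in this lecture can be sketched roughly as follows. This is a reconstruction from the spoken description, not the code shown on screen: the function name `create_data`, the `LABELS` dictionary and the folder paths in the comments are assumptions, and passing `X` and `Y` in as arguments is a choice made here to keep the sketch self-contained. The calls themselves (os.listdir, os.path.join, Image.open, np.array, list append) are the ones named in the lecture.

```python
import os

import numpy as np
from PIL import Image

# Assumed mapping from label name to number: gnome -> 0, drone -> 1.
LABELS = {"gnome": 0, "drone": 1}


def create_data(folder, name, X, Y):
    """Append every image in `folder` to X and its numeric label to Y."""
    # sorted() is an addition here so the order is predictable.
    for filename in sorted(os.listdir(folder)):
        path = os.path.join(folder, filename)   # e.g. <folder>/1.jpg
        with Image.open(path) as image:
            # convert("RGB") is an addition: it guarantees a depth of 3.
            pixels = np.array(image.convert("RGB"))
        X.append(pixels)           # the image as height x width x 3 numbers
        Y.append(LABELS[name])     # the label lands at the same index as its image


# Hypothetical usage, assuming these two folders exist next to the notebook:
# X, Y = [], []
# create_data(os.path.join("loading_data", "images", "gnome"), "gnome", X, Y)
# create_data(os.path.join("loading_data", "images", "drone"), "drone", X, Y)
```

Because the image and its label are appended in the same loop iteration, `X[i]` and `Y[i]` always describe the same picture, which is the property the lecture stresses.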
27. Loading data part 2: Okay, so hopefully you have been over this part, loading in the images, yourself a few times, and it's given you an intuition for how X and Y work when loading in the data. The most important thing for you to understand here is that when you're loading in data, there are so many different ways of doing it, especially when you work with different sets of data. And so the most important thing to realize is that when you're loading in the data, you're loading all of the input into X and, at the same time, making sure the indexes are the same for the data points in Y, which is going to be the true label. So this is one method that I've used that seems to be the best for when you're loading in images: you're able to load in both X and Y simultaneously, making sure the index stays the same. For me, this was a huge pain point when I first came to machine learning, and hopefully this has given you an intuition for how you can do this in a smart and quite efficient way. Now that we've done all this, let's look at how we can split the data. And to be honest, it couldn't be easier; scikit-learn makes it incredibly easy to do that. You might remember at the start of the previous lecture we imported something called sklearn. We had: from sklearn.model_selection import train_test_split. Now, this is going to be hard to remember to type out every time, but essentially what I want you to remember is that, although instead of calling it evaluation they call it test, which is kind of confusing, you're creating some training data and evaluation data for both X and Y. And you have to do it in this format: X train, X test, Y train, Y test equals train_test_split, and then you're just splitting X and Y. The test size here is basically saying what percentage of all of the data we're going to take away and put into the test set.
So here I'm using just 10%, which tends to be an okay number, and the 90% obviously goes into the training set. So once you've run that, you're going to have X train. Oops. We're going to have X train and you're going to have a different Y train, and also X test. So they're two different sets of the images. So now let's get into how we're going to feed this into the network. We've already gone over placeholders, which hopefully is going to make everything a bit easier. Essentially we're creating placeholders for our X and placeholders for Y. I'm calling them image place and label place. We could turn these into X placeholder and Y placeholder, whatever you like, as long as it's readable and you can look back later and understand what's going on. So we're creating placeholders: for X, we're telling it that we want the images to be floats, and then we're defining the shape in square brackets. The None, as we talked about previously, is the batch number, and we're just saying prepare yourself for any number of images to come in; we haven't defined that yet. One thing that we haven't done above, but will do in the future, is resizing the images so that they all have the same dimensions, because in TensorFlow it's really important that all the images have the same dimensions. So for now I'm just saying it's 100 by 100, and the depth is three, so that's the RGB values we were talking about earlier. If an image was black and white, for example, it only has one channel, so it has a depth of one. That's an important thing to keep in mind. For the Y placeholder, what we have to do is put None inside the brackets and then a comma. That's how we have to say we don't know how many are going in, and Y is just a single number, so don't worry about it.
So next we're going into one hot encoding, and I'll be doing a whole lecture on this just afterwards. It will be a very short lecture, but just to make sure you understand what one hot labelling is: essentially, we currently have our data as a zero if it's a gnome and a one if it's a drone, but actually TensorFlow doesn't like it like that. It wants to have it like this: if it's a gnome, it's going to be a one for gnome and a zero for drone. If it's a drone, it's a zero for gnome and a one for drone. Hopefully it'll make sense in the next lecture, but basically it just wants to put it into these numbers where it says yes, it's this one thing, and no, it's not everything else. So what we're doing here is we're doing tf.one_hot. We're taking all of our Y labels, which we should probably change to Y place to make it work, because we called it Y place. So we're doing this function, one hot, on our Ys, and we're basically saying there are two different options: it can be 0 1 or 1 0. Then we're essentially taking this array, the arrays for the images, which I will change to X. We're taking these images, and we're taking them from the arrays created by NumPy, which we did in the previous lecture, in this line here, X equals np.array. And what we're doing is we're just saying we're taking in that input and the shape of it is going to be, instead of None, we're using minus one. This is just the syntax of TensorFlow; it just wants you to put minus one instead of None, but it means the exact same thing. Then we're taking in the height, the width and the depth. Now, once you've done this, it's 100% ready to be fed into a neural network. And like I said in the previous lecture, now we're onto the easy part. Loading up the data and preparing it in the right way is actually the difficult part.
So each one of these parts is really important: splitting the data like this, creating your placeholders, doing one hot encoding and then reshaping your X into a tensor. All of these points are really, really important, so I do recommend you go over them yourself a few times with a few different examples. Just as, hopefully, in the previous lecture you went through this with your own images, now try to extend what you're doing with those images to splitting them into training and evaluation sets (they call it test here, but we call it evaluation), setting up your placeholders, doing the one hot encoding and then getting the input layer. So have a go yourself, and when you're ready, I'll see you in the next lecture. We'll be going over one hot encoding and then we'll be leading on into setting up the architecture for a neural network.
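The splitting step from this lecture can be illustrated with a small stand-in. The sketch below does not use scikit-learn; it is a NumPy approximation of what a 90/10 split does, written to show the one property that matters here: X and Y are shuffled with the same permutation, so every image keeps its label. The function name `split_train_test`, the seed and the toy data are assumptions made for the example.

```python
import numpy as np


def split_train_test(X, Y, test_size=0.1, seed=0):
    """Shuffle X and Y with one shared permutation, then cut off a test set."""
    X, Y = np.asarray(X), np.asarray(Y)
    order = np.random.default_rng(seed).permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], Y[train_idx], Y[test_idx]


# Toy data: 100 fake 4x4 RGB "images" whose pixel values equal their own label,
# so it is easy to check that images and labels stay paired.
Y = np.array([0] * 50 + [1] * 50)
X = np.stack([np.full((4, 4, 3), label, dtype=np.float32) for label in Y])

X_train, X_test, Y_train, Y_test = split_train_test(X, Y, test_size=0.1)
print(X_train.shape, X_test.shape)
print(Y_train.shape, Y_test.shape)
```

With 100 items and a test size of 0.1, 90 items land in the training set and 10 in the test set, matching the proportions described in the lecture.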
28. One hot coding: Okay, so I just really wanted to quickly touch on what one hot encoding is, and make one important point, and that's that one hot encoding is useful in classification but not regression. I want to define the difference between classification and regression. It's relatively intuitive. Classification is when you have a yes or no, or you're classifying things into certain boxes. You're asking, will this person pay back their loan? It's either a yes or a no. Or you could ask, in this image, is there a bird, a cat, a dog? And you could say yes or no to any of them. So it's those sorts of things where one hot encoding is really useful, when you have lots of yes or no answers. Regression is more to do with continuous numbers, where you're asking, for example, how much is the house worth if it's a certain age, or how fast is a car going to move if it has a certain weight. It's all those sorts of things where what's returned is just a number that could be continuous, not necessarily a yes or no. So let's just have a quick example of one hot encoding. Let's say we've done a questionnaire and five people have responded. Person one said they're male, person two said they're female, persons three and four said not specified, and person five said they're female. So the first person said they're male, so male gets a one. This is when it's one hot encoded, so it's turning from someone saying that they're male to saying it's a one for male and a zero for all of the others. The second person said they're female, so it's just going to be a one under female. For not specified, it's going to be a one under not specified. If, for example, they could be both, and this is just making it up now, if they could be both male and female, then you could have 1, 1, 0 if the person put male and female.
But the important point to make here is that when it's a positive, you're giving it a one, and everything else, where it's a negative result, gets a zero. So is the first person female? That's a negative; they're not female, so it gets a zero. So one hot encoding basically splits the data by whether it's positive or negative, and it gives it a one if it's a positive and a zero if it's a negative for the others. So that just clears up what one hot encoding is. Hopefully, having been over the last two lectures, you've really had a go at loading in your own data and understanding how it all works. If you have, then you will be breezing through the next part on how we set up the architecture for a neural network.
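The questionnaire example from this lecture can be written out in a few lines. This is a minimal sketch in NumPy (an assumption; the course itself uses tf.one_hot), and the helper name `one_hot` is made up for the illustration. The five answers are the ones given in the lecture.

```python
import numpy as np


def one_hot(indices, depth):
    """Turn integer category indices into rows with a single 1 in them."""
    indices = np.asarray(indices)
    # Row i of the identity matrix is the one hot vector for category i.
    return np.eye(depth, dtype=np.float32)[indices]


categories = ["male", "female", "not specified"]
answers = ["male", "female", "not specified", "not specified", "female"]

# Look up the column for each answer, then encode.
indices = [categories.index(answer) for answer in answers]
encoded = one_hot(indices, depth=len(categories))

print(encoded)
```

Each row has exactly one 1, in the column of the answer that person gave, and zeros everywhere else.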
29. Neural network lets go: Okay, we're making some really great progress here, so let's keep the momentum going. I just want to say, so far, well done, really, on making it this far. Loading in the data and preparing it is definitely one of the headaches of machine learning and creating neural networks, so well done. This next part, in my opinion, is actually a lot easier, and I really enjoy doing this part. So let's get started. We finished up last time having this input layer. Having the input layer means we have all the data now in tensors in, let's say, the correct format to put into the neural net. So in order for us to put it into a fully connected neural network, we need to reshape it one more time. Basically, right now our data is four-dimensional: we have the batch, the height, the width and the depth. But for a fully connected layer, we want to stretch that all out so it's just two-dimensional, so that basically you've got all the numbers in one nice long line, so they can be fully connected. So essentially what we do is reshape; let me just change that to input layer. So we're putting the input layer in here, and we're reshaping again to minus one by height times width times depth, so it's a two-dimensional tensor that we're putting into the neural network. So what we've done here, let me just say, this is our inputs. And you might remember when we were looking at these kinds of diagrams earlier in this course, we were quite used to having just one or two circles here, or just one or two, to connect to the next layer. Now each one of these is either an R, a G or a B value. So as you can imagine, within a normal picture, however many pixels there are, it's three times that, connecting into all of these. It's going to be a lot of connections indeed, more than I care to draw. Let's just put it that way.
So this is our input layer connecting now to a fully connected layer. So let's see how we do that. Essentially, what we're doing is putting it into a variable, and we're calling tf.layers. So tf.layers is really important. In fully connected neural networks, all we have to use is dense layers, but when we go on to things like convolutional neural networks and more in future courses, tf.layers is going to be really important; you use it for a number of different things. What we're using it for in our neural network is dense and, actually, I'm using it for dropout as well, so I thought I'd put that in just to show you how it's used. So to create one layer of a neural network, it's literally this simple. It is one line of code, which is fantastic. So tf.layers.dense: we're inputting this flatten, which is our previous variable, and then the number of units. Units are each one of these, every one of these that you want connected, so the more that we have, the more complex our network is going to be. The reason why it's just 10 is because I'm imagining that most people are going to be running their programs on CPUs, and maybe their laptops won't be able to take too much. The more units we have here, the more complex it will be, but the more computationally expensive it will be, and some viewers' computers might crash. Next we're putting in the activation function. So we're very much used to having sigmoid as our activation function. Let me just check; yes, this is what I've run. Okay, so essentially what it's going to be is tf.nn.relu. This is important: instead of using sigmoid here, we're using another activation function called ReLU, because in practice ReLU is actually a lot better than using sigmoid. I'll be discussing the different types of activation functions in a future lecture, but for now just keep in mind that sigmoid was the easiest to explain.
And it's kind of the basic activation function that people get to grips with first, but ReLU is actually a lot better in practice. We'll discuss that in a future lecture, but for now all we need to know is that we are using ReLU instead of sigmoid as the activation function. So as you can see, we're basically creating three different layers here. We're inputting FC1 into the next layer, which makes sense, right? So we're inputting all of these into here, and we're doing that three times. So here are three layers, FC1, FC2, FC3, and all of these are fully connected. So when we're saying dense, tf.layers.dense is basically saying make it a fully connected layer, connect all of them to each other. Okay. And then on this layer here, I've decided to put in dropout. Dropout is very good at making sure the network doesn't overfit. I don't think we're going to have a problem with overfitting here because our network isn't that complicated, that complex, I mean, so the reason why I'm adding dropout here is just to remind you what dropout is. So essentially, to put in dropout: tf.layers.dropout, inputting the previous layer, and then the rate, which for this dropout is 0.5. So we're saying randomly remove 50% of these nodes, just ignore them. So these will be, let's just say they're randomly chosen like this, absolutely ignored, so none of the other nodes will be connecting to those. Finally, we're creating one more dense layer, and we're putting in the dropout layer from previously. We want two units this time. So in the final layer, the output layer of a neural network, when we're doing classification like this, we only want to have the same number of units as there are different options. So when we look at what we're putting into Y, it's either zero or one, so there are only two options, really.
And in one hot, we always have two there, simply because there are only two options. So it's either 0 1 or 1 0, as we talked about in the one hot encoding lecture. So the output always wants to have the same depth, the same number of different options, as there are in the one hot encoding, basically. And so this is now our output, and in the output each node signifies a class. So this one could be gnome, for example, and this one could be drone. And let's say this one comes out at 0.7, because it'll give you a number between zero and one because of the activation function, and this one is 0.3. So basically what would be happening here is that this node is saying that it's more likely to be a gnome than a drone, and so the output would be that it thinks it's a gnome. And let's say it was a drone; then it would have quite a large error of 0.7, obviously. So that's our output layer, and that is the architecture of a neural network. We have the input layer, which we're calling flatten here, and then we're creating three fully connected layers, applying dropout to the final fully connected layer. These are all hidden layers, remember, and then finally we have our output layer of just two units. So now that we've loaded in the data and set up our architecture, there's nothing left to do but run our model: setting up how we're going to be calculating the loss function, and then setting up the epochs and doing forward and backward propagation, while reporting how accuracy is increasing as we train our model. So this is going to be a really exciting lecture. Have a go at setting up your own architecture, and when you're ready, I'll see you in the next lecture.
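To see how the shapes flow through the architecture described in this lecture, here is a rough forward-pass sketch. It uses plain NumPy rather than tf.layers (an assumption, so it can run anywhere), and the helpers `dense`, `relu` and `dropout` are simplified stand-ins written for this illustration, with fresh random weights on every call and no training. The layer sizes (three layers of 10 units, dropout rate 0.5, two output units) follow the lecture; the 30 by 30 image size and the batch of five are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    """ReLU activation: negative values become zero."""
    return np.maximum(z, 0.0)


def dense(x, units, activation=None):
    """Fully connected layer: every input value feeds every unit."""
    weights = rng.normal(0.0, 0.1, size=(x.shape[1], units))
    bias = np.zeros(units)
    z = x @ weights + bias
    return activation(z) if activation is not None else z


def dropout(x, rate):
    """Zero out roughly `rate` of the values and rescale the rest."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


height, width, depth = 30, 30, 3
input_layer = rng.random((5, height, width, depth))        # 5 fake images

flatten = input_layer.reshape(-1, height * width * depth)  # 4-D -> 2-D
fc1 = dense(flatten, 10, relu)
fc2 = dense(fc1, 10, relu)
fc3 = dense(fc2, 10, relu)
dropped = dropout(fc3, rate=0.5)
output = dense(dropped, 2)                                 # one unit per class

print(flatten.shape)
print(output.shape)
```

Each image becomes one row of 30 x 30 x 3 = 2700 numbers after flattening, and comes out the other end as a row of two numbers, one per class.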
30. Give your network a brain: Okay, so everything is pretty much ready for us to run our model now. One thing I wanted to mention is that, with the images, what I've made sure I've done is I've got 50 of each. You don't need to have the same number of each; it's just a nice number to have. And because we're not creating the most complex neural network in the world, I've made choices that make it relatively easy for the network to distinguish between them. With the drones, as you can see, most of them have white backgrounds and are relatively similar, and the gnomes have more colorful backgrounds and tend to be in the garden. That's going to help the neural network to learn these patterns. So one other thing I've had to do is resize all of the images. I'm going to resize them to, let's just go for 30 by 30, because that will be large enough for it to pick out patterns but not so big that our computers won't be able to handle the computation. So let's go for 30 by 30. I was doing 10 by 10, but I think that is a bit too pixelated; that's very, very small. Even 30 by 30 is small, but I think we'll still be able to get away with it. So I'll change the relevant parts to 30 once I have done this. So we are now at the point where we're going to pretty much run the whole thing. Let me just talk you through the final steps. Now that we've created the architecture and the output, there are a few things we need to do. We need to define, and we'll go through this in a lot of detail, so don't worry, we need to define our loss, so how we're going to work out the loss. Up until now, we've been talking about using the mean squared error. Softmax cross entropy with logits sounds very complex and scary, but it's pretty much the same idea with a bit more maths, so you can visualize it as being the mean squared error that we've been doing, just with a bit more maths in there.
Then we use a gradient descent optimizer, which we talked about before. A learning rate of 0.1 is low enough, so it's a good rate. So first of all we define the loss, and then we need to say how we're going to reduce the loss, and we're going to use gradient descent for the neural network to learn and reduce the loss. And then our training operation is that we want to minimize this loss, because so far we've said what the loss is and what the optimizer is. Then we say, right, we want you to minimize this loss. And then this is for our own peace of mind, so we can see how it's doing: we want to work out how many predictions are correct. What I'm saying here, and it will take you a few times to get used to what I'm doing here, is tf.argmax with the one hot labels, which are the true labels that we made at the start, 0 1 or 1 0. So we're basically comparing whether these are equal, whether our one hot labelling is the same as our logits, so which ones are correct. Argmax basically picks out whichever output is largest, so if our logit came out as 0.8, it's treated as the one, and the other is treated as zero. So basically what this is doing is asking, is the one hot encoding the same as our logits? The logits are our output from the neural network, the prediction. And then we can work out the accuracy by just saying reduce mean, so getting the mean of our correct predictions. Essentially, tf.equal returns a Boolean, so what it's going to say is, is our true label, with one hot, the same as our prediction? That's what this first line says. And then in the second one we're turning the Boolean into a value, either one or zero, and then we're getting the mean, which is between zero and one. And so the accuracy is a proportion; it's one if it's 100% accurate. Okay, now onto the final part. This is where we're running the session now.
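The loss and accuracy steps just described can be checked by hand on a tiny example. The sketch below is a NumPy stand-in for the TensorFlow operations named in the lecture (softmax cross entropy, argmax, equal, cast, reduce mean); the function names and the four made-up logit rows are assumptions for the illustration.

```python
import numpy as np


def softmax_cross_entropy(logits, one_hot_labels):
    """Mean cross entropy between softmax(logits) and one hot labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(one_hot_labels * log_probs).sum(axis=1).mean())


def accuracy(logits, one_hot_labels):
    """Fraction of rows where the largest logit sits at the true label."""
    # Comparing the argmax of each side gives one Boolean per example.
    correct = np.argmax(logits, axis=1) == np.argmax(one_hot_labels, axis=1)
    # Cast the Booleans to 1.0 / 0.0, then take the mean.
    return float(correct.astype(np.float32).mean())


# Four made-up predictions for two classes (column 0 = gnome, column 1 = drone).
logits = np.array([[2.0, 0.5], [0.1, 1.2], [0.3, 0.9], [1.5, 1.0]])
labels = np.array([[1, 0], [0, 1], [1, 0], [1, 0]], dtype=np.float32)

print(softmax_cross_entropy(logits, labels))
print(accuracy(logits, labels))
```

Here the third row predicts drone while the label says gnome, so three of the four predictions match and the accuracy is 0.75.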
So, first of all, we need to define two variables. We want to state how many epochs we're going to go through. If you remember, epochs is essentially the number of times you do forward and back propagation. For this, let's say 50; I'll run it, and then I will fast forward through the training and we can go through the results. A batch size of 10 is about right for what we're doing, because we've only got 50 of each image, so I think that will work fine. If we had millions of images, which is of course what you should have, but for now it's fine, you'd probably have a batch size of 100 or a thousand. It depends on what your computer can handle. So now what we're going to be doing is essentially running all of this. Up until this point it's all been creating a graph in TensorFlow. We've been setting up all the operations, but we're not actually running anything through it. This is now where we say, right, now we're going to run through it. So we start with the usual: with tf.Session as sess, as always. First of all, we do sess.run on tf.global_variables_initializer. Essentially that just initializes our variables; it basically activates them, let's say, and gets them ready for the data to go through, so you always need to initialize your variables. Then we're going to be using two for loops. In the first for loop we're saying for i in range epochs. We can put for step in range epochs, for example. All right, let's make this very readable. We'll call this epoch total, and we can say for each epoch in epoch total. That's better. So for each epoch in epoch total, and then we're going to shuffle X train and Y train so that the images going through in the training are always different and randomly shuffled. And now this is where we run the whole thing. So I'm going to change this too, because this is the batch.
So we're saying for batch start in range zero to the whole length of our X train, which is 100 because we have 50 images of each, and the step size is our batch size, which we defined up here. Then we can say batch end, the end of the batch, is the batch start plus the batch size. Okay. And then we can create the batches, so batch X and batch y. This is where we use what we just created, the batch start and batch end; I've just changed these names to be a bit more readable. What we're saying is we want X train from batch start to batch end, and y train from batch start to batch end. And of course these will always be random because we're shuffling them. Then we do sess.run: we're running the session and running our training operation, which is where we said to minimize the loss; here we defined the optimizer and our loss, and we're saying to minimize it. And then we're working out the accuracy, which is simply so we can see how it's doing. The really important part for the training is simply this line here, sess.run on the training operation. We're using the feed dictionary, which we were talking about before, so we're feeding into X place and y place, which were our placeholders right here at the top, the highlighted part here. Into X place and y place we're feeding X train and y train, and X test and y test, which we were creating along here. So we're running it on all of X train and all of X test, and this is going to be our accuracy. We defined accuracy just up there: we were working out, using Booleans, whether our labels and our predictions are equal, and then working out the mean of that to give ourselves an accuracy. So each time this runs, we want to print off the epoch. This backslash n just gives us an extra line, and we're printing the epoch number, which was i.
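The two loops described above can be sketched in plain Python as follows. This is only an illustration of the shuffling and slicing, not the lecture's TensorFlow code: the names `shuffle_together` and `iterate_batches`, the stand-in data and the two-epoch run are all assumptions made for the example.

```python
import random


def shuffle_together(x, y, rng):
    """Shuffle two lists with the same random order so each image keeps its label."""
    order = list(range(len(x)))
    rng.shuffle(order)
    return [x[i] for i in order], [y[i] for i in order]


def iterate_batches(x, y, batch_size):
    """Yield (batch_x, batch_y) slices from batch_start to batch_end."""
    for batch_start in range(0, len(x), batch_size):
        batch_end = batch_start + batch_size
        yield x[batch_start:batch_end], y[batch_start:batch_end]


# Stand-in data: 100 "images" with labels 0 or 1 (hypothetical, for illustration).
x_train = ["image_{}".format(i) for i in range(100)]
y_train = [i % 2 for i in range(100)]

epochs_total = 2
batch_size = 10
rng = random.Random(0)  # fixed seed so the sketch is repeatable

batches_run = 0
for each_epoch in range(epochs_total):
    x_train, y_train = shuffle_together(x_train, y_train, rng)
    for batch_x, batch_y in iterate_batches(x_train, y_train, batch_size):
        # In the TensorFlow version, this is where the session would run
        # the training operation on this batch.
        batches_run += 1

print(batches_run)  # 20: 2 epochs x 10 batches of 10 images
```

Shuffling an index list and applying it to both lists is one simple way to keep every image paired with its own label.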
But it's now called each epoch, and then we're printing out which batch it is, so batch start out of the whole number, dot dot dot, and then we're printing off the training accuracy and the test accuracy. The .8f simply means the number of decimal places we want printed out. So let's go through this once more from the top before we run it. Okay, so first of all, we're starting our session and initializing the global variables; that's basically setting up our variables. Then, for each epoch, we're shuffling X train and y train so that the order is completely random. By the way, when it's shuffled, the same image sticks with its same label; that isn't affected at all. Then we're defining what our batches are with batch X and batch y: we're basically selecting a small batch for X and y by taking slices out of our overall X train and y train. Then we're running the training step, so forward and backward propagation, and this is within one epoch and within one batch. And then we're working out the training accuracy and the test accuracy. So let's review from the absolute start, because we've been through a lot and this is now a neural network, and a relatively good one, I must say, because it's able, hopefully, to discern relatively well between gnomes and drones with a very small amount of data. Okay, so what we're doing is importing the relevant libraries and frameworks here. Then we're creating two empty lists for our image data and our label data. We're loading in the images from these two folders with a function that basically opens the image. Then we resize the image so that all of the images are the same size; in TensorFlow, all of the images have to be the same dimension.
Then, for each of the images that we open and resize, we turn it into a NumPy array; that's basically putting it into matrices, turning the image into numbers. Then we append this image, as numbers, to our empty X list. And we do the same thing with y, but there we add the label, which is either gnome or drone, turned into one or zero. After that, we set up placeholders, which is how we first put the data into our network, so we create placeholders for both the image data and the label data. We do one-hot encoding on our labels, so instead of just being one or zero, it's going to be 01 or 10. Then we're doing tf.cast here, so we're turning this number into a float, just because sometimes we get errors from TensorFlow if we don't. Then we're doing a split, so we're splitting all the data into training and test, or what we call evaluation. And then we set up the architecture of our neural network. We created an input layer where we reshaped the X placeholder into an appropriate shape for a tensor. We then flattened it out so we could put it into an input layer, just like you saw when we were looking at the circles. Then we put it through three fully connected layers, each of them having 10 nodes, the circles we were just talking about, using an activation function called ReLU instead of sigmoid. Then we carried out dropout with a 50% dropout rate on the final layer, and then we created something called our logits. The logits are the output layer, where the model can make predictions on whether it's a gnome or a drone. There are only two units because there are only two potential classes, gnome or drone.
So after that we defined our loss by getting the mean of softmax cross entropy with logits, and you can think of that in much the same way as the mean squared error between what our prediction is and what the true label is. You can even look back to when we were doing linear regression, where there was a data point and we looked at how far it was from the line. The data point would be our one hot here, that's the actual data point, and our prediction, so the line of best fit, is our logits. Next is the thing about the optimizer: we're saying we're going to be using gradient descent in our backward propagation. With the training operation we're saying, your purpose as a model is to minimize this loss. And then, for our own peace of mind, so we can see the accuracy, we looked at the Booleans, so we looked at whether our label is the same as our prediction. Then we took a mean of this, which would be between zero and one, zero being 0% accuracy and one being 100% accuracy. I'm just going to run all of the cells up until now. What I hope you've been doing is live coding this with me. If you haven't, I recommend that once you've been through this lecture you go through this whole thing along with me so you can code it out yourself, because it feels fantastic to be able to get to this point and then press the magical shift enter to get our model running. So here we go.
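As a rough illustration of the loss recapped above, here is a plain-Python sketch of softmax cross entropy averaged over a batch. It is a hand-written stand-in, not TensorFlow's implementation, and the function names and example logits are assumptions chosen for the example.

```python
import math


def softmax(logits):
    """Turn raw output scores into probabilities that add up to one."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot, logits):
    """Loss for one example: minus the log of the probability given to the true class."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def mean_loss(labels, logits_batch):
    """Mean of the per-example losses, which is what the optimizer tries to minimize."""
    losses = [cross_entropy(t, z) for t, z in zip(labels, logits_batch)]
    return sum(losses) / len(losses)


# Hypothetical logits for one image where the second class scores much higher.
confident_logits = [0.0, 4.0]
loss_when_right = cross_entropy([0, 1], confident_logits)  # true class is the second
loss_when_wrong = cross_entropy([1, 0], confident_logits)  # true class is the first

print(loss_when_right < loss_when_wrong)  # True: a confident wrong answer costs more
```

Like the distance from a data point to the line in linear regression, the loss is small when the prediction agrees with the one-hot label and large when it does not.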
31. Running your model: Apologies for leaving you on a cliffhanger there. I realized I wanted to keep this in two separate tutorials so that one tutorial wouldn't become too overwhelming. So, just to recap before we run the model, which I'm sure you really want to get to: I've made a few changes very quickly. The images are 200 by 200. We have three fully connected layers, like before, with 10 units in each, using the gradient descent optimizer. Let's go for 20 epochs and a batch size of 50; if we had small batches, there would be a lot of text being reported, so we keep the batch size relatively high. I've just run all of these cells, and now, without further ado, let us run the model. As you can see, this ran very fast. The reason for that is that fully connected layers with 10 units in each are very, very easy for a computer to handle, so it ran very quickly indeed. Let's have a look at the results. We started off at around 50%, as you'd imagine, because it's a 50-50 chance when the weights have been randomly initialized; that's absolutely fine. You'd expect the accuracy to increase over time with a neural network. Let's have a look as the number of epochs increases. As you can see, it stays around the 50-50 mark the whole time, which you may find slightly disappointing, but have no fear. There are a few ways in which we can improve this right off the bat. I'd say the image size, first of all, is fine at 200 by 200. But let's ramp this up a bit. I'm going to have over 1,000 units for each of these layers now. Okay, the dropout rate: let's just drop that to 0.2 to see if that affects it. I'm also going to change what we're using as the optimizer. Right now we've got the gradient descent optimizer. I'm going to use something called Adam, the Adam optimizer, because it turns out it tends to be a much better optimizer.
So make a note of this, the Adam optimizer, and we should see an improvement in our results. Now, this will take slightly longer, but let's see what happens after we do 20 epochs with this. Okay, so let's take a look at the results. Just like last time, the first time round we're getting an accuracy for the training and evaluation data of about 50%, which is to be expected. One thing to really make a note of is that with this kind of small number of images, like we're using, 89 images is really not that many compared to using hundreds, thousands or even millions of images, so you cannot expect so much from a neural network. But let's have a look. We've got 50-50, then it suddenly jumps up to 98% and 89%. That's because the batches are quite small; the batches are, what did we say, 50, and we've only got 50 images of each. If you had hundreds of thousands of images and a larger batch size, you wouldn't expect a jump this big, but it's still looking like something has started to happen. We're still around 50-50 here, and we're coming up to around halfway, at ten epochs. Now we're just starting to see the training accuracy increase, and we have another jump up, and you can see the accuracy is starting to fluctuate between 60 and 90%. One thing's for sure: the accuracy is starting to increase over the number of epochs. So by increasing the complexity of our neural network and adding an improved optimizer, the Adam optimizer, we are getting between 80 and 100% accuracy coming up to the end. Just to show you that increasing the number of epochs is often one of the best things to do, let's run this again and see what our accuracy tends to float around if we run this for 100 epochs. Okay, I think I'm going to stop it
at just 80 epochs here because, as you can see, this has kind of got stuck around this sort of accuracy, which is a fantastic accuracy seeing as how little data we have. So let's start from the top. As you can see, we start off in the 49, 50% sort of area for training and test accuracy, which is completely normal to begin with. It takes a little while for a neural network to get off the ground and start learning. But we're already at epoch 10 here, and we're at between 70 and 80% accuracy, so you can see it's definitely starting to pick something up. I should mention that I chose two sets of images, one for gnomes and one for drones. All of the drone images I picked had white backgrounds, and all of the gnomes were in gardens, so there was lots of green in the images. Those are the kind of patterns that the model could pick up and learn from. If you were to try, for example, to get a model that could discern between two different types of bike, and they weren't on such distinct backgrounds, you'd need something a bit more powerful. As you can see, this is doing really well. We're on epoch 20 now and we're getting accuracies consistently between 80 and 90% between the batches, some reporting 100% accuracy. This is really fantastic. And then, as you can see here, it comes on to having around 80% for the evaluation accuracy and 97% for the training accuracy, and then it just gets stuck around 100% for the training and 89% for the test. So it may be slightly overfitting, and we could use some of the tactics to reduce overfitting which we talked about a bit previously and which I'll be talking more about in the future. This error here just came up because I stopped it early. So what's important
here is that even when we only have 90 images, if we increase the complexity of our model from 10 units to 1,024 and we improve our optimizer from gradient descent to something called the Adam optimizer, we can see incredible results. And with 100 epochs, well, we could run this for a lot longer. So, a fantastic model and a fantastic result. What I really hope you feel confident in doing now is creating your own neural network just like this with your own images. I think it's really important for you to feel confident in doing so, and hopefully you've been able to pick up the code from the lectures. If for any reason you would like me to post the code separately, so you can do a bit of copying and pasting yourself and tailor it for your model and your images, do let me know. I'll be happy to upload it onto the course if that's the case; all that's needed is for one person to ask. But essentially, you could easily do this with any of your images. Make sure you get about 50 images for each of the two classes. You can prepare them in the exact same way and set up your own neural network with as many units as you want. I really recommend you play around with this. Play around with the number of layers: you could add some more layers, increase the number of units, or look at what different optimizers there are. We'll be talking about all the hyperparameters and how you can improve your model in a future lecture as well. One obvious thing to do is just increase the number of epochs, giving your model a bit more time to learn. But that's it for this part of the section, and I hope you found this useful. Really, the big aim for you here is to be able to create your own image classifier, creating your own neural network from scratch. So give it a try yourself, once or twice or as many times as you like, with different images.
This is a perfect portfolio project, so I highly recommend you create one from scratch. Include your own comments describing what's going on throughout your model and it will be a fantastic portfolio piece. So I wish you the best of luck, and I will see you in the next one.
32. Using other frameworks: So I've been asked a lot recently about different frameworks. People want to know: is TensorFlow the only one out there that's actually good? My answer, right away, is no, TensorFlow isn't the only solution out there. Actually, there are a number of different, really good frameworks, and I just want to show you a few that you may be interested in checking out yourself. In terms of using Python for deep learning, Keras is really good as well. Actually, Keras is a lot easier to write, so I think the coding is a lot more efficient and cleaner, and in that way Keras is better. The reason why I always recommend people get started with TensorFlow is that you have to learn the ins and outs with TensorFlow. I mean, people could use just Python on its own, but that would take thousands of lines of code. So TensorFlow is a really good way to get into understanding all the ins and outs of deep learning. From there, I recommend people check out Keras simply because the code is shorter, a bit easier to work with and easier to debug as well. The thing to know about Keras is that it actually sits on top of TensorFlow, so it's already using the TensorFlow back end anyway; it's simply an easier way of using TensorFlow. PyTorch, which I'll come to in a moment, is a separate framework with its own back end. The con is that, unfortunately, you can't get as much flexibility with Keras as you can in TensorFlow, but for most purposes it's enough. So I would say Keras is a very good one to check out. PyTorch is also a very good option, and I'd say its main strengths are in debugging and prototyping, so you can get a neural network up and running really quickly. And, like I say, if there are bugs or errors in your code, it's a bit easier to find out what's going on. TensorFlow isn't exactly the best for that.
I want to emphasize that the reason I'm teaching TensorFlow in this course is that I feel it teaches you the most important skills and builds the most important understanding, so you have that background when you go on to using Keras or PyTorch, and you'll be a lot better off for it. That's my opinion. So I recommend you check them out, and I think you can install them both through Anaconda. If you just put conda install keras, you should be able to get Keras. The other one that I thought would be worth mentioning is MXNet. This is backed by Amazon, and they're making a real push for this framework simply because they want to be part of the action. One of the big pros that I've heard about MXNet is that it runs faster than other deep learning frameworks, so it allows you to run your models faster. The issue is that because it's a much newer framework, as you can see here, version 1.0 is the top one up here, the community is much smaller, so it's a lot harder to problem solve, debug and go online for help from other people. So I would say MXNet has still got a lot of growing up to do, but it may be one to keep your eyes on for the future. For now, if you're looking to experiment with other frameworks, I'd very much recommend checking out Keras and PyTorch.
33. Loading handwritten digits: So we're now going to create an image classifier using the framework Keras, which we've talked about previously. Keras is a very useful framework, and it's a fine alternative to TensorFlow. Keras is actually built on top of TensorFlow, so it uses TensorFlow underneath, but it makes it easier to set up models in a simpler way. You lose slightly on your flexibility and what you can do, the hyperparameters you can choose, for example, but the benefit is that you save time, and most people find Keras a lot easier to use. For the most part, you're not going to be creating neural networks so complex for quite a while that you'd need that flexibility from TensorFlow. So it's definitely an alternative to check out, and that's why we're going to do one project using Keras. First we're importing NumPy as always, and then we're importing a few things from Keras. This part's important: from keras.datasets import mnist. The problem we're looking at right now is that we have loads of handwritten digits and we want to be able to classify them, saying, right, this is a number one, this is a number eight, et cetera. Those handwritten digits are called MNIST, and Keras already provides this data set for us. Because we've already had experience loading up the image data for the TensorFlow image classifier, I thought we could just use this inbuilt data set for now. Then we're importing a few other things. From keras.models we import Sequential. That's really important; it's basically your first step in setting up your neural network. Then there are the dense layers, which are the same as in TensorFlow: a dense layer is a fully connected layer. Then we apply dropout using this, and then there are some useful utilities. And of course, we're going to be using matplotlib to do a bit of visualization.
I must say, in this model we're going to be setting up just a one-layer neural network, so it only has one hidden layer; that's important to keep in mind. What's important here is that we're using a lot, lot more images than we were for the previous image classifier. So I think you'll be happily surprised by how effective just a basic neural network can be when you have enough data. So we're importing all of these things, and we don't need to do this twice. From MNIST we're going to be loading the data and putting it into X train, y train, X test and y test. Keras is quite cool and just does this automatically. Then I want to print off a few things just to show you something. When we print off the shape of X train, it's going to show us how many images we have, and it's also going to show us the height and the width of the images. So let's just run this first. It says immediately "Using TensorFlow backend", because Keras is built on top of TensorFlow. We have 60,000 images in our training set and the images are 28 by 28. Now let's look at the shape of our test set. You can see we have 10,000 test images. Okay, so what you might have expected was four dimensions, like we were used to in the previous image classifier, because there we had the number of images, or the batch, the height, the width and then the depth because of the RGB. What I will show you now is that all of these images are black and white, or what we might call grayscale. That means the depth, as I've been calling it, is one instead of three, because there are no RGB values; it's just black to white, which only needs a depth of one, so it's not included here. And I just want to show you that X train dot shape at index one is the height, and the next one is the width.
So the number of pixels in the whole image is going to be 28 times 28. Then I just set this up: essentially, I'm creating a function here where we literally just look at what image there is. We're just choosing the color map, which we're going to have as gray. The index here is the index into X train, and each number will just give a different image. So we're basically saying, show the image and then also give us the label, so you can see just underneath that it is a nine. We can put in a different index here, and we get a different number: an eight, a nine, et cetera. So we've got lots of handwritten numbers and they're correctly labeled. That's essentially how we load in the data, nice and simple, visualizing the image and the label here. Then we're just going to look at a few more things before we move on to the next lecture. Like I said, the number of pixels is X train shape at index one times X train shape at index two; as you might remember, we did it just up here where we were looking at the height and the width. In a fully connected layer, we want the input layer to have all of the pixels in basically one vertical row, if you imagine it like all of those nodes in the diagrams we've been looking at. So then I'm basically just reshaping X train so that it's ready to be input, making it into this stretched-out thing that I was talking about, with all the nodes in a vertical line. Then we're doing something called normalizing the data. It just makes it a lot easier for a neural network to work with, because we know that when you're describing pixel colors, or intensities of black and white, the numbers can range from 0 to 255.
So if we divide all of our values in X train and X test by 255, that's going to make all our numbers between zero and one, and it makes it a lot, lot easier for a neural network to work with, as you'll see when we run the neural network later. I can also remove this normalization, or, if you're coding along with me and typing these in, you can try running it with and without, and you'll see a huge difference. Okay, and then we're using np_utils, which we imported. We're basically just doing one-hot encoding here: putting the labels into categories is basically just one-hot encoding. So we're one-hot encoding y train and y test; this is how you do it in Keras. Then we are defining the number of classes. We haven't looked at this yet, so essentially we're looking at the shape of y test at index one, which is going to be the number of different potential labels. Because the labels are between zero and nine, that's going to be 10 classes overall: we have the numbers 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and those are the handwritten digits that we're looking at. I hope you found this part useful. In the next lecture, we're going to be going over setting up the actual neural network architecture.
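The three preparation steps from this lecture (stretching each image out, dividing by 255 and one-hot encoding the labels) can be sketched in plain Python like this. It is an illustration only: the lecture does these steps with NumPy reshaping and the Keras utilities, and the helper names and the tiny 3 by 3 stand-in image below are assumptions.

```python
def flatten(image):
    """Stretch a height-by-width image out into one long list of pixels."""
    return [pixel for row in image for pixel in row]


def normalize(pixels, max_value=255):
    """Scale pixel intensities from the range 0..255 down to 0..1."""
    return [p / max_value for p in pixels]


def one_hot(label, num_classes):
    """Turn a digit label such as 1 into a list with a single 1 in that position."""
    encoded = [0] * num_classes
    encoded[label] = 1
    return encoded


# A tiny 3x3 stand-in for a 28x28 grayscale digit (hypothetical values).
image = [[0, 128, 255],
         [64, 0, 255],
         [255, 255, 0]]

num_pixels = len(image) * len(image[0])  # height times width
model_input = normalize(flatten(image))

print(num_pixels)       # 9 here; it would be 784 for a 28x28 image
print(one_hot(1, 10))   # [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```

With real MNIST images the same idea gives a 784-long list of values between zero and one for every image, and a 10-long one-hot list for every label.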
34. Creating the model: So now we've loaded in our data for our artificial neural network that's going to be classifying handwritten digits. Now that we've loaded in the data, what we want to do next is create the architecture for our neural network. And do you see these beautiful six lines of code, minus the two lines for comments? That is all you will need to set up your neural network. It's beautiful, isn't it? We're going to be doing this within a function, which I've decided to define as model. Then what we need to do is say model equals. This isn't the same as the function, so I could call this something different if I wanted to. Actually, I probably should call it something else. Let's change this function to create model; that makes a lot more sense. I'm just going to update it down here as well so you don't get confused. So our function is called create model. Then I'm creating a variable called model, and I put model equals Sequential. Don't worry too much about this; it's basically just saying, right, we're now setting up a neural network, and then we can add layers to this. This is the foundation, and then we can add the bricks on top. We add layers by using this variable and putting dot add, and then in brackets we put Dense, which is our dense layer. Then we're putting in pixels. If you remember, in the previous lecture we defined pixels here, which is basically our images put into a vector of length 784; 784 is 28 by 28, or the height times the width of our images. So our pixels is basically a stretched-out version of our images, which is the same as what we did in TensorFlow as well. So the input dimensions are going to be the same.
Then we set the kernel initializer to normal, and the activation is ReLU. The activation could be sigmoid, which we talked about before, but as we will be going over with activations, ReLU is a much better option than sigmoid, except sometimes in the final layer of a neural network. It's important to understand that sigmoid actually isn't all that useful most of the time; within a neural network we use something called ReLU. We will be going over the different activation functions in more detail in a future lecture. What we need to do next is add a dense layer for the classes, or labels. The kernel initializer is normal again; that's fine, I usually just tend to put that. Then for the activation we use softmax, which we've talked about before, because this is multi-class classification. It's not just a yes-or-no binary classification; it's a 0, 1, 2, 3, 4, et cetera. Softmax, as I said before, takes the probabilities of the image being each of 0, 1, 2, 3, 4 and so on, and makes sure that all of them add up to exactly one, so the total probability is 100%. The output is then the probability that the image is each one of those different digits. At the end, we just need to compile the model. We define our loss, so cross entropy again, like we were using in TensorFlow. Adam is a type of gradient descent that's very useful, so we're going to be using Adam. The metric we want to look at is accuracy, and then we simply say return this model at the end. So we've gone through setting up the groundwork here, adding on a layer for the pixels and one for the classes, so that's our X and y. We're using a ReLU activation for the images, and we're using softmax activation for the output labels.
Then we're compiling the whole model. Our loss is categorical cross entropy, we're using an optimizer that's a type of gradient descent called Adam, and we're looking at accuracy as our metric. So I'll run this, and in the next lecture, now that we've created the architecture, we can actually run the model itself on the data. So I'll see you in the next lecture.
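To show what the two layers in this architecture compute, here is a hand-rolled forward pass in plain Python. It is not Keras code and not how Keras implements its layers; the sizes are shrunk from 784 inputs and 10 classes down to 4 inputs and 2 classes, and every weight, bias and function name below is made up for illustration.

```python
import math


def relu(values):
    """ReLU activation: negative values become zero, positive values pass through."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Softmax activation: turn scores into probabilities that add up to one."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each unit is a weighted sum of all inputs plus a bias."""
    sums = [sum(w * x for w, x in zip(unit_weights, inputs)) + bias
            for unit_weights, bias in zip(weights, biases)]
    return activation(sums)


# Made-up weights: 4 input pixels -> 3 hidden units -> 2 output classes.
HIDDEN_WEIGHTS = [[0.5, -0.2, 0.1, 0.0],
                  [-0.3, 0.8, 0.0, 0.2],
                  [0.1, 0.1, -0.6, 0.4]]
HIDDEN_BIASES = [0.0, 0.1, -0.1]
OUTPUT_WEIGHTS = [[0.7, -0.5, 0.2],
                  [-0.4, 0.6, 0.3]]
OUTPUT_BIASES = [0.0, 0.0]


def predict(pixels):
    """Hidden dense layer with ReLU, then an output dense layer with softmax."""
    hidden = dense(pixels, HIDDEN_WEIGHTS, HIDDEN_BIASES, relu)
    return dense(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES, softmax)


probabilities = predict([0.0, 0.5, 1.0, 0.25])  # normalized pixel values
print(probabilities)       # one probability per class
print(sum(probabilities))  # the probabilities add up to one (within rounding)
```

Training is what finds good values for the weights; this sketch only shows how an input flows through the layers once the weights exist.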
35. Running your model: So finally, we're going to look at running our model and feeding in the data. If you thought that setting up the architecture in the previous lecture was a small amount of code, well, check this out. Compared to TensorFlow, this is absolutely tiny: minus the comments, this is only four lines of code to run your neural network. It's absolutely amazing, and this is why I like Keras quite a lot. So let's take a look. First of all, we're creating something called model here, just a new variable, using the create model function which we created in the previous lecture. So this is our architecture. Then we're saying to fit our model with our data, which is X train and y train, and we're saying what our validation data is, which is X test and y test. We're defining the number of epochs, the batch size and the verbosity. Verbose basically means how detailed you want the output to be, so how much explanation you want. In general, verbose equals two is absolutely fine; you can go with that. Finally, we want to get our scores. We calculate the scores by evaluating the model using our X test and our y test; that's our evaluation data. What might get quite confusing is that in many different frameworks people use validation and evaluation interchangeably, and here you have test. Evaluation is probably the best thing to call it, according to all my research, but in code we tend to put test. Just saying. So yeah, we want to evaluate the model based on just the evaluation data to get our accuracy, which is a very good idea, and verbose equals zero for this. And then finally, what's going to happen just from these three lines of code is absolutely amazing. You're going to see it and you're going to love it.
It basically runs the whole model for us, outputting the accuracy at each and every epoch, and gives us a few more scores, which I can talk you through as well. Then, right at the end, once it has gone through the whole thing, all ten epochs, we're going to print out the final error score. You want to see what the final error is, so basically that's going to be 100% accuracy minus the actual accuracy. Hopefully you're quite used to seeing this in Python: this is basically creating placeholders in your text. The .2f basically means the number is printed to two decimal places. So what we're going to be seeing is this text, final error score, and then our actual final error score, which is going to be 100 minus whatever our accuracy is from the final epoch. So let's just run through everything we've set up for this neural network. First we imported the relevant frameworks and libraries. We loaded up the MNIST data. We looked at the shape of the data by printing X train dot shape, which is important. We saw that we had 60,000 images to work with, and the images were 28 by 28, height and width. We then took a look at the data using matplotlib, so we imported matplotlib.pyplot and imported it as plt, just to make it easy to write. What we did is we looked at any one image here, so X train at an index, and then we looked at y train at the same index to see the label, and we saw that they were the same. So we got the shape of our data, and we checked that the labels match the images. Then we defined our pixels, and we redefined our X train and our X test: we put them into the correct format by reshaping, so it's no longer height by width, it's height times width for the pixels. And then we normalized the data.
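The final error score described above is just arithmetic plus string formatting, which this small sketch shows. The accuracy value of 0.95 and the names `error_percent` and `message` are hypothetical; in the lecture the accuracy comes from evaluating the model.

```python
def error_percent(accuracy):
    """Error as a percentage: 100 minus the accuracy expressed as a percentage."""
    return 100 - accuracy * 100


final_accuracy = 0.95  # hypothetical accuracy between 0 and 1

# {:.2f} is a placeholder in the text that prints the number to two decimal places.
message = "Final error score: {:.2f}%".format(error_percent(final_accuracy))
print(message)  # Final error score: 5.00%

# The .8f used in the earlier TensorFlow print-out works the same way, with eight places.
print("{:.8f}".format(0.5))  # 0.50000000
```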
So instead of all of our pixel values being between 0 and 255, they're now between 0 and 1. We then did one-hot encoding, so the y values are no longer 0, 1, 2, 3, 4 and so on. If the label was a 1, for example, it would be 0 1 0 0 0 0 0 and so forth. I hope you remember that we did a lecture on one-hot encoding. If you're not completely sure, I do recommend you go back to that lecture just to brush up on this. After that, we looked at creating our model. We defined a function here that essentially did all the work for us and created the full architecture. You always start with this Sequential, and then you add layers to it. We're adding just one layer, one dense layer, which is just a fully connected layer, where we're inputting our pixels. So we have the X values, which are normalized, and the y values, which are the classes. We're using a kernel initializer of normal, which is fine; you can always just do that. Activation equals ReLU: so instead of using a sigmoid as the activation function in each node, we're using something called ReLU, which we'll be going over in more detail in a future lecture. For our y values we're using softmax to define the probabilities that the image is any one of the 10 different digits. Then we're compiling our model, defining our loss as cross-entropy, using an Adam optimizer, which is a type of gradient descent, saying we want to measure the accuracy, and then at the end returning the model. In this final part here, we're basically saying we define a new variable, which is this function, and we want to fit it using our data, defining the number of epochs, the batch size and the verbosity. We're defining what our scores will be, and then finally saying to print the score at the end. So with no further ado, let's run this model. Okay, excellent. So now the model has run for 10 epochs, we have some output, and the output looks incredible. So we've gone through the epochs.
It tells us how long each epoch has taken, 11 seconds, and then for our training data it tells us what the loss is. Hopefully you remember what the loss is; we went over it in the lectures when we were talking about linear regression. The accuracy is 92% to start with, which is okay. If we look at our validation, or evaluation, loss, the loss is actually lower, which is fine, and the accuracy is higher, which is absolutely fantastic. As you can see with both of them, which is what you always want to see, the loss is decreasing every single time and the accuracy is increasing, so things are looking very good. What's really important is that the evaluation, or validation, loss is decreasing and the accuracy is increasing. That's what's really important here. So our final accuracy is a whopping 98%. The final error, then, is essentially 100 minus this, and as you can see, we have 1.79% error, which is absolutely fine. So as you can see, using just a one-layer neural network on simple data with a lot of images makes the world of difference. I did say that we were going to quickly take a look at what would happen if we didn't normalize the data. If you remove this, so we're no longer normalizing the data, you're going to see quite a significant change. So let's run it now. Wow! As you can see here, it's a much different story when you don't normalize the data. We start off with just a 53% validation accuracy, and it very slowly struggles up and down, getting to 58% accuracy and a whopping 41.27% error. So that just goes to show how important normalizing your data can actually be. What I recommend you do now that we have been through all of this together, and it's a relatively small amount of code, is play around with a few things. Play around with the number of epochs.
I'm quite confident that if you make sure you normalize the data and you increase the epochs to, let's say, 20, 30, maybe even 100, you will probably get even better accuracy. There are a number of different things you can play with. With the activations, you could look at what would happen if you used sigmoid, for example, or what would happen if you changed the optimizer. Things like this. So you can play around, and now you can start tinkering with it and see how you can improve the accuracy on your validation, or evaluation, data, and what might make it worse. It turns out there's another type of neural network called convolutional neural networks, and these are actually even more effective with image classification, so I'll talk about those in a later lecture. But for now, what I recommend you do is try writing this code out yourself, playing around with all the hyperparameters, and when you're done with that, I will see you in the next lecture.
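To make the recap above concrete, here is a minimal sketch of the data preparation and the final error score in plain Python, without Keras. The function names (flatten, normalize, one_hot, error_score) are made up for illustration and are not the code shown in the lecture; they only mirror the steps described: reshaping height by width into height times width, scaling pixels from 0 to 255 down to 0 to 1, one-hot encoding the labels, and printing 100 minus the accuracy with the .2f placeholder.

```python
def flatten(image):
    """Turn a height x width grid of pixels into one flat list of height * width values."""
    return [pixel for row in image for pixel in row]


def normalize(pixels, max_value=255):
    """Scale pixel values from the range 0..255 into the range 0..1."""
    return [p / max_value for p in pixels]


def one_hot(label, num_classes=10):
    """Encode a digit label as a list of zeros with a single 1 at the label's index."""
    encoded = [0] * num_classes
    encoded[label] = 1
    return encoded


def error_score(accuracy):
    """Format the error as 100 minus the accuracy (given as a fraction), to two decimals."""
    return "Final error: %.2f%%" % (100 - accuracy * 100)


# A tiny 2 x 2 "image" stands in for a 28 x 28 MNIST digit (illustrative data only).
tiny_image = [[0, 255], [51, 102]]
print(normalize(flatten(tiny_image)))
print(one_hot(1))
print(error_score(0.9821))
```

In the real project the same steps are applied to whole arrays at once, but the arithmetic on each pixel and each label is the same as in this sketch.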
36. Hyperparameters: Let's now go a bit deeper into hyperparameters. Hyperparameters are all the different aspects of our model that we can change, move around and play with in order to make our neural network as good as possible. Let's start with the number of hidden layers. That's essentially the number of layers of nodes. In our drawings it was the vertical strips in the neural network. So here we are now with a neural network. When we're talking about the number of hidden layers, what we're talking about are all of these. This one here is one hidden layer, so in this diagram we have three different hidden layers. Essentially, a layer is a collection of nodes, and a node is simply a linear regression with an activation function, taking its input from the previous layer. Next, the number of nodes is simply the number of those circles within any given layer. So a node is a mixture of a linear regression and an activation. Next up, activation functions. This is the choice of which non-linear activation function to use, and you can change it between the different layers. Often you'll use one, maybe two different activation functions in a model, but it has been known for people to use a large range of them, depending on what problem you're looking to solve. The learning rate is what we were talking about with back propagation and gradient descent. People write the learning rate as alpha, and alpha is this symbol, just like this. When we're talking about the learning rate and back propagation, let's use a graph. If this is a graph showing the loss, something like this, we use gradient descent to find this minimum down here. If you have a really high learning rate, it's going to update the weights drastically.
So it will probably go somewhere over here when it first updates, and then it will go again. The learning rate is basically the size of the steps it takes when it's updating. If you have a very low learning rate, it will more likely go like this, in much smaller steps. The batch size, as we've been over before, is how much data you put into your neural network in one go. For example, say you're working with images and you want to train your neural network on 10,000 images. Unfortunately, you usually can't just tell your neural network to go through all of the 10,000 for every forward and back propagation. You want to do it in small batches, because it makes it a lot easier computationally. So we might put in batches of 100. We train our network on 100 images, then it updates its parameters based on these 100 images, then we feed in another 100, and we keep going like that through the epochs. Speaking of epochs, the number of epochs is basically the number of times we go through forward and backward propagation over the training data. When we have been through one full round of forward propagation and then back propagation, that's an epoch. How many you need to train a neural network effectively depends completely on the project, but I've been known to run them for about 50 to 100 epochs to get some really good results. So those are the main hyperparameters we've come up against so far. Next we'll be talking about variance and bias.
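The effect of the learning rate and the batch size can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the course code: the loss is taken to be the simple curve f(w) = w squared (so the gradient is 2w and the minimum is at w = 0), and the function names gradient_descent and batches are hypothetical.

```python
def gradient_descent(learning_rate, steps=20, start=5.0):
    """Minimise the toy loss f(w) = w**2, whose gradient is 2 * w.

    The learning rate is the size of the step taken on each update.
    """
    w = start
    for _ in range(steps):
        gradient = 2 * w
        w = w - learning_rate * gradient
    return w


def batches(data, batch_size):
    """Yield the data in consecutive chunks of batch_size items."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]


# A moderate learning rate gets close to the minimum at 0,
# a very low one has barely moved, and a very high one overshoots and moves away.
for rate in (0.1, 0.001, 1.1):
    print(rate, gradient_descent(rate))

# 10,000 images in batches of 100 means 100 parameter updates per epoch.
images = list(range(10000))
print(len(list(batches(images, 100))))
```

On this toy curve a learning rate that is too high makes every step jump past the minimum to a point further away than before, which is the drastic update described above.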
37. Bias and variance: So we're now going to talk a bit about bias and variance. We've actually already gone into this in quite some depth, but using different words, for example overfitting. You might have heard me say that quite often. So what are bias and variance? High bias shows up as underfitting and high variance shows up as overfitting. It's as simple as that, but we will go through it just to make sure you've got the intuition behind it. If you have high bias, then essentially your model isn't picking up the patterns in the data very well, not well enough to detect important patterns and to make accurate predictions. So when you're underfitting, you're most likely going to get bad accuracy on your training data and bad accuracy on your evaluation data. With variance, if you have high variance, you're overfitting the data, which means that during training your model has abstracted so much complexity from the patterns in the data that if any new data is brought in, using the evaluation data set, it's not going to do very well. So when you have high variance, or overfitting, you're going to have high accuracy in training but lower accuracy in evaluation. Now let's look at this with a few graphs. Let's start with underfitting, actually. We're going back to the good old linear regression example. Let's say this is the graph, and we have a few data points. In an extreme example of underfitting, it might look like this, where it's just completely not picking up any patterns in the data. But more realistically, it's going to look something like this, where it kind of goes through the data but really doesn't connect with many of the data points, and we're going to have a high loss on both training and evaluation. If you've got overfitting, it might look a bit more like this, where it really is very close on the training data.
It's going to have a high accuracy because it's meeting most of the data points quite well. But as we introduce more data points from our evaluation data set, it turns out it actually hasn't picked up the patterns when there's more data around. So you're going to have really high losses, for example here, and so you're going to have a low accuracy. Now let's look at some actual examples. As we said, with high bias you're going to have underfitting, and with high variance you're going to have overfitting. For each, I'm going to give you a real-life example and then one that just makes intuitive sense. Starting with underfitting and driverless cars: let's say you have a driverless car, and it decides that anything that is black and has white lines is a road. It's underfit to that kind of data. If it sees some zebras, that's not going to be very good, right? That driverless car is not working well enough. It needs to detect more patterns in what a road is. For example, a road tends to have very straight lines, and it usually doesn't have four legs. So it's underfitting the data. For the intuitive example, let's say you're at school, and the teacher says, what did I just say? And you say, something about school. Well, that's very much underfitting what I'm sure the teacher was trying to teach you. Now here's a real-life example of overfitting. The US Army wanted to create some artificial intelligence that could detect a tank, so they gave it loads and loads of images of tanks. Unfortunately, in all of the pictures they gave it, the tanks were in forests. So if a new image was introduced where a tank wasn't in a forest, it wouldn't detect that it was a tank. That's overfitting to the data it had: it overfit and thought, right, anything with lots of trees must have a tank in it. And for the intuitive example, take my girlfriend.
Say she tells me she doesn't like a movie. I wouldn't exactly just say, OK, so she doesn't like movies ever, we're never going to watch a movie again. That's overfitting to her statement that she didn't like one movie. So there are a few intuitive examples for you. Next up, we're going to be talking about what strategies we can use for reducing bias and variance.
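The rule of thumb from this lecture (bad accuracy on both training and evaluation suggests high bias, while good training accuracy with clearly lower evaluation accuracy suggests high variance) can be written down as a small check. This is only a sketch: the function name diagnose and the two thresholds are assumptions picked for illustration, and sensible values depend entirely on the project.

```python
def diagnose(train_accuracy, eval_accuracy, good_enough=0.9, max_gap=0.05):
    """Rough bias/variance check from training and evaluation accuracy (as fractions).

    good_enough and max_gap are illustrative thresholds, not standard values.
    """
    if train_accuracy < good_enough:
        # The model does not even fit the training data well.
        return "high bias (underfitting)"
    if train_accuracy - eval_accuracy > max_gap:
        # Fits the training data well but does much worse on new data.
        return "high variance (overfitting)"
    return "looks fine"


print(diagnose(0.60, 0.58))
print(diagnose(0.99, 0.80))
print(diagnose(0.98, 0.97))
```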
38. Strategies for reducing bias and variance: Now that we've gone over what bias and variance are, let's look at a few strategies for how to reduce them. When you get into doing practical projects with neural networks and other models, you'll find that these are very common problems, so it'll be great to build a basic intuition for them that you can use in the future. First of all, let's talk about reducing bias, so reducing underfitting. Usually the problem with having high bias is that your model isn't getting enough complexity from the data; it's not drawing enough patterns from the data. So one of the best ways to reduce your bias is simply to increase the complexity of your neural network in order to extract more information, more patterns, from the data. You can do this by playing around with your hyperparameters: increasing the number of hidden layers you have and the number of nodes within those layers. That would increase the number of weights and the number of connections, and therefore the amount of complexity that your model is able to abstract. Another option is to increase the number of epochs so that your model has more time to train on the data. Sometimes it's as simple as that. You might have run your model for, let's say, 10 epochs and not yet got a great accuracy on either your training or your evaluation data. So the strategy here is just to increase the number of epochs to 50, let your model run for a bit longer, and see if that improves the accuracy for both your training and evaluation data sets. So how about reducing variance, reducing overfitting?
What you want to do here is actually the exact opposite for complexity. If you have high variance, one of the main causes would be that the model is extracting too much complexity from your data, which doesn't leave room for the patterns that might be in your evaluation data set and in the real world. So you might want to consider reducing the complexity of your neural network by removing some hidden layers or reducing the number of nodes in those layers. The problem might also be that your model is getting all its patterns from not enough data, so it needs more varied data to pick up on. What I'd recommend is considering augmenting the data. I'll go over that in a moment, but basically, augmenting the data is taking your existing data, changing it slightly, and then using both of those pieces of data to train your model on. Another option is to add dropout to your layers. Let's talk about dropout. Dropout is a very funny thing, where most people would think it sounds absolutely crazy, but it works. Dropout is the process of randomly removing a certain percentage of nodes from your forward propagation pass. What I'm saying here is that when you have your neural network, in any given layer within it, when the model is going through one of its forward propagation passes, it would just completely ignore a certain number of nodes. That sounds crazy, but I've used it on models before, and dropout rates of 50% are quite normal, actually. So each time you go through forward propagation, the program randomly removes, or ignores, 50% of the nodes. It does actually help, because it adds a certain element of chance as to what patterns are being extracted.
So what happens is exactly this: you have your neural network, and on every forward pass it ignores some of the nodes. Each time forward propagation occurs, it chooses completely at random, so it may be completely different nodes from the ones that were ignored at another point. So the dropout rate is something very important to keep in mind. Now let's talk about augmenting data. I'll use a few examples, because I think that's the best way to understand how it goes. Augmenting data is basically taking the data that you have, changing it in some way, and then using both the original data and the new data you have changed. For example, in voice recognition, what they usually do to increase the amount of data they have is this. Let's say you have 1,000 clips of audio of different people speaking. What you could do is increase or reduce the frequency. So if someone's talking at this frequency, you could change it so it's talking like this, or it's talking like this, so that you have a range of frequencies to play with. You have the exact same audio, just at different frequencies. You can also try adding random delays in the audio, because in real life people do take random pauses for a moment. One thing that's actually been shown to be very useful is adding background noise, so white noise, or adding cars in the background of the audio. Things like that have been shown to really help with increasing the amount of data you're using. One more example here is image classification. Let's say you have loads of images, 1,000 images. You can change the position within an image, so you can move, for example, a picture of a cat up or down, left or right. You could edit the angle, so you could tilt the images slightly, as you can see here, or even turn them upside down.
You can add different filters and different colors, and you can also apply things like blurriness, or increase or decrease the lighting. All of these things are really legitimate ways to increase the size of your data by a significant amount. So if you are experiencing overfitting, one thing to remember is that it never hurts to have more data. Often, overfitting can be helped significantly by increasing the data, and augmenting the data where possible is one of your best options, I would say. These have been just a few of the tactics you can use, but they are the main strategies that are used in practice, in the real world, when you're dealing with either underfitting or overfitting. So I hope this has been helpful. Well done on completing this section, and I will see you in the next one.
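Two of the strategies from this lecture, dropout and data augmentation, can be sketched in plain Python. The names dropout, flip_horizontal, shift_right and augment are hypothetical, and this simplified dropout only zeroes out nodes at random; deep learning frameworks typically also rescale the remaining values during training, which is left out here to keep the idea visible.

```python
import random


def dropout(activations, rate, rng=random):
    """Randomly ignore (zero out) each node's output with probability `rate`."""
    return [0.0 if rng.random() < rate else a for a in activations]


def flip_horizontal(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def shift_right(image, fill=0):
    """Move the picture one pixel to the right, filling the gap with `fill`."""
    return [[fill] + row[:-1] for row in image]


def augment(images):
    """Return the original images plus a flipped and a shifted copy of each."""
    result = []
    for image in images:
        result.extend([image, flip_horizontal(image), shift_right(image)])
    return result


# A different random half of the nodes is ignored on each forward pass.
layer_output = [0.2, 0.9, 0.5, 0.7]
print(dropout(layer_output, rate=0.5))
print(dropout(layer_output, rate=0.5))

# One tiny 2 x 2 "image" becomes three training examples.
print(augment([[[1, 2], [3, 4]]]))
```

Whether a given change (such as a flip) still produces a valid example depends on the data, so the variants in augment are a choice to revisit for each project.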
39. Activation functions: Finally, I just wanted to touch on the idea of these activation functions, which I mentioned earlier, because in the code we were quite often using ReLU, which stands for rectified linear unit. We've already talked about sigmoid before, and I want to bring in tanh as well, which might be good for you to know for the future. So sigmoid basically squashes any number into the range between zero and one, right? That's quite useful, especially when you work with probabilities. But what we find in reality is that if the numbers are outlying, so they're too high or too low, then when it comes to gradient descent there's not enough of a gradient for the weights to be updated very quickly. You can get these problems where the numbers get stuck out here and you don't get much actual learning going on. What's good with ReLU is that, as you can see, it doesn't curve off at all on the positive side, and so it allows your model to learn a lot better, which is great. If you have any negative numbers, then it just ignores them completely, so it outputs zero. That's one of the cons of ReLU: it loses the gradient completely over here. But in general, that's no problem, so I tend to use ReLU for pretty much everything. Tanh, the hyperbolic tangent function, is useful in a number of different types of neural networks, such as recurrent neural networks. The important thing to note here is that it goes between negative one and one. It doesn't have many applications for us here, really. We don't tend to use it in fully connected neural networks, but it's good for you to have in your mind that this activation function called tanh exists and that it's used in more complex neural nets. So that's it for this section. I hope you have learned a lot, and I'll see you in the next one.
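The three activation functions from this lecture, together with their gradients, can be written in a few lines using only Python's math module. The function names are chosen for illustration; the formulas are the standard definitions. The printout shows the point made above: for a large input the sigmoid gradient is almost zero, while the ReLU gradient stays at 1 for positive inputs and is 0 for negative ones.

```python
import math


def sigmoid(x):
    """Squash any number into the range between 0 and 1."""
    return 1 / (1 + math.exp(-x))


def sigmoid_gradient(x):
    """Slope of the sigmoid; it shrinks towards 0 for very high or very low inputs."""
    s = sigmoid(x)
    return s * (1 - s)


def relu(x):
    """Rectified linear unit: pass positive numbers through, turn negatives into 0."""
    return max(0.0, x)


def relu_gradient(x):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise (the gradient is lost there)."""
    return 1.0 if x > 0 else 0.0


def tanh(x):
    """Hyperbolic tangent: squash any number into the range between -1 and 1."""
    return math.tanh(x)


for x in (-10.0, 0.0, 10.0):
    print(x, sigmoid(x), sigmoid_gradient(x), relu(x), relu_gradient(x), tanh(x))
```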
40. What we have covered: So just a quick lecture now to go over all the things we've covered in this course. You've done an incredible job of making your way through a great many concepts, and even the applications using things like TensorFlow and Keras. We've been through a great deal, so I want to wrap it up with a quick learning summary. To begin with, we went over the basic theory. We talked about it in terms of linear regression, x and y. If we have some x data points and we have y as the outputs, how can we make future predictions using linear regression and creating a line of best fit? We then talked about how overfitting could be a problem: if we put our line of best fit through all the data points without considering future data, that would be a problem. That led us to talk about training and evaluation sets of data. We use part of our data, most of it, for training, and then we use a small amount of our data for evaluation, to make sure that we're not overfitting on our data. We then talked about how we calculate the cost and the loss. That was basically talking about how far away our predictions were from the true values of our data. Then we talked about using back propagation and changing the line of best fit to minimize our loss on both our training and our evaluation data. Once we'd gone over the basic theory, we started to talk about neural networks. We talked about how linear regression fits within neural networks, and how we could create the architecture of neural networks using not only linear regression nodes but also activation functions. We talked about how we could create many hidden layers of these nodes, and how we could increase or decrease the number of nodes within a layer. And then we talked about how you could use just one or two inputs, or many different inputs, or nodes, in your input layer.
We then went on to talk about deep learning frameworks and tools. We set up your computer, using Anaconda or pip to install the frameworks and libraries, and we got hands-on with using Jupyter Notebook for coding and the command line for sometimes running our code. Once we had all these tools installed and had started to understand how to use them, we looked into how you can use two major frameworks, TensorFlow and Keras, to load in the data, which can be quite complex, then to create the architecture for a neural network from scratch, and then to run the model and report on the results at the end. Finally, we looked into fine-tuning: how can we change the hyperparameters in order to maximize the effectiveness of our models? We saw the importance of using large amounts of data, and especially of using normalization in the process to improve our learning. So that's everything we've been through so far. Congratulations. In the next lecture, we'll be talking about how to continue progressing effectively.
41. How to continue progressing: So a quick point, now that you're at the end of the course, on how to continue progressing. You've learned so much, and you've come to understand so many different concepts, that it would be a shame for you to lose that momentum now. So here's my challenge to you, or rather four challenges. First of all, I highly recommend you check out the online resources. There are a great many interesting tutorials out there. On YouTube, there are lots of cool machine learning engineers showing how they're using neural networks to do amazing things. Next, and possibly most important, start creating your own models. If there are some problems out there that you want to solve using neural networks, I say get started today with trying your best to create the solutions. You'll come across a lot of challenges as you do, and that, in my opinion, is the best way to learn. Third, in order to keep up this momentum, I highly recommend you connect with others. This course is a perfect excuse for you to connect with other people who are interested in creating neural networks to solve problems, so I recommend you leave a comment here in this lecture to see if anyone else is interested in connecting for the future. Finally, I'm going to be creating more courses. The next one is going to be on convolutional neural networks, which are very useful in machine vision, in image classification and in maximizing accuracy. I'll also be creating future courses that go into more depth on different types of neural networks, so I recommend you check those out as well. Those are my challenges. So give them a try today. Make a note of maybe one thing you could do for each of those four points, and I'm confident that will really improve your progression going into the future.
42. Thank you: I just want to say a big thank you for taking part in this course. It's been an absolute pleasure providing the information and leading you through creating your first ever neural networks. I hope you find it of great use. If you haven't had a moment yet to review the course, I'd really appreciate it if you could leave a review and let me know what you think of the course, and help future students find out what the course is actually like on the inside. In the future, I plan to provide a lot more courses building on the foundations we've created here with neural networks. The next one is going to be on convolutional neural networks, which involves machine vision and image classification. I'll leave a discount coupon for my students in the comments of this lecture and on this video, so that you can enroll in that course if you so please. So once again, I hope you have really gotten a lot of use out of this course. Thank you for participating, and hopefully I'll see you in a future course.