Artificial Intelligence & Machine Learning with Unity3D - A.I. learns to play Flappy Bird | Sebastian Armand | Skillshare

Artificial Intelligence & Machine Learning with Unity3D - A.I. learns to play Flappy Bird

Sebastian Armand

15 Lessons (1h 7m)
    • 1. Intro

    • 2. Set up of the ML-Agents Toolkit

    • 3. What is a Neural-Network?

    • 4. ML-Agents Key Components

    • 5. 3D - RollerBall Project Overview

    • 6. RollerBall Set up

    • 7. Training the AI

    • 8. Flappy Bird Project Overview

    • 9. Explanation of the basic Scene

    • 10. Set Up of the ML Agents Components

    • 11. Training FlappyBird

    • 12. Self-driving Car Overview

    • 13. Explanation of the basic Scene

    • 14. Set Up of the ML Agents Components

    • 15. Training the Car

About This Class

This crash course is about Machine Learning & Artificial Intelligence with Unity3D.

Why use Unity3D for Artificial Intelligence?

Unity3D is the perfect environment to train your own AIs. Take the example of a self-driving car. What you need are complex environments with lots of realistic physical interactions. You could gather this data from interactions with the real world, but that is extremely inefficient and time-consuming.

Since games are becoming more and more realistic, you can gather this information from virtual environments instead. And for that, Unity is perfectly positioned.

So, no matter if you are a game developer who wants to create AIs for games or a hobby researcher who just wants to play with machine learning, the ML-Agents toolkit is the perfect start for creating your own AIs.

What do we learn in this crash-course?

This course is structured into 4 major sections:

  • Introduction
    This section covers everything you need for a quick start with the ML-Agents toolkit. You will learn:

    -How to set up the ML-Agents toolkit with TensorFlow
    -What a neural network is
    -The key components of the ML-Agents toolkit

  • 3D Roller Ball AI

    This lecture will give you a first impression of the ML-Agents toolkit in practice. You will learn how to set up the environment and all the components necessary to train the AI.

  • A.I. learns to play Flappy Bird

    Instead of wasting your time playing this game, we will code our own A.I. that learns to play Flappy Bird by using Reinforcement Learning.

    After training, the AI is able to achieve an unlimited score in this game.

  • Self-driving Car

    The Self-driving Car is probably the most famous example of Artificial Intelligence, so we will cover this as well. To train the Car we will use a technique called Imitation Learning.

    Imitation Learning is special because this method uses the inputs from a human player to train the neural network.



1. Intro: Okay, now let's continue with modifying our Flappy Bird clone with the ML-Agents components. First, we need to create the Academy game object. Name it, for example, Flappy Bird Academy. We need here the same procedure as we did for the Roll-a-Ball project: add the namespace MLAgents and extend the new class with Academy. Under the Brains folder you can find two new brains I already created: one Flappy player brain for testing the settings and one Flappy learning brain. Important here is that we have a space size of six, because we will set up, in just a few moments, six observations for the Flappy Bird. Under Vector Action we have a space size of two because we have two options: flap or no flap. And you can find here the settings to flap the bird, so if you click the left mouse button. The same settings we will use for the learning brain. Next, let's add the brains to the academy, the Flappy learning and the Flappy player brain. I also want to customize some values under the academy: the target frame rate to 30 and the time scale to 10, so the neural network will be trained at 10 times speed. Now let's add all the key components, like observations and actions, to the bird script. So first add the MLAgents namespace. Next, I created a helper function in order to detect the closest pipe. First we save in the pipes array all the pipes with the tag "goal". Then we loop through all the pipes, and with the if statement we ask if the x position of the pipe is bigger than zero and if the x position is smaller than or equal to three. The three is just a value I tried that fits the problem. If these two conditions are true, then we will always return the closest pipe. Now we can create the observations function. You need to extend the Flappy Bird script, of course, with Agent. Okay, so let's write this function.
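The closest-pipe helper described above can be sketched as follows. This is a minimal illustration in Python, not the course's C# script: the function name `closest_pipe_x`, the plain list of x positions, and the assumption that the bird sits at x = 0 are all hypothetical and only serve the example.

```python
from typing import Optional, Sequence


def closest_pipe_x(pipe_xs: Sequence[float], max_x: float = 3.0) -> Optional[float]:
    """Return the x position of the first pipe that lies ahead of the bird.

    A pipe counts as "ahead" when 0 < x <= max_x, mirroring the two
    conditions in the lesson. The default of 3.0 is the hand-tuned value
    mentioned there. Returns None when no pipe is inside that window.
    """
    for x in pipe_xs:
        if 0.0 < x <= max_x:
            return x
    return None


if __name__ == "__main__":
    # One pipe already passed, one inside the window, one still far away.
    print(closest_pipe_x([-2.0, 1.5, 4.5]))
```

Because pipes are spawned at a fixed interval, at most one pipe is expected inside the window at a time, which is why returning the first match is enough in this sketch.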
We need six observations. First, we will observe the y position of our agent, because the bird can only move up or down. Then, similar to the Roll-a-Ball project, I want to observe the y velocity value of the agent. Next, we observe the y position from the closest-pipe helper function. We will observe the bottommost position of the current top pipe and the topmost position of the current bottom pipe. And at last, we just observe the x position of the closest pipe. Okay, next, let me paste in here the AgentAction method. Then let's go to the Update method and remove here the following code, because we will control the bird now with the AgentAction method. By default, we reward the bird with 0.1f, because the goal of the bird is to have no collisions, and this is the case if nothing happens in the scene. So by default, if the vector action array is zero, that means no flap, we set the flap boolean to false. Otherwise, if the vector action array is one, we set flap to true and we call the flap method, so the bird should flap. Finally, if the dead boolean is true, that means something happened, so the bird has had a collision with any collider, then we will punish the bird with minus one and we reset everything with Done. The AgentReset method is now the ResetAll method, so let's just change the name. Okay, that's everything for the setup. So now we can test this. But first, I also want to print out the current reward value. We can do this, for example, with the following line of code. So let's test this now with the player brain: choose the player brain under the Flappy Bird agent and click Play. Okay, this looks good. The inputs are working and the reward functionality is working as well. So in the next video, we can start to train the agent.

2. Set up of the ML-Agents Toolkit: Okay, first of all, before we can start, we need to download Unity's ML-Agents project. You can find this on
GitHub under Unity-Technologies/ml-agents. So just download the whole project: click here on "Clone or download". Next, I created a new folder, Machine Learning, and I moved the ml-agents folder into this new folder. And let's rename UnitySDK to ML Agent, because now we can open the folder with Unity. Choose here your directory, and this ML-Agents project will be our basic setup for all of the projects we will create in this course. Well, now I want to point you to this installation guide from Unity. You can find this under ml-agents, docs, Installation. Here is a detailed guide to set up the toolkit for Mac and Windows. All steps are really well described here; you just need to follow all of them. The reason why I don't want to show you all of the steps in this video is that Unity is developing their ML-Agents toolkit really fast, and therefore this video would be out of date in probably just a few weeks. But this guide here is always up to date. In short, if you have set up your environment and you have downloaded Python, then you basically need to do the following. First, the version of Python is important. Currently, you need to download Python 3.6, but depending on when you are watching this video, you may need another version of Python. Then you simply need to open your terminal. You need to open the folder we just downloaded, and under this project you need to open the folder ml-agents. Now you can just write this command. This should download everything you need. If you haven't received any error messages, then you should now be able to run mlagents-learn --help, and you should see this message as a result. If you received an error message, then you need to check the steps before. Okay, if you have done this, then let's set up the basic scene in order to check if everything is working correctly. Well, the basic scene is under ML-Agents, Basic, Scenes, Basic.
First of all, let's just click Play in order to see what we get. This is a really simple game with a finished, trained agent. The goal of the agent is to find the position of the large sphere. The ML-Agents toolkit is using here a technique called reinforcement learning. In short, with this method the agent is rewarded with a positive value of one if it can find the position of the large sphere. And if the agent can only find the position of the small sphere, then it will be rewarded with a smaller positive value of 0.1. Depending on these reward values, the neural network behind the agent is trained. But more about reinforcement learning and neural networks we will see in the next videos. Now check our settings: click on Academy and on the learning brain. Remove the basic learning model, because this is the finished trained model we just saw. Under the academy, mark the Control box. That means we will now control this agent from an external API. More about these components, like the Academy, the Brain and the Agent, we will also discuss in the next videos. If you click Play now, you should see an error message here in the console that we have to set up our communicator. So let's do this. Okay, in your terminal, open the ml-agents-master project and just write this statement. If you hit Enter, then you should see the Unity logo. That means the setup is correct, so you can start the game in Unity. You can see the agent is doing random actions. That looks really good, because that means the agent is training. Probably after a short time, you can see the agent moving more and more in the direction of the large sphere. In your terminal you can now see the rewards. The reward of one is the highest value the agent can achieve. You see, the progress is really fast. Now we have a value of 0.9. This is a really good value. Therefore we can now close the game. And we got an info message, mlagents.trainers: Exported to this path.
That means under this path we can now find our trained model. So let's go to the model under ml-agents-master, models, first run. Let me delete the basic learning 2 file; this is just a test I did before. Our trained model is the basic learning file. Now just take the new file and paste it under Models. Under the Basic Academy game object, remove the check mark. And under the basic learning brain, you can now choose our new trained model, basic learning 2. Click Play and you can see the new trained model. And that's basically the workflow for how we can train our agents with the ML-Agents toolkit. But more about all of the components and what exactly the ML-Agents toolkit is doing here we will see in the next videos.

3. What is a Neural-Network?: Okay, before we continue with creating our first AI from scratch, I want to give you a rough overview of how Unity's ML-Agents toolkit works. What we already did in the last video was actually to train a neural network. But what is a neural network? First of all, let me give you a rough explanation of that, because then it's much, much easier to understand what Unity is doing here for us behind the scenes. You can see here a typical diagram of a neural network. So basically what a neural network does is, it takes one or multiple inputs, for example the values of the sensors of a self-driving car, and processes them into one or more outputs, like the controls for the car. You can see the neural network itself contains a lot of small units, the so-called neurons. The neurons are grouped into several layers, in this example into one input layer, two hidden layers and one output layer. The layers are connected to the next layer through weighted connections, and these weights are simply real-valued numbers, for example 3.2 or 1.32. A neuron takes the value of a connected neuron, for example 0.82, and multiplies it by the connection's weight.
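The per-neuron computation this lesson walks through (multiply each incoming value by its connection weight, add the neuron's bias, apply an activation function, and repeat layer by layer) can be sketched in plain Python. The sigmoid activation and all function names here are assumptions for illustration; the transcript does not say which activation function ML-Agents uses.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """One common activation function; chosen here only as an example."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Weighted sum of the connected neurons' values, plus bias, then activation."""
    weighted_sum = sum(value * weight for value, weight in zip(inputs, weights))
    return sigmoid(weighted_sum + bias)


# A layer is (one weight row per neuron, one bias per neuron).
Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]


def forward(inputs: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Propagate the inputs through every layer in order."""
    values = list(inputs)
    for weight_rows, biases in layers:
        values = [neuron_output(values, row, b) for row, b in zip(weight_rows, biases)]
    return values


if __name__ == "__main__":
    # The lesson's numbers: a neuron value of 0.82 and a weight of 1.32.
    print(neuron_output([0.82], [1.32], bias=0.0))
```

Training, in this picture, means adjusting the numbers in `weight_rows` and `biases` until `forward` produces useful outputs; the forward pass itself stays the same.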
Then the neural network takes the sum of these weighted values and the value of the neuron's bias and passes them into the so-called activation function. This simply transforms the value mathematically before it can finally be passed on to the next neuron. This way the inputs are propagated through the whole network, and that's pretty much all a neural network does. So the real deal behind neural networks is to find the right weights in order to get the right results. That can be done through a wide range of AI techniques. What this course is focusing on is machine learning, because this is the technique that Unity uses to train the neural network, as we already saw in the last video. Within machine learning we also have different methods that Unity provides with which we can train our AIs. The basic scene is an example of reinforcement learning, and we can also use, for example, imitation learning or neuroevolution. This course covers the methods reinforcement learning and imitation learning. But more about these topics we will see later. For now, it's only important to understand that we have different methods of machine learning. Okay, that's everything for this video. In the next one, we will discuss the key components of the ML-Agents toolkit.

4. ML-Agents Key Components: Well, now let's have a look at all the key components we need in order to train an AI with Unity's ML-Agents toolkit. I have here a diagram from the official Unity documentation. So first, we always have the learning environment. In this case, this is the Unity scene which contains all the other components. We can have one or multiple agents. The agents are, for example, the characters we want to train, and they are attached to a Unity game object. The agent is making the observations, and the agent is also performing the actions. In our basic scene you can find the agent under Basic, Basic Agent. Each agent is
linked to exactly one brain. The brain is the next key component. The brain holds the logic for making the decisions for the agent. So this component receives the observations of the agent and returns an action. In practice, we have three different categories of brains. You can see this when you create a new brain in Unity: right-click, Create, ML-Agents. So we have a player brain: here a human player is controlling the agent with input from the keyboard. Then we have a heuristic brain: the decisions are made here by using hard-coded behavior, for example with if-else statements. So this brain doesn't use a neural network in the background. And finally we have the learning brain. This brain we will use the most in this course, and it uses a neural network to determine the actions for each agent. Next, the brain is linked to the Academy game object. In the academy, we set up all the brains we want to use in the learning environment. We also have in here the external communicator to the Python API, so to TensorFlow, to train our actual neural network. These are basically all the key components we can find in any project that uses Unity's ML-Agents toolkit. In the next video, we'll start creating our own first reinforcement learning agent.

5. 3D - RollerBall Project Overview: Before we start creating our first AI from scratch, I want to show you the finished reinforcement learning project. So we will create this RollerBall scene. The moving sphere is the agent, and the goal of the agent is to find the shortest way to the position of the target cube. If the agent can find the position of the cube, then we will reset everything: we will remove the velocity of the agent and we will spawn the cube at a random position on the plane. Then the agent will start looking for the new position of the cube again. And that's basically everything.
So this is a really simple project in order to make you familiar with reinforcement learning and the ML-Agents toolkit.

6. RollerBall Set up: Okay, now let's create the RollerBall agent. First, create a new game scene under File, New Scene, and save it as RollerBall. And let's set up the environment for the agent. We need a new plane: right-click in the Hierarchy, 3D Object, Plane. Name this Floor. In the Mesh Renderer, let's change the material in order to make this more clear: click the small circle and choose, for example, this one. We also need a new cube: right-click, 3D Object, Cube. This is the target. Change the default position to 3, 0.5 and 3. Then change the material to Block. And let's create the agent: right-click, 3D Object, Sphere. Name it Roller Agent and change the y value. For the material, I want to use the checker square. Let's also add some physics for the agent by adding the Rigidbody component. Next, we need the Academy game object. So let's create an empty game object and name this Academy. Every ML-Agents scene needs one Academy instance. Of course, the Academy class is abstract; therefore, let's do this by creating a new C# script. First, let's create a new folder Scripts and a folder Brains. Under Scripts, right-click, Create, C# Script, and name the script Roller Academy. The setup for the class is quite simple: we just need the namespace MLAgents, and we need to extend the Roller Academy class with Academy. Now you should be able to see all these new fields where we can add, for example, the brains. So let's create these brains. First, we need the learning brain. This brain holds the logic for making the decisions for the agent. And we need a player brain. This is just for testing the agent in order to check if all inputs are working correctly. Name the player brain Roller Ball Player. Okay, let's continue with
the script with which we set up all the actions and observations for the agent. Name this Roller Agent. First, we need to extend our new class with Agent. Then you should now be able to see all the fields in the Inspector. Next, we need access to the physics of our agent, so we will set a reference to the Rigidbody in the Start method. Then we need the reference to the position of the target cube with Transform target, and we just drag the cube onto the script. Now we can create the first key method we need for every ML-Agents scene, the so-called CollectObservations function. In this function we will set up all the inputs we need to consider. Remember the inputs of a neural network: basically, this is exactly the same. So we need to provide this method with all the correct information. The first input we will provide is the position of the target with AddVectorObs(target.position). Then we need the position of the agent itself, this.transform.position, and the velocity for the x and z value, in order to control the speed so that the agent doesn't overshoot the target and roll off the platform. The next key method we have in every ML-Agents scene is the AgentAction function. The decisions of the brain arrive here in the form of an action array, vectorAction. The number of elements in this array is determined by the vector action space type and space size settings of the agent's brain. We will see this in just a few moments. So first let me paste in the code here, then I will explain it. And let's set up here the speed variable: public float speed, and set this to 10. Okay, so first we create a three-dimensional vector with Vector3.zero, so we set the x, y and z values to zero. In the next line we set the x value of the new vector to the first element of the vectorAction array, and then we set the z value to the second element of the vectorAction array.
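The mapping from the action array to a movement force can be sketched as follows. This is a Python illustration rather than the C# code pasted in the video; the function name `action_to_force` is hypothetical, and scaling the control vector by `speed` is an assumption based on the speed variable introduced in the lesson.

```python
from typing import Sequence, Tuple


def action_to_force(vector_action: Sequence[float], speed: float = 10.0) -> Tuple[float, float, float]:
    """Turn a two-element action array into an (x, y, z) force.

    Element 0 drives the x axis and element 1 drives the z axis; y stays
    at zero because the ball only rolls on the plane.
    """
    control = [0.0, 0.0, 0.0]          # start from a zero vector
    control[0] = vector_action[0]      # x value from the first element
    control[2] = vector_action[1]      # z value from the second element
    return (control[0] * speed, control[1] * speed, control[2] * speed)


if __name__ == "__main__":
    # Pressing D (index 0, value 1) and S (index 1, value -1) at once.
    print(action_to_force([1.0, -1.0]))
```

In Unity the resulting vector would be handed to the rigidbody's AddForce call; here it is simply returned so the mapping can be inspected.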
With the rigidbody's AddForce we're just moving our sphere agent depending on the values we got from the vectorAction array. Okay, now we need to set up these values in the settings of the agent's brain so that we can move the x and z value of the sphere. Open the Roller Ball Player. First, we need to set the space size to eight, because we have eight observations here. The space type will be continuous for a smooth movement, with a space size of two, because the vectorAction array needs two elements. Now, for testing these values with the player, we can set up here the continuous player actions. We need a size of four because we have four inputs with W, A, S, D. First, let's start with D: the index is zero and the value is one. Then we need A: the index is also zero and the value is minus one. Then we have W: we need here the second index, one, and the value is also one. And at last, we need S: index one, and the value is minus one. Okay, now we can test the input. But first, let me move the camera for a better overview; I will just change here the y value and the rotation. And now you can see we are able to move the agent with the keyboard. So this looks good. Therefore, let's continue. Now we will create the AgentReset method. As the name already says, this is for resetting our agent. Let me also paste in some code here. This is really straightforward: as soon as the y position of your game object is under zero, which means our agent is no longer on the platform, we will set our agent back to the vector 0, 0.5 and 0, and we also remove all the velocities of the agent. And at last we set the target cube, with this line of code, to a new random position on the platform. Finally, let's create the reward functionality. The reward system is the key element of reinforcement learning, and basically this is really simple. So first we will calculate the distance between the agent and the target.
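The reward rule covered in this part of the lesson (a reward of one when the agent is closer than 1.42 to the target, and an end of the episode when it drops below the platform) can be sketched like this. It is a Python illustration under stated assumptions: the function name `step_reward` and the `(reward, done)` return shape are hypothetical, and the fall is left unpunished because the lesson advises against leaning on negative rewards.

```python
import math
from typing import Sequence, Tuple


def step_reward(
    agent_pos: Sequence[float],
    target_pos: Sequence[float],
    reach_distance: float = 1.42,
) -> Tuple[float, bool]:
    """Return (reward, done) for one step of the RollerBall agent.

    Positions are (x, y, z). The episode ends either when the target is
    reached or when the agent's y position falls below zero.
    """
    distance = math.sqrt(sum((a - t) ** 2 for a, t in zip(agent_pos, target_pos)))
    if distance < reach_distance:
        return 1.0, True    # target reached: maximum reward, then reset
    if agent_pos[1] < 0.0:
        return 0.0, True    # fell off the platform: reset without reward
    return 0.0, False       # keep going


if __name__ == "__main__":
    print(step_reward((0.0, 0.5, 0.0), (1.0, 0.5, 0.0)))
```

Returning `done` alongside the reward keeps the "call Done, then reset" step explicit: whenever it is true, the environment would move the agent back to its start position and respawn the target.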
If the distance to the target is smaller than 1.42f, which means we have reached the target, then we will reward our agent with the value one. In general, we should always reward the agent with a maximum value of one if the agent was able to complete the assigned task. And we can punish the agent with negative values down to minus one if the agent does undesirable actions. In this case, we could punish our agent, for example, with minus one if the agent falls off the platform. But in the Unity documentation they clearly point out that positive reward values work better to train the neural network, and therefore you shouldn't use negative values too much. Then we're calling the Done method. That simply means we will execute the AgentReset method. And in the case that our agent falls off the platform, we will also call the Done method. Okay, that's everything for our setup. So let's test this. Okay, if we can reach the target, then we will spawn the target at a new random position.

7. Training the AI: In this video we will set up the neural network for the agent and we will train the brain. First, please go to the Academy and add here the learning and the player brain. Important: mark here the Control box, because we want to train the brain from outside of Unity. Under the Roller Agent script, add the learning brain and increase the decision interval to 10; this will speed up the process. Okay, now I open the project folder, and here under config we have the trainer config file. This is the file that is responsible for creating the neural network with all of its settings. I don't want to go too deep into the settings. Instead, I want to point you to the documentation: under ml-agents, docs, Training ML-Agents, you can find really well-written descriptions for all the settings.
So if you don't change anything in this file, then we will always use the default settings for training the brain, and these settings should be fine for most cases. So if you want to train your own brain, you can always start with these values. For our simple agent, I want to modify the learning brain. You can do this by writing here the name of the learning brain, followed by the customized settings. In short, with the settings here I just increased the rate to train the brain. I made a small mistake: please choose under the Academy the correct brains. These are, of course, the Roller Ball brain and the Roller Ball player. And under the Roller Agent we need the Roller Ball brain. Now let's train our brain. Open the project folder Machine Learning, ml-agents-master, and write here this command: mlagents-learn, followed by the directory of the trainer config file and the run ID, "first run" (the run ID you can name however you want), followed by --train. Let's click Play. Okay, this looks good: the agent is trying to find the fastest way to the target. Another helpful tool I want to show you is the following. Open another terminal, go to the project folder and just write the following command: tensorboard --logdir=summaries. Now, if you open your browser at localhost:6006, you can see this nice dashboard from TensorFlow, so we can display the rewards here visually. This is a nice tool to see how the progress of the agent is going. So now let's train the agent for just a few moments. Okay, you can see we have really nice progress: the reward value is really close to one. Remember, one is the maximum reward the agent can achieve. Therefore, let's interrupt the training. Go to the project folder under models, first run, and you can now see a new trained model. Move the model into
the Unity Assets folder. Under the Academy, remove the Control check mark, and under the learning brain choose the new trained model. Let's start the scene again, and you can see the agent is doing really...

8. Flappy Bird Project Overview: In this lecture, we will create the next reinforcement learning project. More precisely, we will create the Flappy Bird AI. So first, let's have a look at what we will actually do in this lecture. At the beginning, we only have a Flappy Bird clone made with Unity, and the goal is to create a perfect AI that can achieve an unlimited score in Flappy Bird. So we will modify the scene with the ML-Agents toolkit and reinforcement learning. We will set up here all the components you should already know: the academy, the brains, the agent script, etcetera. What is special here is that we will train in several bird environments simultaneously. This is just to speed up the process, because then we can train the agent much, much faster. After we finish the training, we should have a pretty much perfect AI that is able to achieve an unlimited score in Flappy Bird.

9. Explanation of the basic Scene: Okay, now let's get started with the Flappy Bird clone. So first, I uploaded this folder for you. Please download this folder and import it into the Unity Assets folder. Open the Flappy Bird scene and check if you have here the setting for 11 20. Otherwise, just click the plus here, add this resolution and choose it. As you can see, we have here a complete Flappy Bird clone. To push the bird, you can click the left mouse button or press the space bar. If we are able to pass the pipes, then the counter will increase by one. Now let's have a look at the components. The structure of the game is really straightforward. Under the Canvas, you can see a text for the counter. Under the Environment, we have a sprite for the background
and a sprite for the ground, with a small script for the movement. By default, we move the bottom background to the left, and after three seconds we reset the position to the initial position. We do this at just the right moment, so when the bottom background would leave the camera view. Then you can find a game object End Zones, and here we have colliders for the top and bottom. So if we hit one of these colliders, we will restart the game. Next we have the Pipe Spawner game object. With this game object we will spawn all the pipes. At the beginning, we call a method, spawn pipe, and this method just instantiates the pipes at a random y position. The pipe itself is a prefab. It has three colliders: two colliders for the bottom and top pipe, and a collider with the tag "goal". So if our Flappy Bird has a collision with the goal, we will increase the score counter. What is special in the spawn method is that we set all the pipes as children of the Pipe Spawner game object, because if we want to reset all the pipes, we can just loop through all the pipes and destroy them. We call this method from the Flappy Bird script; we will see this in just a few moments. And in the Update method we have a small counter, so every three seconds we call the spawn pipe method in order to instantiate a new pipe. Now let's have a look at the bird itself, with a Rigidbody 2D for gravity, a Circle Collider 2D for detecting collisions, and the Flappy Bird script. The Flappy Bird script is also really straightforward. In the Start method, we set the reference to the rigidbody and we save the start position. If we hit the space bar or the left mouse button, we will execute the flap method. In the flap method, we simply add to the rigidbody an impulse in the upward direction.
If the bird has a collision with the tag "goal", which is the collider from the pipe, then we will increase the score by one. Otherwise, if the bird has a collision with any other collider in the scene, we will set the dead boolean to true and we will call the reset all method. With this method we will restart the game: we remove the velocity here, we set the position of the bird to the initial position, we set the dead boolean back to false, we call the reset pipes method, and we set the score back to zero. And that's basically everything for the Flappy Bird clone. In the next video we will modify everything in order to create our own AI that learns to play Flappy Bird.

10. Set Up of the ML Agents Components: Hi, everybody, and welcome to this video. This crash course is about machine learning and artificial intelligence with Unity3D. Before I show you the main AI projects we will create here from scratch, I want to discuss why Unity is the perfect environment to train your own AIs. Well, let me take the case of a self-driving car. What you need are complex environments where there are lots of realistic physical interactions. You could provide this data from interactions with the real world, but this is extremely inefficient and time-consuming. Since games become more and more realistic, you can provide this information from virtual environments, and for that, Unity is perfectly positioned. No matter if you are a game developer who wants to create AIs for games or a hobby researcher who just wants to play with machine learning, the ML-Agents toolkit is the perfect start in order to create your own AIs. Okay, what do we create in this course? Well, first we will train the basic scene from Unity in order to check all the settings and to get a first idea of what the workflow of the toolkit is. We will learn what the key components are and what they are responsible for,
We will also see what the toolkit is doing behind the scenes by learning how a neural network works. Then we'll dive deeper into the machine learning techniques reinforcement learning and imitation learning by creating a 3D rollerball AI. The goal of the AI is to find the fastest way to a random cube. Another perfect example of machine learning is to create an AI that learns to play Flappy Bird. Flappy Bird is a really nice example because there is so much information we need to provide, for example the height of the pipes, the position of the bird itself, etc. Therefore, calculating the best action in each state with hard-coded behavior would be really, really hard. But instead we can use reinforcement learning. And finally you will learn how to use the method imitation learning by creating a self-driving car. Imitation learning is special because instead of training the agent with a trial-and-error method, the neural network will be trained by values from a human player. If you're interested in these topics, then it would be really nice to see you again in the next videos.
11. Training FlappyBird: Okay, in this last video of this lecture we will train the agent. So let's get started. First, go to the academy, mark the control box, and choose the learning brain for the bird. As I already said in the intro, I want to create several environments for the agent, because this will speed up the process of training the agent. So we can duplicate our environment several times. The more environments we have, the faster we will train the brain. I think this should be enough. Open your terminal in the project folder and write the already known command. The only thing I want to change in the trainer config file is the following. Open the trainer config file. I changed here the maximum steps to 500,000, and that means the training will stop after 500,000 steps. Okay, open the terminal again, hit enter and click play.
This will now take more time than the other examples, so I'm back in just a few moments. Okay, I have now trained the agent for about 10 minutes. You can see that we have clear progress. Remember, we haven't set a maximum reward value. Therefore the goal for the agent is now to achieve the highest reward value it can. So the longer you train the agent, the higher the score will be.
12. Self-driving Car Overview: In this lecture we will create an example of imitation learning. But why do we need imitation learning? Well, often it's easier to simply demonstrate the behavior we want an agent to perform, rather than attempting to have it learn via a trial-and-error method like we did with reinforcement learning. So instead of training the agent with the help of a reward function, we can give the agent inputs from a human player. And exactly this we will do in this lecture: we will create our own self-driving car. So depending on how the human drives the car, we will train the agent.
13. Explanation of the basic Scene: Okay. First, I also uploaded a folder for the self-driving car. Please download and import the folder into Unity. Open the scene and let's click play in order to see what we get here. We can control the car with W, A, S, D. If the car has a collision with the walls, then the car respawns at the same height as the position of the collision. So that's basically everything, a really simple setup. The car itself is from the Standard Assets from Unity, with the car controller, the car audio and further scripts from Unity. Then we have the car script I created. In the update method we are passing the values from the keyboard, so from W, A, S, D, into the move function from the car controller script in order to move the car. We also have some references here, for example to the car controller script, to the physics, and especially to the wheel collider.
The wheel collider is important because with it we can check in the on-collision-enter function on which plane our car currently is, and depending on the position of the plane, we will respawn the car. That happens when the car has a collision with the outside wall or the inside one. Now the goal is to train the car so that the car is able to complete a full run of this road. But this we will do in the next video.
14. Set Up of the ML Agents Components: OK, now let's get started with modifying the scene for the ML-Agents toolkit. First, we need to extend our car script from the agent class. In general, the settings we need here are the same, just like for reinforcement learning. So we also need here the observations, actions, etc. The only difference here is how we train the neural network. Therefore, let's now create the observations. We will now do something different: we will work with Unity's raycast system. So first, let me paste in some floats and the following line of code. Click play. So our agent will observe the environment with raycasts. The visible distance of a ray will be 20 in total and we will use five raycasts. The raycasts will observe the following directions. Now we can create a raycast hit. If a ray hits a collider, then we will save all the data of this game object in this variable. Especially, we can now calculate the distance to the wall. And we will transform the distance to the wall into a value between zero and one by dividing the distance by the visible distance. We will pass this value into the observations function in just a few moments. In the Unity documentation they point out that the values for the observations function should be normalized values between zero and one. Therefore, we transform these values. Otherwise, if a ray doesn't hit something, then we will return a value of minus one. Let me show you this in practice. Click play.
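The normalization just described can be written down as a small function. This is a language-agnostic sketch in Python rather than the Unity code pasted in the video: the function name and the use of `None` for "the ray hit nothing" are assumptions for illustration. The visible distance of 20, the division by the visible distance, and the value of minus one for "no hit" come from the lesson.

```python
VISIBLE_DISTANCE = 20.0  # visible distance of each ray, as stated in the lesson


def normalized_ray_distance(hit_distance, visible_distance=VISIBLE_DISTANCE):
    """Map a raw ray hit distance to a value between 0 and 1.

    hit_distance is None when the ray did not hit a collider (an assumption
    of this sketch); in that case, or when the hit lies beyond the visible
    distance, -1.0 is returned to signal "nothing in sight".
    """
    if hit_distance is None or hit_distance > visible_distance:
        return -1.0
    return hit_distance / visible_distance


print(normalized_ray_distance(5.0))    # a wall 5 units away
print(normalized_ray_distance(None))   # nothing within the visible distance
```

Dividing by the visible distance keeps every valid reading in the range 0 to 1, which is the normalized range the lesson asks for.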
If the ray doesn't hit something, we return minus one. Otherwise, the agent is returning the distance to the wall as a value from 0 to 1. Okay, this looks good. Therefore, let's do the same for the other rays. Just copy and paste the following. Let's delete the movement now, because we will now create the action function. So for the movement we need the following. The vector action array receives here two values from 0 to 1, and with these new values from the brain we will control the car, just like before. Next, we will create the observations function. What we need here is the following. If the ray distance to the wall is not minus one, that means our float holds the value of the distance to the wall, so then we will pass in the distance. Additionally to that, we pass in the value of one. That means yes, we are receiving the distance to the wall. Otherwise, we pass in the value of one, so the maximum distance, and zero for no, we don't receive the distance to the wall. The same we will do for all the other rays. And at last, we will observe the velocity values from the car. Finally, let's rename reset to done, and reset to agent reset. And that's everything for the car script. Now let's create the academy game object. Right click, create empty, and name it, for example, car academy. Create a new script, car academy, and extend the script from academy. Add two new brains to the academy. I already created the brains. You can find them under the brains folder, so just drag and drop these brains onto the academy and let's have a look at the setup. First, we have 14 observations: two observations for each ray, so 10, plus four observations for the velocities. Then we have a vector action size of two. These values are for controlling the car. And at last we have here four inputs for W, A, S, D in order to control the car with the keyboard. Let's start the game. Okay, the inputs are working. Therefore, in the next video, we can start training the car.
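To make the count of 14 observations concrete, the observation vector from this lesson can be assembled as follows. This is a hedged Python sketch, not the car script itself: the function names are hypothetical, the inputs are assumed to be ray distances that are already normalized (with -1.0 meaning the ray hit nothing), and the four velocity values are treated as opaque numbers because the lesson does not say which components they are.

```python
def ray_observation(normalized_distance):
    """Return the two observations for one ray.

    Hit:    [distance, 1.0]  (the distance plus a "yes, we see a wall" flag)
    No hit: [1.0, 0.0]       (maximum distance plus a "no wall" flag)
    """
    if normalized_distance != -1.0:
        return [normalized_distance, 1.0]
    return [1.0, 0.0]


def build_observations(ray_distances, velocity_values):
    """Assemble 5 rays x 2 values + 4 velocity values = 14 observations."""
    if len(ray_distances) != 5 or len(velocity_values) != 4:
        raise ValueError("expected 5 ray distances and 4 velocity values")
    observations = []
    for distance in ray_distances:
        observations.extend(ray_observation(distance))
    observations.extend(velocity_values)
    return observations


obs = build_observations([0.25, -1.0, 0.5, 1.0, -1.0], [0.0, 0.1, 0.2, 0.3])
print(len(obs))  # 14
```

The extra flag per ray lets the network tell a wall at maximum distance apart from no wall at all, since both are passed in with a distance of one.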
15. Training the Car: Okay, in this video we will start to train the car. So first we need to add the following component to the car: the demonstration recorder script, because now we will record a demonstration for the agent. That means with this script we will create a file in which we provide all the values that are needed to train the actual neural network. So let's get started. In this field you can give the demo file a name, for example, training. Now just click play and drive some runs in order to provide enough values to train the brain. I will do here as few actions as possible because I just want to show you how you can set up everything. Okay, I think for a demonstration this is enough. After a short time, you should see a new folder, demonstrations, and under this folder you should see a new file. So depending on this file, we will train our agent. But first, go to the car academy and change some values, especially the target frame rate, because I want to see what the agent is doing during the training. And now mark the control box of the learning brain, and choose the learning brain for the agent as well. Next, go to your project folder, and under config find the offline BC config file. This is the file we need now to set up the brain. The only thing you need to customize is the demo path. Here, you need to write the name of the project, ML agents, then assets, demonstrations, and the name of the demo file, which is training. Save the file and open the project folder with your terminal. And now just write the following command. Basically, you need to write the exact same as usual. The only difference here is the name of the config, offline BC config. Hit enter and click play. You can see the agent is at first doing random actions, so you just need to train the agent now for a while. You can see the car is moving in the right direction, but it is not yet an agent that is really useful.
We definitely need more data for training. But as already mentioned, I only want to show you how the workflow is, so you can set up the environment for imitation learning. I think this should be clear now. But if you want, you can of course train the agent a lot more for better performance. And finally, if you are satisfied with the results, then you need to do the same steps just like before in the scene. And now you can find the new trained model under models, self-driving car student.