Google Earth Engine - Complete Course | Sam Van Emelen | Skillshare

Google Earth Engine - Complete Course

Sam Van Emelen, Geospatial Data Analyst & Web Developer

Lessons in This Class

11 Lessons (1h 57m)
    • 1. Introduction

    • 2. Difference between server and client

    • 3. Filtering and displaying data

    • 4. Calculating with images

    • 5. Mapping a function over an image collection

    • 6. Iterate function

    • 7. Importing raster and vector data

    • 8. Exporting raster and vector data

    • 9. Introduction to image classification

    • 10. All classifiers explained Skillshare

    • 11. Clustering and explaining all clusterers

About This Class

Google Earth Engine is a very powerful tool for analyzing remotely sensed data. Quickly grab images from Landsat, Sentinel, MODIS and many more public datasets with the Earth Engine and process them in the cloud. In this class I will try to show you what I have learned so you can get started quickly without the frustration of figuring out how everything works.

What will I learn?

I really want to take you from zero to completely understanding the ins and outs of the Google Earth Engine. I will go over the way the tool works and the most common operations you will encounter.

Throughout the lessons you will be able to follow along and create a variety of projects, each using different aspects of the Google Earth Engine, ranging from performing global raster calculations, to all sorts of image classifications.

Can I follow this class?

This course can be followed by people who are already familiar with GIS and remote sensing, or students who are just getting started. All concepts that are addressed will be explained. If you have GIS experience, many concepts will be familiar to you, but this is no prerequisite.

Let's go!

My name is Sam, and if you like how this sounds, I will be your teacher for this course. I hope that this will give you all the knowledge you need for the projects you want to do, without having to spend days troubleshooting each different step of the process.

Meet Your Teacher

Sam Van Emelen

Geospatial Data Analyst & Web Developer


Hello, I'm Sam.

1. Introduction: Hi there. I hope you're doing great today, and welcome to this online course about the Google Earth Engine. I have been using this tool for quite a while now. When I got started, I was able to find decent written documentation, but the way I like to learn stuff is by seeing people do things and having them explain it, and there's not really much like that available about the Google Earth Engine. So if you're anything like me, you will find this course useful. I really want to take you from zero all the way to up and running with the tool. I'll show you how everything works and the most common operations you will encounter. Right, let's get started. First of all, the Google Earth Engine is a very useful tool when you are already working with remotely sensed data, or even when you're just getting started. It is a cloud-based platform where Google makes their servers available and offers almost all publicly available satellite images, so you can do whatever it is that you want. So how does it work? You write custom code in JavaScript. This code then gets sent to the servers, and in a matter of seconds these servers calculate your request and send you a response. This has a lot of advantages. You do not have to struggle with finding the right datasets and downloading everything you need, since it's all available online. You can also run much larger computations with the Earth Engine that would not be possible, or would take hours, on your own computer. I already said you would write what you want to do in JavaScript, and I expect you already have an idea of how to write JavaScript syntax. I won't go over it in detail, but if you have no experience with that, just pay close attention to the way things are written down in this course. If you want to start with this tool, you first need to sign up. It is fairly easy and there is a link on their website. You click there, fill in the required information and submit the request.
They say it can take two to three days, but when I signed up, it was confirmed after a few hours. If you use the web application, you'll see something like this. In the middle, we have the text area where you can write custom scripts, and on the bottom we have the map where your results are shown. In the upper left corner, there is a window with three more tabs. The first tab shows all the folders with saved scripts. There you can also find example code, which might be useful if there's something like that you want to do. In the Docs tab, you will find all variable types and functions provided by the Earth Engine. If you click on them, you will find more information on how to use them, but this is also what we will cover during the course. Then we have the Assets tab. It is possible to save your results or upload data from other sources, and these can be found here, so you can use them in your scripts. On the right, there is another window with tabs. Here you'll find the Inspector. When viewing results, you can use this to get specific information about a pixel. The Console is where output from your JavaScript will appear. For example, if I print some text, it will show here. The Tasks tab will contain tasks you execute. For example, when you export an image, it will show the task here, and by clicking that task, it will execute. Then finally, the Profiler is something I don't use that often, but it shows what the server is currently doing, to debug if there are any issues. In a recent update, they removed this feature entirely, so now this doesn't even exist anymore. To search for data you want to access, you can simply use the search bar on top. Typing keywords will automatically suggest datasets, and clicking on them will give you more information about the bands, the resolution, metadata, and so on. Importing is as easy as clicking Import, and you are ready to get started. I hope you now have an understanding of what the Earth Engine is all about.
There are a lot of great applications and it's an easy way to get started with remote sensing. So go ahead to the next lesson to discover how you can use this powerful tool.

2. Difference between server and client: In this video, I will talk to you about the difference between the client and the server, and how you can get the server to execute your code. It is very useful to know how these two interact. I didn't at first, and now that I think about it, a lot of errors I made could have easily been avoided. Since we use the API with JavaScript, all functionalities of this language can still be used, like arrays, strings, dictionaries, and so on. But since we're also dealing with cloud computing, new functions and variables must come into play to connect the client, which is your computer, to the server. Whenever you use objects or functions that also work in other regular JavaScript environments, these will get stored and computed on your own device, the client. But when you want the Earth Engine servers to run code, you have to use server objects and server functions. These variables and functions generally start with ee (double e), and these are the objects that will send the information to the server. So for example, I can create two numbers, a and b, and I can perform calculations with these, for example a plus b. When I print this out, we should be able to get the result. As you see, the console has printed the number. But when we look at the Profiler, we can see nothing is there: no action has been taken by the server. The entire calculation happened on the client side. When we instead define a and b as server-side variables, we should be able to get the same command executed by the server. The server-side variable for numbers can be declared with the function ee.Number. Capitals are important here, so watch out for that. And as you see, the double e indicates that this is a server-side object.
We could try to run it like this, but it won't work. These variables are no longer numbers but objects, and the plus operation does not recognize the objects. You can check the type of a variable by printing it, and indeed we find that the variables are objects. When you want to add these two variables, you'll have to call a server-side function, since these are server-side variables. We can easily find all the available functions by using the Docs tab. There you find all server-side variable types, and by expanding them, you can find all possible functions. So we'll go down to ee.Number, and these are all possible operations for this type of object. In this example, we wanted to add two numbers, so the function add should do the trick. We can try to run this again, and now we have a number. This time we got the proper result. But the result is not just an integer; it is also an object, an ee.Number to be precise. The print command is one of the few that can recognize Earth Engine objects and will give you a proper print. When looking at the Profiler, we can verify the server's action again, and indeed, this time the server executed code. The first thing that happened on the server is the algorithm Number.add, and the second item, the plumbing, represents the transfer of data from the server back to the client. It is safest to use server-side variables as often as possible, since mixing the two can cause problems. For example, if we write this piece of code, similar to the previous example, and we want to check if a equals 4, we will use another function. c should be set to false, since the number 8 does not equal 4. But when we check for c in a regular JavaScript if statement, we find that it is still treated as true. This happens because the comparison does not see the variable c as an int value, but as an object, and it does not know what to make of that object.
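The pitfall described above can be reproduced without the Earth Engine at all. The following is a minimal sketch in plain JavaScript, assuming a made-up `ServerNumber` class and `evaluate` function that stand in for a server-side number and for the server. Neither is part of the Earth Engine API; they only model the idea that a server-side object holds a description of a computation rather than a value.

```javascript
// Hypothetical stand-in for a server-side number: it stores a recipe, not a value.
// This is NOT the Earth Engine API, only a plain-JavaScript model of the idea.
class ServerNumber {
  constructor(recipe) {
    this.recipe = recipe;
  }
  static of(value) {
    return new ServerNumber({ op: 'constant', value: value });
  }
  add(other) {
    return new ServerNumber({ op: 'add', args: [this.recipe, other.recipe] });
  }
  eq(other) {
    return new ServerNumber({ op: 'eq', args: [this.recipe, other.recipe] });
  }
}

// Hypothetical stand-in for the server: turns a recipe into a plain value.
function evaluate(recipe) {
  if (recipe.op === 'constant') return recipe.value;
  const [left, right] = recipe.args.map(evaluate);
  if (recipe.op === 'add') return left + right;
  if (recipe.op === 'eq') return left === right;
  throw new Error('unknown op: ' + recipe.op);
}

const clientSum = 4 + 4; // plain numbers: computed locally

const a = ServerNumber.of(4);
const b = ServerNumber.of(4);
const brokenSum = a + b; // "+" on two objects falls back to string concatenation
const serverSum = evaluate(a.add(b).recipe); // the "server" does the addition

const c = a.add(b).eq(ServerNumber.of(4)); // represents "8 equals 4", which is false
const clientBranch = c ? 'true branch' : 'false branch'; // any object is truthy
const serverAnswer = evaluate(c.recipe); // only the evaluated recipe is false

console.log(clientSum, typeof brokenSum, serverSum, clientBranch, serverAnswer);
```

The client-side test takes the true branch even though the object represents a false result, which is the same surprise as the if statement in the lesson.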
A print statement, on the other hand, or a server-side function does know how to interpret it and run the function. To do the same operation, a server-side if function should have been used. This is an important thing to note when you get started with Google Earth Engine. I know it would have saved me a lot of time and frustration, so I hope I can spare you that now. If you want to find more information about an Earth Engine function, visit the complete docs online. I'll put a link to that in the description. There you find a manual with more advanced explanations and examples for the more common tasks, and the reference guide contains all possible functions, similar to the Docs tab. You can also find video tutorials there, but they are really long, often longer than an hour, and the quality is not always ideal.

3. Filtering and displaying data: One of the most fundamental parts of the Google Earth Engine is selecting the images you need and displaying these images, or your own calculated images, on the map. Google has already selected a wide variety of public datasets that you can easily use, and they are listed on their website. You can also browse them in the search bar on the top of your workspace. When you have found the dataset you want to use, select it in the search bar and click Import. Now this image collection has been imported into your script. ImageCollection is also the variable type that the Earth Engine uses for image collections, so that's very convenient. We have now created a shortcut so you can easily let the script know what data we want to access, and we can use the name we chose here, L8, as a placeholder for the entire dataset in our script. We could try to print information about this object, but very often you will get an error like this. There are limitations on how much you can ask from the server, and apparently this is one. We can verify how many images there are in a collection with the function size. When we do that,
we find there are almost 800 thousand images. So the entire Landsat 8 dataset is clearly too large to work with. We have to filter it somehow. To figure out what functions we can use on this object type, we can go to the Docs tab and search for ImageCollection. There are many possible functions, but probably only a few we will use regularly. To quickly filter the data you want to use, there are filter functions. There is a general filter function, filterBounds, filterDate, and filterMetadata. The first one is used to filter on specific filters. The Earth Engine has an object called Filter, and you can create these and apply them using the first function. filterBounds lets you filter images that fall within a certain region or point. filterDate allows you to select images from a specific date range. And filterMetadata makes it possible to filter on specific characteristics. If you want to know which properties you can filter on, go to the information sheet about your dataset. There you can find the properties of the images and the data type they are in. But let's make this clearer with an example. Let's say I want to view all images within a region of North Africa. And not only do I want that, but there cannot be more than 1% cloud cover, and I only want images from 2017. I would start by creating a region of my choice. You could choose to create a polygon from scratch by typing ee.Geometry.Polygon, followed by the coordinates of your region. You can also simply draw it on the map. Once this is drawn, we have created a new object. I will call this region. Now let's look back at the filterBounds function. As the docs tell us, we could have also created a filter object with ee.Filter.bounds and then used that created filter in the filter function. But since it is such a common operation, it has its own function. The only other argument that is required is a geometry. This could have also been a point or a line.
But in this example we are looking for a region. Okay, let's apply this. If you have typed the function correctly, it should be marked purple. Let's see how many images remain in the collection. For that, we can use the function size again, and we see there are still over 3000 images in this collection. So let's continue with the example. We also wanted images with less than 1% cloud cover. For that, we will use the filterMetadata function. This function requires a name, that is, the name of the property you want to filter on, an operator, which means less than, equal to and so on, and finally a number. Again, we have reduced our number of images significantly, but let's go a little further and select only images from the year 2017. The filterDate function requires a start date and an end date. These dates can be written in different ways, either as a date object, a string, or an integer. I'll use strings for simplicity. And finally, we have reduced our collection to well over 200 images. It is still a lot, but much more manageable compared to the initial 800 thousand. We did not have to create variables for each individual step as we did here. We could have also stacked the filters on top of each other like this, and made it a little more readable like so. We could then show the entire collection on the map, but there will probably be a lot of overlapping pictures from different times of the year. So what I'll do is take the median of this collection and print the image that the median function returns. The median function is easy to use and requires no additional arguments. When we add this to the map, we get this result. You can adjust the displayed layers and coloring here, and you can also save it, so you won't have to adjust this every time. When you want to reuse earlier parameters, you can add them to your Map.addLayer function. And you can also give your layer a name.
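To make the chain of steps above concrete, here is a rough plain-JavaScript model of it, not Earth Engine code. The scene records, their field names and all values are invented for this sketch. The two chained `filter` calls play the role of filterDate and filterMetadata, and the last step mimics what taking the median of a collection does per pixel.

```javascript
// Hypothetical scene records; the field names and values are invented for this sketch.
const scenes = [
  { id: 'a', date: '2016-11-02', cloudCover: 0.2, pixels: [10, 20, 30] },
  { id: 'b', date: '2017-03-14', cloudCover: 0.5, pixels: [12, 18, 33] },
  { id: 'c', date: '2017-06-30', cloudCover: 7.0, pixels: [90, 95, 99] },
  { id: 'd', date: '2017-09-08', cloudCover: 0.1, pixels: [14, 22, 31] },
  { id: 'e', date: '2017-12-25', cloudCover: 0.9, pixels: [16, 20, 35] },
];

// Each filter returns a new, smaller array, so the calls can be stacked.
const filtered = scenes
  .filter(s => s.date >= '2017-01-01' && s.date < '2018-01-01') // date range
  .filter(s => s.cloudCover < 1); // less than 1% cloud cover

// Median of a list of numbers.
function median(values) {
  const sorted = [...values].sort((x, y) => x - y);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Per-pixel median across all remaining scenes: one output value per pixel position.
const composite = filtered[0].pixels.map((_, i) => median(filtered.map(s => s.pixels[i])));

console.log(filtered.map(s => s.id), composite);
```

The cloudy scene `c` never reaches the median step, which is why filtering first keeps bright cloud pixels out of the composite.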
And that's it: in six lines of code, we have been able to filter an immense dataset and display the result.

4. Calculating with images: Since you seem to be interested in cloud computing with remotely sensed data, you'll probably also want to perform calculations and other image manipulations. Luckily, the Earth Engine lets you do all of this easily. As always, the Docs tab is your friend. When you know what calculations you want to perform, you can search for ee.Image. Again, this shows you all possible functions you can execute with the image object, such as add, a logical and, calculating the cosine, multiplications, and the list goes on. Most of the time, using them is really simple: you just type the name of your first image followed by the function, and put the name of the second image between the brackets of that function. With combinations of these functions, you can do almost anything you like, but there is a prettier solution than writing function after function. I'll get back to that in a while. But first, let me give you an example of this first method. I think calculating the NDVI is a great example for this. The NDVI, or normalized difference vegetation index, is an index that represents the amount of green in a certain place. Since chlorophyll absorbs a lot of visible red light while reflecting most of the near-infrared light, it can be remotely sensed by looking at these bands. The normalized difference of two bands is calculated by subtracting the bands and dividing that by the sum of the two bands. It is the difference of the two, then normalized by dividing it by the max value, the sum. Hence the normalized difference. When doing this, we need to subtract the red band from the near-infrared band and divide that by the sum of the two bands. The Earth Engine has a specialized function for taking the normalized difference, but I'll calculate it the other way for this example.
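The arithmetic just described is (NIR - red) / (NIR + red), applied to every pixel. As a small sketch in plain JavaScript, assuming invented reflectance values for three pixels (these are not measurements and this is not Earth Engine code):

```javascript
// Normalized difference of two values: their difference divided by their sum.
function normalizedDifference(first, second) {
  return (first - second) / (first + second);
}

// Hypothetical reflectances for three pixels (values invented for illustration):
// dense vegetation, sparse vegetation, and a pixel where red exceeds near-infrared.
const nir = [0.50, 0.30, 0.02];
const red = [0.10, 0.25, 0.05];

// NDVI per pixel: subtract red from near-infrared, divide by the sum of the two.
const ndvi = nir.map((value, i) => normalizedDifference(value, red[i]));

console.log(ndvi.map(v => v.toFixed(2)));
```

Because the difference can never be larger in magnitude than the sum of two non-negative values, the result always falls between negative one and one.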
The first thing we need to do is select the images we want to use as a source and isolate the bands which we need. I will use images from Landsat 8. You can select specific bands from an image by using the select function. You pass in the bands you want to select, and it creates a new image with the chosen bands. You can also choose multiple bands, but we won't need that in this example. So I create a new image for the red band and a new image for the near-infrared band, and I store these in new variables. Then I can calculate the NDVI by first subtracting the red band from the near-infrared band, and then dividing that by the sum of the two bands. The final image now represents the NDVI, with values close to one for land with a lot of green vegetation, values close to 0 when there is less vegetation, and even values close to negative one when water is present. We will save the parameters so we won't have to adjust this the next time. When we compare these results to the satellite image, it seems indeed correct that a high NDVI value indicates green vegetation. A second way to calculate this map would be with the expression function. This is a more structured manner of writing things and works like this. The first argument is the expression you want to execute. We wanted to calculate NDVI, so we will subtract the red band from the near-infrared band and divide it by the sum of these two bands. Again, you can use any variable name you want, since the second argument will be a dictionary linking the variable names you used to the corresponding data. So we used NIR and RED as variable names and made a dictionary with these two as keys. When we run this, we get the same result. The method is equivalent to the previous one, but in my opinion, it's a lot more structured and easier to read. A few remarks though: we use the expression function on our image, but when declaring the variables of the expression, we point to completely new objects.
It looks like we didn't even use the initial image at all, and that's true. The dictionary we created is especially for when you use additional parameters. When you want to access bands from the image on which you use the function, you can access them with the b function. We can use this function in the expression itself. In between the brackets, you put the number of the band you want to select. I have printed the available bands real quick, and for calculating NDVI, we want to use bands 4 and 5. These are indices 3 and 4, since counting starts at 0. And now we can remove the dictionary. You could have also specified the name of the band instead of the number, but then you would have to make sure to escape the quotes around the band name with a backslash. Within the expression, you can use every arithmetic operation, such as addition, multiplication, taking the remainder, exponentiation, and so on. And that's it. Those are the ways that you can perform calculations with images in the Earth Engine.

5. Mapping a function over an image collection: In the last video, we have seen how you can perform calculations with images. But these methods only work on single images. When you want to perform a certain calculation on an entire image collection, you have to create a JavaScript function and map that function over the collection. Let's first start with the function. Creating a function in JavaScript is done like this. A function that you want to map over a collection must take an image as variable and return an image as result. This function will then be applied to all images in the collection. Between the brackets, you put a variable name you like. And now we can create the function. I will use the example of NDVI again, just as in the previous video. If you want to know how I got to these calculations, make sure to watch that again. I will select bands 5 and 4 from the image. These are the near-infrared and red bands.
And in the expression function, I can refer to them as b(0) and b(1) respectively. Now it will calculate an image containing one band with the NDVI values. But the name of the band is by default set to the first band name used in the expression, in this case B5. We can give it a custom name with the select function. Not only does it let you select specific bands, but you can also give them another name in the new image. So we will just select band B5 and rename it to NDVI. This is done by first creating a list of the old band names, in this case a list with only B5, followed by a list of the new band names, in this case NDVI. This is not crucial, but it will make your results easier to read and analyze afterwards. And of course, the function has to return the image. After we have created the function, we can map it over the entire image collection. I already created an image collection here of seven images. The map function is used for image collections specifically and only requires the function as its argument. It will automatically use the images as variables, and you don't need to declare them between the brackets anymore. You should pay attention here, since this map function is written without a capital. Map with a capital is the object used to display images on the map below. So now we have changed our collection of images into a collection of NDVI maps by mapping a function over an image collection. All images now only have one band, called NDVI. It is important to note that this is not an iteration. If, for example, you want to map a function over a collection and in the meantime append the average NDVI to a list, this won't work. Because of the way the Earth Engine works, it does not actually go through the images one by one, but rather divides the work over multiple servers at the same time. If you do want the Earth Engine to go over the images one by one, you are better off with an iteration function.
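The shape of a mapped function, one image in and one image out, can be illustrated with ordinary arrays. This is only a plain-JavaScript analogy, assuming a made-up in-memory "image" format (an object with a `bands` dictionary) and invented pixel values. `Array.prototype.map` stands in here for mapping over an image collection.

```javascript
// Hypothetical in-memory "images": band name -> pixel values (numbers invented).
const collection = [
  { bands: { B4: [0.1, 0.2], B5: [0.5, 0.2] } },
  { bands: { B4: [0.3, 0.1], B5: [0.3, 0.4] } },
];

// Takes one image and returns a new image with a single band called NDVI.
// It only looks at its own argument, so the order of the calls does not matter.
function toNdvi(image) {
  const red = image.bands.B4;
  const nir = image.bands.B5;
  const ndvi = nir.map((value, i) => (value - red[i]) / (value + red[i]));
  return { bands: { NDVI: ndvi } };
}

// The function is passed by name, without brackets or arguments.
const ndviCollection = collection.map(toNdvi);

console.log(ndviCollection.map(image => Object.keys(image.bands)));
```

Because `toNdvi` depends on nothing but the image it receives, the work could be split over many workers in any order. A function that needs the previous result does not have that property, and that is the case for the iteration function instead.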
So that's how you map a function over an entire image collection with the Earth Engine. If you have any questions about this or another topic, let me know.

6. Iterate function: We have already seen how we can calculate with images and how you can apply a certain calculation to an entire collection of images. But sometimes you need to apply a function to a collection in a specific order, where the output of the previous calculation is important for the next. For that, the script will need to go over the images one by one, which is done using the iteration function. Let's say, for example, that you want to assess the severity of droughts, and we want to calculate what the maximum number of consecutive dry days was in a given year. We can do this by getting a dataset with information about precipitation and iterating over those images. Where no rain has fallen, we can add a one to that region, but when any amount of rain has fallen, we can reset the value of the region to 0. By doing this, we will get images that contain the number of consecutive dry days. But let's make this more concrete and put it into code. For the dataset I want to use, I will go with the PERSIANN dataset. It contains global precipitation data from 1983 until the present, and uses artificial neural networks to estimate the daily precipitation based on other satellite data. I will select all images from the year 2017 and begin by writing the iteration function. It works very similarly to the map function that I used in the previous example. When looking at the docs, we find that it requires an algorithm, meaning a function that has to be executed, and then the initial state, meaning the initial object to start the iteration with. You can also iterate through feature collections and lists, and these work in exactly the same way. The function used in the iteration has a few requirements. It must take two arguments.
One is the current element used in the iteration, and the second is the result of the previous iteration. If this isn't clear yet, it will be once we start writing the function. It is also important to note that iterate returns a ComputedObject, not an image. If you want to use it as an image, you only have to declare it as one, since we already know it actually is one. So in the iterate function, we already type the name of the function that we will create later. We wanted to add a one to cells where it has not rained, and reset the value to 0 when it has rained. For this, we'll have to create an initial image in which all cells have the value 0. We can do this by creating an image with a constant value of 0. You can quickly rename the band with the function rename. And now we have created the initial image with one band, which you can pass to the iterate function. We also need to cast the variable type to long, so it is of the same type as the other images. Since we have set all values to 0, it will automatically think it is binary. The function has to take an argument for the current image handled by the iteration, followed by an argument for the result of the previous iteration. What we then want to do is check in the current image whether the amount of precipitation is 0 or not. We can do this by remapping the image. Let's check the docs real quick. The arguments that are needed are the values we want to change and the values we want to change them to, and then optionally a default value and the name of the band you want to remap. What we will do is remap all zeros to ones and everything else to 0. We can then add this remapped image to the previous result. In that way, we add one to all cells that remain dry. And when we multiply by the mask again, we reset cells to 0 when rain has fallen. For the first iteration, there is no previous result, so the initial image will be used. That is why we created an image containing only zeros.
And when everything is done, we return this new result so it can be used in the next iteration. That's all you need to do for the iteration function. The exercise with the maximum number of dry days is not yet complete, but let's see this result first. The iteration ran through all images, and this was the last result. But this also means that a cell could have endured 100 consecutive dry days, and if it rained on the last day, the counter would have been reset. Somehow we need to get a list of the steps taken by the iteration and then take the maximum value a cell ever had. This is just a minor change in the code. Instead of returning the updated result to the next iteration, we're going to append the result to a list of images and let the next iteration take the last image from that list to work with. We'll start by making the initial object a list instead of an image, by simply putting the image in an ee.List. In the function, we need to change it so that we apply the mask on the last image of that list. Getting item 0 would give the first item, so get(-1) will select the last item from the list. And finally, we need to add this image to the original list and pass it on to the next iteration. Now our result will no longer be an ee.Image but an ee.List with all steps of the iteration. A final step will be to create an image collection out of this list, so we can easily select our maximum values. Luckily, you can create an image collection from just a list of images, so we can add this as the argument. And at last, we will take the max values of the images in the collection. When we print it on the map, we finally get the most consecutive dry days a certain cell has endured during the year 2017. But now we have to take into account that the iteration does not return an image but a list, so we have to define it as a list in your result. And that's it. I hope most of it was clear and you now understand how the iteration function works.
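The dry-day counter can be traced by hand on a tiny example. The sketch below is plain JavaScript, not Earth Engine code: each "image" is an array of three cells, the precipitation values are invented, and `Array.prototype.reduce` stands in for the iteration function. The step function follows the recipe from the lesson: remap zeros to ones, add that to the last result, and multiply by the same mask to reset cells where it rained.

```javascript
// Hypothetical daily precipitation for three cells (mm per day); values are invented.
const dailyRain = [
  [0, 0, 5],
  [0, 2, 0],
  [0, 0, 0],
  [3, 0, 1],
  [0, 0, 1],
];

// One iteration step: takes the current day's image and the list of earlier results.
function accumulate(today, history) {
  const previous = history[history.length - 1]; // last result, like get(-1) on a list
  const dryMask = today.map(mm => (mm === 0 ? 1 : 0)); // remap: 0 -> 1, everything else -> 0
  // Add one where it stayed dry, then multiply by the mask to reset rainy cells to 0.
  const counter = previous.map((days, i) => (days + dryMask[i]) * dryMask[i]);
  return history.concat([counter]); // append, so every step is kept
}

const initial = [[0, 0, 0]]; // a list holding one all-zero starting image
const steps = dailyRain.reduce((history, today) => accumulate(today, history), initial);

const lastStep = steps[steps.length - 1]; // counters after the final day
const maxDrySpell = lastStep.map((_, i) => Math.max(...steps.map(step => step[i])));

console.log(lastStep, maxDrySpell);
```

In this made-up data the first cell ends the period with a counter of 1 but had a dry spell of 3 days earlier on. That is why the lesson keeps every step in a list and takes the maximum afterwards, rather than using only the last result.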
The exercise went a little further than only explaining iterations, so I hope that didn't confuse anyone. If you still have questions about iterations or this example, make sure to let me know and I'll try to clear things up.

7. Importing raster and vector data: Until now, we've always used datasets offered by the Earth Engine in our examples, but it is also possible to upload your own assets and use them as well. So in this example, we will see how to do just that, both for raster and vector data. I assume you already have external data you want to import, either data you created yourself using some GIS software or data from other sources. If not, there are many platforms offering free satellite data, such as Earthdata by NASA or EarthExplorer by USGS. Do a quick search and you'll probably find many more. When you want to upload a raster to your assets in the Earth Engine, it has to be in GeoTIFF format. This is basically a TIFF image with integrated georeferencing information. You can also upload the files together with a TFW file that contains georeferencing data. If your raster is not in one of these formats, try exporting or downloading it again in that format. If that's not an option, you can always try conversion tools. There is a free online conversion tool that works great, but there are limits on the size of the files you want to convert. I will add a link to this tool in the description. Once you have the TIFF file you want to upload, go to the Assets tab and click on New. For uploading raster data, you choose Image upload and you can upload the file. It is also possible to add an additional TFW file if available. This stores, as mentioned earlier, additional information about pixel size, rotation, world coordinates, and so on. If you want, you can add additional information about the image, and you confirm by pressing OK.
You will see the upload is added to your Tasks tab, and the asset will be available once uploading is finished. I have five raster files with global population data that I want to store in a new image collection. So I will continue uploading the others. And let's just skip ahead until this is finished. So that's done. And now we can find our uploaded images under Assets and start working with them. We could import them directly in our script and start working with them as image objects. But since I have five, I want to organize them in an image collection. To do this, click New, then Image collection, and this will create an empty collection. Then we can open that collection and add images by typing the image ID, or simply by dragging the image into the collection. Now we can also import this collection and start working with it. Uploading vector data is very similar. Instead of Image upload, you select Table upload and upload the files. These files have to be uploaded as shapefiles with the extension .shp and must be uploaded together with a .dbf and a .shx file. If available, you can also add the other files, but only the first three are required. You can also choose to upload a zip file containing these files. And again, if your data is not in this format, you can try a conversion tool. Then again you click OK, and when uploading is complete, it will appear in the Assets tab, ready to import and use as an Earth Engine feature collection. I have here a shapefile containing all dry regions on Earth. And now we can again perform calculations with these datasets as with any other datasets. For example, we can calculate how many people on Earth live in a dry region for the given year. And that's how you import external data. 8. Exporting raster and vector data: We have already covered quite some functionalities within the Earth Engine. And last time we saw how to import external data.
This time, we will explore the functions that let you use data obtained or created in the Earth Engine outside of the tool. Then you can make proper maps of the data or even continue performing calculations with other software you like. So the Earth Engine has four export possibilities. When you search in the docs for export, these will appear. The two major data types, raster data and vector data, can be exported with the functions Export.image and Export.table respectively. And then two more functions exist. Export.map will put your data into a format that is suitable for web applications using the Google Maps API or Google Earth online. I won't go into detail about this as it is very specific and ultimately doesn't let you use your data on your computer. The fourth function is Export.video, and this converts an image collection to a video. This can be useful when you want to visualize changes over time, for example. For exporting images and tables, there are three destinations. You can export the objects to your assets. We have used this in the previous video as well, and this stores the image online so you can use it in other scripts. A second option is to export your objects to Cloud Storage. This is a paid service from Google, and it's a professional cloud storage that you can access using an API, which makes it great for use in custom applications. The last option is to export the data to your Google Drive, from where you can download it to your device. All of these functions take a large number of variables. Most of these are optional, but I will go over them one by one so they become clear to you. We'll start by exporting a single image to our assets. And we look at the docs to see what the required parameters are. We can see that actually only the image is required and everything else is optional. There are default values for all other parameters. Let's see what happens when we leave everything as default.
After that, I will go over the other variables to see what other options you have. Executing the function will create a new task. And when you run the task, you will be asked for some of the commonly used parameters again, but they are already filled in and I will leave them as they are. So when the task has executed, we can add the asset to the script again to compare the results. The first thing that you might see is the loss of resolution. The default option for resolution is 1000 meters per pixel, but Landsat comes at 30 meters per pixel. So that alone is already a reason to look into the parameters. So I'll do this export again, but this time I'll go over all possible variables. When you add a lot of variables into a function, you can list all of them between the brackets, but this quickly gets unclear. A better solution is to list them in a dictionary. When you do this, it also does not matter in what order you put the parameters. So I'll go back to the docs real quick to copy the possible variables. At first we have the description. This is, in contrast to what it sounds like, actually the name of your export. You should make sure it does not contain any whitespace. Next is the assetId, and this is the ID of your file; it will be stored under this name. Then the pyramiding policy will determine how pixels are generalized when you zoom out. When zooming out, the software will have to group pixels together, as it can't display every single pixel forever. This hierarchy of displaying and grouping pixels can be compared to a pyramid structure, hence the name. It speeds up the loading of images because you don't have to download and display each pixel when you're only looking at a larger image. The default pyramiding policy is to take the mean of all cells when grouping them.
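The grouping of pixels into a pyramid can be illustrated with a small plain-Python sketch. It assumes that each coarser level merges blocks of 2 by 2 cells, and the function name `pyramid_level` is made up for this example; it is not an Earth Engine call. The default `mean` policy and a `max` policy are shown side by side.

```python
from statistics import mean


def pyramid_level(grid, policy=mean):
    """Build the next, coarser pyramid level by merging 2x2 blocks of cells.

    policy is any function that turns a list of cell values into one value,
    for example mean, min or max.
    """
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows, 2):
        coarse_row = []
        for c in range(0, cols, 2):
            # Collect the (up to) four cells that fall in this block.
            block = [grid[rr][cc]
                     for rr in range(r, min(r + 2, rows))
                     for cc in range(c, min(c + 2, cols))]
            coarse_row.append(policy(block))
        coarse.append(coarse_row)
    return coarse


grid = [
    [1, 3, 10, 10],
    [5, 7, 10, 30],
    [2, 2, 0, 0],
    [2, 2, 0, 4],
]
mean_level = pyramid_level(grid)             # [[4, 15], [2, 1]]
max_level = pyramid_level(grid, policy=max)  # [[7, 30], [2, 4]]
print(mean_level, max_level)
```

The 4 by 4 grid shrinks to 2 by 2, so a zoomed-out view only needs a quarter of the values. Which policy makes sense depends on the data: a mean suits continuous values, while it would produce meaningless numbers for class codes.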
The default mean policy should be sufficient unless you have a specific goal in mind that requires something else. Then you can decide to group pixels by the minimum value, the maximum value, the most occurring value, or just take a sample. If you specify this, you will have to list all the bands you want to apply it to in a dictionary. Or we can change the default policy like so. With the dimensions, you can specify the dimensions of the exported image. This does not change the size of the image, but rather how many pixels the image will contain. I'm not really sure how this works and what the difference is with choosing a scale. And I can't find more information online, but I haven't had the need to use it either. If anyone knows what it does specifically, let me know. You can also specify a geometry. Then the bounding box of that geometry will be the extent of the exported image. By default, this is set to the view you are currently in, which is very confusing. If, for example, you have zoomed in on a part of your image while exporting, only the part you can see will be exported if you leave this variable as default. I have already created a small region to use. If you wish, you can define the scale of the image, which is the same as the resolution. This is, as mentioned already, set to 1000 meters per pixel initially. But since we know that Landsat imagery comes at 30 meters per pixel, we can choose this for the best image quality. It is also possible to change the coordinate reference system. By default, it will be the same as the image you start from, and you can change this to whatever you like. In this example, I will take a reference system specifically for Uganda. You can also change the transformation that the Earth Engine should use for changing the reference system, but this is not required. If you want to look up the correct codes you need to enter for a reference system, I'll put a link to that in the description.
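How strongly the scale drives the size of an export can be checked with some simple arithmetic. The sketch below is plain Python with a hypothetical 50 km by 50 km region; it ignores reprojection and only counts how many pixels are needed to cover the rectangle at a given number of meters per pixel.

```python
import math


def pixel_count(width_m, height_m, scale_m):
    """Pixels needed to cover a rectangle at a given scale in meters per pixel."""
    columns = math.ceil(width_m / scale_m)
    rows = math.ceil(height_m / scale_m)
    return columns * rows


region = (50_000, 50_000)  # hypothetical region: 50 km x 50 km, in meters
coarse = pixel_count(*region, scale_m=1000)  # 50 x 50 pixels
fine = pixel_count(*region, scale_m=30)      # 1667 x 1667 pixels
print(coarse, fine)
```

Going from 1000 to 30 meters per pixel turns 2,500 pixels into roughly 2.8 million for the same region, which is why a finer scale quickly runs into limits on the number of pixels in an export.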
And the last variable you can change is the maximum number of pixels in the export. Initially it is set to 100 million, or 10 to the 8. You'll get an error if you exceed this value. You can set this value higher or lower in case you want to export a larger image or restrict yourself from downloading too much. You have to make sure you type the parameters correctly. And when we run the task, we can compare the result again. These are all the possible arguments you can define when exporting an image to your assets. Many of the following functions use the same arguments, so I will be able to go over them much faster. I will skip the option of exporting data to Cloud Storage. This is similar to Google Drive, but with an extensive API and scalability to exabytes of data. This makes it actually better suited for applications rather than personal use. I've never used this, but the possible parameters are similar to the ones mentioned already, and the additional parameters seem well explained as well. Then a third exporting possibility is to export to your Google Drive. This is what you'll be using when you want to download data to your computer or you want to easily share the files with others. Again, only the image is a required argument and everything else is optional. The only parameters that are different from the export-to-assets function are the folder, a file name prefix, shard size, file dimensions, the option to skip empty tiles and info on the file format. First of all, the folder. Let's say I have a folder in my Google Drive named GE data. Then I can type this here and my results will be stored in that folder. If the specified folder does not exist, it will automatically create a new one with that name. The file name prefix is a file name that will be used for the export. By default, this is set to what you have typed in the description. The shard size is some internal measure, and ultimately it does not affect your export.
The shard size determines the number of pixels that the Earth Engine will process at a time. The number of pixels does not have to form a square. It can be any number. It is only important regarding the next parameter, the file dimensions. If you want to put an upper limit on the number of pixels one image can contain, you can define it here. The image will then be split into multiple images that together make up the exported image. But here it is important that the number of pixels in a single file is a multiple of the shard size. If, for example, Google Earth Engine handles 100 pixels at a time, as could be defined by the shard size, you cannot ask for an image containing 110 pixels, but you could create a file of 200 pixels, which consists of two shards. That's why this has to be a multiple of the shard size. These different images are then named after the file name prefix that you possibly defined, followed by the coordinates of that tile. If you mark skip empty tiles as true, it will not write data from tiles that are fully masked. So tiles that are not regarded in your export won't be computed. There are two supported file formats: GeoTIFF and TFRecord. TFRecord is only used as input for TensorFlow, a machine-learning tool. So GeoTIFF is what you most likely will be using for exporting. This is also the default setting. You can enter specific parameters for the format, but this is mostly for exporting TFRecords. If you are exporting such files, you might want to check the docs for the possible options. And that's everything you need to know about exporting rasters in Google Earth Engine. If you get an error like this, where the data types seem to be incompatible, you need to cast the image to a specific data type. What usually works is just to cast the image as a float. And now we try it again. And now the image appears in our folder. Exporting vectors is even easier when you export to your assets.
There, you can only define the collection to export, a description and an ID. All of these we have discussed already. And even when exporting to your Drive, no additional parameters are required except for the Drive-specific fields that were also mentioned for rasters. Exporting vectors is possible in a wider variety of formats. You can choose between a CSV file, GeoJSON, KML, KMZ, a shapefile, or again, a TFRecord. Also regular tables with data can be exported like this. Then the items will just have no geometry. You do not have to export all columns of a table or vector. You can select the ones of interest with the selectors parameter. Then the last item I will cover is the option to export image collections to a video. This can be useful when you have extracted a time series and want to conveniently display changes. As an example, I will simply display a time series of Landsat images. This is again similar to exporting an image to your Drive. There are two new variables. For this function it is important that your image collection only has images with three bands: a red, green, and blue band. If this is not the case, you need to select only those three bands. You also need to use a certain data type. If this is not the case, you will probably get an error and you will have to cast it as an 8-bit integer, for example. So a first new variable is the frame rate at which the video will be exported. This is how many images are shown per second of video. And secondly, you can set an upper limit on the number of frames the video can contain. By default, your video can contain 10000 frames. You can set this lower or higher as you like. So those are all possible ways you can export images, vectors, tables, or image collections. If something was unclear, let me know. I'd be happy to help. 9. Introduction to image classification: Another powerful tool of the Google Earth Engine is the capability to classify images. This can be useful for a variety of cases.
For example, when you want to create a map of a certain area rather than an image, or when you want to assess what land cover changes took place over a certain time period. In the Earth Engine, there are two ways to classify the pixels of an image. You could do a supervised classification or an unsupervised classification. With a supervised classification you have a sample of training data. These are points with spectral values of which you know what type of land cover they are. For example, a table like this, where I have a set of points with spectral values and for each point I know what the land cover is. With this training data, a classification algorithm can determine the spectral characteristics for a certain land cover. And then for each pixel in the image, it will determine what the most likely land cover is by comparing it with the spectral characteristics of the different land covers. The difference with unsupervised classification is whether you know what the land cover of the training data is or not. With unsupervised classification, you don't know the land cover of the training data, and the algorithm tries to make a certain number of distinct groups based on the spectral values. Of course, there are a lot of different algorithms to perform these tasks. But in this video, I will only go over the principles of image classification, training classifiers, validation of the results and so on. I will make another video where I'll go into depth on the different algorithms that are available. When that is finished, I will put a link on the screen or in the comments. If there's no link there at the time you are watching, you could subscribe to the channel or hit the bell icon to get notified once it is uploaded. Okay, first of all, let's find an image to perform a supervised classification on. Now, this looks like a nice region. I will use images from Landsat 8, but you could use any image source you like. I will filter the image collection based on this region.
Take only images from the last three years. And we remove the images with high cloud cover. It is perfectly possible to run an image classification on an entire collection. But as you can see, there can be differences in the reflectance values of different images. Because images are taken over different time periods, atmospheric conditions might change the spectral values. To reduce this effect, you could, for example, take the mean value of all images in the collection. This already strongly reduces this effect. Now we have an image to classify. The next step is to create training data. So as mentioned already, the training data is simply a table with spectral data and the known land cover type. You could import this table from another source or just create it in the Earth Engine itself. This is what I will show you. When I look at the image, I see four common land cover types. You can clearly see water, bare soil, vegetation and some built-up areas. I will try to classify these. To get the training data, I will create four individual feature collections, which I can merge later on into one list of training samples. I select the option FeatureCollection and add an attribute LC, for land cover, with a value 0. This zero will represent all water pixels. These values have to be numeric for the classification to work. When sampling, it is important to spread your samples across the area and take in as much of the variation within a land cover class as possible. When working with low-resolution images, also try to select pure pixels, so pixels that consist for 100% of the land cover you are sampling. 20 pixels, that should be enough for this example. This will not be enough for an accurate result, but it will still show you the patterns. And I do exactly the same for the other three classes. Okay, so now we have our training points for the different land cover classes. And to group these into one training data feature class, I simply merge the four feature classes together.
And when we print these out, we indeed see 80 points with an attribute indicating their land cover. But for the classification algorithm, we do not need the points and their land cover. We need a table of the land covers and their spectral values. For this, the Earth Engine has a special tool called sampleRegions. When this is done, you get a feature collection with each feature containing the spectral values and the land cover type of a training point. After this step, it is possible that you end up with fewer features than the number of points you started with. In this case, we started with 80 points and still have 80 points. Good. If, for example, one training point did not have a value for one specific band, the entire training point would be removed from the dataset. If a lot of training data is missing because of this, you might want to figure out what band causes this and remove it from the selection. Now that we have our feature set, we can start and make the classifier. I create a new variable and make it the classifier object. I just pick the first one and don't bother with all the parameters. The types of classifiers and the possible parameters are a subject for another video. Now we have our classifier, but this one is not trained yet. For that we have to use the train function. The train function first requires features. These are the training samples we created already. Secondly, it needs to know which attribute stores the data on the land cover type of the sample. Next are the input properties. These are the properties of the table that will be used to do the training. You can choose to train only on a limited set of attributes if, for example, the training data contains information not relevant for classification. Often your feature class will have a property called band_order, and this already indicates which attributes your training data has. By default, this list will be used as the list of attributes to train with.
I will add this however, because the Landsat 8 images have bands dedicated to the quality of the image, and these are irrelevant for the training. And now that we have trained our classifier, it is time to perform this classification on the entire map. When you use a classifier, it is important that your image has the same band names as used in the training. So if your classifier is trained on, for example, bands B4, B3 and B2, it will look for those bands in the image. If they are not present, the classification will not work. But since we do the classification on the same image we took the training samples from, this won't be a problem. And then we display the classification on the map. The result is a single-band image with each value being a land cover type. I had land covers 0, 1, 2, and 3 for water, vegetation, soil and urban land cover. So these values on the map are the classification. You can display them with a palette of colors. If you search for an HTML color picker online, you can easily find colors you like. I already have a palette ready for this. We also want the minimum and maximum values to display from 0 to 3. And that's it. You can see it did great for most of the map; it only had some difficulties in distinguishing some bare soil types from urban land. And now I can use this classifier object to classify other images as well. But because the training samples are based on this area, it will not continue to work well if you stray too far from this region. And to wrap this up, I would like to show one final function, used to validate the results of your classification. Because if you can't validate the accuracy of your classification, how can you trust its results? The function to do this is the confusion matrix. This function returns a matrix, with the rows of the matrix being the verification pixels; these are our training samples.
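To make the layout of a confusion matrix concrete, here is a small plain-Python sketch; it is not the Earth Engine function itself. The class codes follow the example above (0 water, 1 vegetation, 2 soil, 3 urban), the sample values are invented, and rows hold the known class of each sample while columns hold the class the classifier assigned.

```python
def confusion_matrix(actual, predicted, n_classes):
    """Count samples per (known class, assigned class) pair.

    Rows are the known classes, columns are the classes that were assigned.
    """
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for known, assigned in zip(actual, predicted):
        matrix[known][assigned] += 1
    return matrix


def overall_accuracy(matrix):
    """Share of samples on the diagonal, i.e. classified correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Invented example: one soil sample (class 2) is mistaken for urban (class 3).
actual = [0, 0, 1, 1, 2, 2, 3, 3]
predicted = [0, 0, 1, 1, 2, 3, 3, 3]

matrix = confusion_matrix(actual, predicted, n_classes=4)
for row in matrix:
    print(row)
print(overall_accuracy(matrix))
```

Every count off the diagonal is a mistake, and its position tells you which two classes were confused; here the single error sits in the soil row and the urban column.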
The columns of the confusion matrix are the classes in which these pixels are classified according to the classifier. In this case, because we did not have that many training samples, all points were also correctly classified in the classification. But when you start training on a lot more data, which is what you should do, some of your training data might become classified as something else. Or, for example, when you try to classify between different types of vegetation, the classification will have a harder time and get some pixels wrong. In those cases, the confusion matrix is a useful tool for determining which classes are accurately classified and which are not, or what the overall accuracy of your classification is. So that was the first introduction on how to do supervised classification in the Google Earth Engine. 10. All classifiers explained Skillshare: We have seen how to do a basic classification. But as we also saw last time, there are many different algorithms to choose from when doing a classification. Each of these algorithms approaches classification in a different way, and it's important to understand how these different algorithms work, so you can choose the algorithm that is best suited for your data. There are quite a lot of algorithms to choose from, but a lot of those are not being continued and will be removed eventually. I won't go into detail too much about all the mathematics behind these algorithms. I just want to give you a general understanding of how these algorithms approach classification, so you can determine which one is best for you. So the first classifier we see is the CART classifier. CART stands for classification and regression tree, and it is a very popular algorithm for doing classification. So the way this classifier works is that it constructs a decision tree. A decision tree is a sort of cascading system of asking questions to eventually end up in the most probable class. This classifier creates such a decision tree.
And each time a pixel is classified, it will answer the defined set of questions to determine which is the most likely class that pixel is in. So for example, a tree structure can look like this, where the first question is whether the green value is larger than 0.5. And if this is the case, you're most likely having a pixel with vegetation. And when it's not the case, it can ask another question, for example about the red band: is your red band larger than 0.5? So this is the yes branch, this is the no branch. So if the green band is lower than 0.5 and the red band is larger than 0.5, you could, for example, have bare soil, these typical tropical red soils. Or when red is also less than 0.5, you could have, for example, water, because that typically has low reflectance values. So in such a tree structure, if you want to classify a pixel, you start at the top of the tree and answer the questions to determine what the most likely land cover is. How does a classifier create such a tree? Let's say we only have two bands, a green band and a red band, and we want to identify three classes. If you want to classify trees, for example, our training data will mainly fall on this part of the graph, with high values on the green band and relatively low values on the red band. If then, for example, you also want to classify water: water typically has low reflectance values for red and green, so training samples of water pixels will probably fit on this side of the graph. And then if you also classify buildings, they often have a wide variety of spectral ranges, but very often they're not green. So these training samples might fit somewhere here on the graph. So what the CART classifier will do next is determine how to split this graph so that the resulting groups are as pure as possible. So in this case, a logical choice could be to split at this value of red, for example at a value of 0.4.
And in the tree structure it would look like this. So if the red value is larger than 0.4, it is most likely that the pixel we are classifying is a building. But if the value of the red band is less than 0.4, it is most likely that the pixel is actually vegetation. Because in this lower part, we have 12 pixels of water and I believe 14 pixels of vegetation and only one pixel of a building, chances are highest that a new pixel in this zone will be a vegetation pixel. So this would be a first step in the growing of the decision tree. Then the classifier will check if there is another split it can make to improve the pureness of the different zones. And it can, for example, find one here, at 0.3. So now it will ask the question: is red larger than 0.4? And if it is, we know it's a building. But if the red value is less than 0.4, we ask a second question. We will ask the question: is the value of the green band larger than 0.3? And if this is true, then most of the pixels in this zone are vegetation. But if the value of the green band is less than 0.3, it is most likely that that pixel is a water pixel. So this is how a CART classifier creates a tree. If you would just let the tree grow, it will keep on creating new branches, especially if you have a lot of data and the classes are not well separated. For example, there could be another split here to separate this green training sample. Or there could be another branch here to make a distinction between these two types. But to make sure that the tree does not become too complicated, we have to determine certain restrictions. Otherwise, we will just end up with final nodes each containing one training sample. The final nodes are also called leaf nodes. So there are a few ways to do that. You could, for example, put a lower limit on how many training samples a final node should contain.
And if a split would result in a node with fewer samples than the threshold value, it won't split the node any further. Another way to keep the tree from becoming too complex is by putting a limit on how much a split must contribute to the accuracy. So for example, with this last line, the added benefit of separating this one single red training sample might not be worth the additional complexity of the tree. What is also often done is pruning of the tree after the tree is created. So when the CART classifier has finished a tree, the algorithm can look back at each node to see if it is really worth keeping. Statistically, you will get better trees when you just let the classification do its job and then in the end remove nodes that do not contribute enough to the final result. This will give you better results than having strong restrictions while the tree is growing in the first place. So this was mainly the classification part of CART. But you can also use regressions in this type of classification. So for example, if you want to classify the density of forest based on the green band, you could have data that looks something like this. The principle is pretty much the same. Only after having determined the different nodes and branches of the tree, it will perform a regression in each leaf node. So if this would be the result of the classification, the final result might be a regression like this. With this type of regression tree, your result will become a value, like a percentage in this case, rather than classes like dense vegetation or sparse vegetation. So this is how the CART classifier works. And when we look in the Google Earth Engine at what parameters we can put into it, we find a factor for pruning. We can also determine a maximum depth the tree can grow to. So if we have a tree like this, this is a depth of one, a depth of two, a depth of three. And if we want to limit the complexity of our tree, we can define how deep the tree can grow.
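The search for the purest split can be sketched in a few lines of plain Python. This is a simplified illustration, not the Earth Engine implementation: it assumes Gini impurity as the measure of purity (the video does not name one), tries thresholds halfway between neighbouring band values, and uses invented training samples. The name `best_split` and its `min_leaf_population` argument are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure group, higher for a mixed group."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(samples, min_leaf_population=1):
    """Find the (band, threshold, impurity) that gives the purest two groups.

    samples is a list of (band_values, label) pairs. Returns None when no
    split leaves at least min_leaf_population samples on both sides.
    """
    best = None
    for band in samples[0][0]:
        values = sorted({bands[band] for bands, _ in samples})
        # Candidate thresholds lie halfway between neighbouring values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [label for bands, label in samples if bands[band] <= threshold]
            right = [label for bands, label in samples if bands[band] > threshold]
            if min(len(left), len(right)) < min_leaf_population:
                continue
            impurity = (len(left) * gini(left)
                        + len(right) * gini(right)) / len(samples)
            if best is None or impurity < best[2]:
                best = (band, threshold, impurity)
    return best


# Invented training samples with a red and a green band.
samples = [
    ({"red": 0.5, "green": 0.75}, "building"),
    ({"red": 0.7, "green": 0.15}, "building"),
    ({"red": 0.2, "green": 0.7}, "vegetation"),
    ({"red": 0.3, "green": 0.8}, "vegetation"),
    ({"red": 0.1, "green": 0.1}, "water"),
    ({"red": 0.2, "green": 0.2}, "water"),
]
print(best_split(samples))
print(best_split(samples, min_leaf_population=4))
```

With these samples the first split lands on the red band near 0.4, which isolates the buildings. Requiring four samples per leaf makes every split of six samples impossible, so the second call returns `None` and the node would stay a leaf.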
Then the minimum leaf population is also something we talked about. We can put in a limitation that a node should not split if the new leaf would contain fewer than a certain number of training samples. Another way to approach this is by setting a minimum population a node must have before it can be split. But in this scenario, we can still have a node with one or two training samples, as long as the node above it has more than the minimum split population. The minimum split cost is also something we talked about. So if the split would make the tree too complex, it won't perform the split. Then prune is just a Boolean value: whether to prune the tree at the end of the classification or not. If you don't prune the tree afterwards, only the limitations you set on the growth of the tree will have an effect on the complexity. So the next classifier is the decision tree. It's actually quite similar to this one, but here the input is a tree string. So if you have obtained a tree classification from other statistical software, like R or any software you like, you can import this decision tree using this classifier. It only takes a tree string as an input. Then the next classifier is one based on maximum entropy. Maximum entropy works a little bit differently. This classifier is especially useful when you have relatively little training data compared to the number of variables you have. So these are mainly the cases where you do not have enough data to describe all the possible combinations of variables. So this classifier will try to determine the best probability distribution that satisfies a number of constraints. I'll come back to those constraints later. And of all possible distributions that could satisfy these constraints, the classifier will take the probability distribution with the highest amount of entropy possible. So I'll just make this clear with an example. Let's say we want to classify whether pixels are forest or not.
And we will do it based on the green band. So when we have taken training data, this is a possible output. We would have very few observations where the value is actually one. So let's say that we have one observation with a value of one. Let's say we have two vegetation pixels as training data where the value is 0.9. Let's take three pixels with value 0.8. We can have one with 0.7 and two with 0.6. And we don't have any vegetation pixels where the green band is less than 0.6. So intuitively, we expect the probability distribution to run something like this, something that looks like a normal function. But according to only our observations, a pixel with a green value of one, so with the maximum green value, would have a relatively low probability of being vegetation, just based on the data we have gathered. So we want to determine a probability distribution that satisfies the constraints of our observations, because our observations must fit into the chosen probability distribution. But we acknowledge that we don't know a lot. So we want a probability distribution that allows for the widest range of possible observations as long as it adheres to our constraints. So you want the highest amount of entropy possible. Because if we make a distribution too specific, we might put restrictions on what the observations can be like without being sure about it. So the mathematics behind calculating the best probability distribution is very complex by my standards. But I found a great lecture that explains this in detail, and you can find it in the description. So for example, if the only constraint is that the average has to be 0.8, then the maximum entropy would result in a distribution that looks a bit like this. So this is a distribution with the highest entropy possible, while still adhering to the constraint of our observations, being that the mean has to be 0.8.
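What "highest entropy under a constraint" means can be made tangible with a small plain-Python comparison. This is a strongly simplified sketch, not how the classifier computes its distribution: it assumes the green band can only take the five values 0.6 to 1.0, and it compares three hand-picked distributions that all have a mean of 0.8.

```python
import math


def entropy(probabilities):
    """Shannon entropy in nats; bins with zero probability contribute nothing."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def mean_value(values, probabilities):
    """Expected value of a discrete distribution."""
    return sum(v * p for v, p in zip(values, probabilities))


green = [0.6, 0.7, 0.8, 0.9, 1.0]  # assumed discrete green-band values

# Three candidate distributions that all satisfy "the mean is 0.8".
peaked = [0.0, 0.0, 1.0, 0.0, 0.0]    # claims to know the value exactly
moderate = [0.1, 0.2, 0.4, 0.2, 0.1]  # bell-shaped around 0.8
uniform = [0.2, 0.2, 0.2, 0.2, 0.2]   # commits to nothing beyond the mean

for name, dist in [("peaked", peaked), ("moderate", moderate), ("uniform", uniform)]:
    print(name, round(mean_value(green, dist), 3), round(entropy(dist), 3))
```

All three satisfy the constraint, but the peaked one rules out almost every observation and has an entropy of zero, while the flattest one has the highest entropy. The maximum entropy principle picks the candidate that assumes the least beyond what the constraints demand.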
This would be a better approximation of the probability curve because a high value of green reflectance would actually give a high probability of vegetation. Alright. And the next classifier is the minimum distance classifier. This is a simpler algorithm to understand. So let's again make a simple example. We have an image with only two bands, green and red. And we have training data for vegetation. We have some water pixels, and some buildings. So what the minimum distance classifier does is it determines the mean of all the different classes. So the mean value of the water class will be somewhere here in the middle. This is the average reflectance value of water. The average reflectance value of vegetation will be somewhere here. And the average reflectance value for buildings will be somewhere here. So when the minimum distance classifier wants to classify a new pixel, for example, this pixel, it will look at which mean reflectance value lies the closest to the reflectance value of the pixel. In this case, it lies the closest to water. So the pixel will be classified as water. Or, for example, when a pixel is located here, it is not close to the mean reflectance value of vegetation, and it is not close to the mean reflectance value of water. It is the closest to the mean reflectance value of buildings, so the pixel will be classified as a building. But it is important to note that this can create issues because it does not take into account the shape of the distribution of the different classes. So for example, in a scenario like this, where the pixel lies here, we see that the water pixels are very close to each other. And a pixel very far away is actually unlikely to be part of this group of water pixels, while the pixels of vegetation have a much larger variety. And intuitively you might classify this pixel as vegetation, but according to the minimum distance, it actually lies the closest to water. So in this scenario, you would classify the pixel as water. 
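The minimum distance idea described above can be sketched in a few lines of plain Python. This is only an illustration and not the Earth Engine implementation: the function names `class_means` and `classify_pixel`, the (green, red) band order, and all reflectance values are made up for the example, and the distance used is assumed to be plain Euclidean distance.

```python
import math

def class_means(training):
    """Compute the mean (green, red) value of every class.

    training: dict mapping a class name to a list of (green, red) tuples.
    """
    means = {}
    for label, pixels in training.items():
        count = len(pixels)
        # Average each band separately.
        means[label] = tuple(sum(band) / count for band in zip(*pixels))
    return means

def classify_pixel(pixel, means):
    """Return the class whose mean lies closest to the pixel (Euclidean)."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))

# Hypothetical training data: (green, red) reflectance per class.
training = {
    "water": [(0.10, 0.05), (0.12, 0.06), (0.11, 0.07)],
    "vegetation": [(0.60, 0.20), (0.80, 0.10), (0.40, 0.30)],
    "buildings": [(0.50, 0.60), (0.55, 0.65), (0.60, 0.70)],
}
means = class_means(training)

print(classify_pixel((0.15, 0.10), means))
print(classify_pixel((0.50, 0.70), means))
```

Note that only the class means are stored, so the spread of each class is ignored, which is exactly the weakness mentioned above.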
So the next classifier is Naive Bayes. And this is more of a statistical approach to classifying pixels. So when you have a training set like this small one, for example, it will work as follows. So the classifier will check what the odds are of a pixel having a red value of 0.1 and being of the class water. Or, as another example, what the odds are of the red value being 0.1 and the class being vegetation. So it compares the probability of all the variables, for each being part of all the different classes. If you then have a new pixel you want to classify, the Naive Bayes classifier will check for each class: what is the chance of water having a reflectance value of 0.34 red, 0.74 green, 0.24 blue, and a height of 10? And then it will check for vegetation what the odds are of vegetation having this data according to the training data. And then whichever class has the highest probability of having these parameters, that will be the class assigned to the pixel. Next is the random forest classifier. So we have seen how you can create a tree with the classification and regression tree, with the CART classifier. But there are a few disadvantages to using one tree. Classification and regression trees do indeed work well with their training data, but there are often still some inaccuracies when they have to classify data they have never seen before. So random forest is just a bunch of tree classifiers. But these tree classifiers do not only split on whatever is the best split to divide the training data. They will also incorporate a factor of randomness. So for example, when it makes a first split, there can be a constraint that the best split has to be found on two of the four variables, for example. And the second split would have to be based on two different variables. So the result of this will be a whole bunch of random decision trees, each with a random component in it. 
And when the classifier then classifies a new pixel, it will be classified by all of the trees in the forest. And then the class that most of the trees predict is chosen as the class for the pixel. Then the next classifier is the spectral region classifier. It's a classifier that was developed for a particular user of Google Earth Engine. But they made it public for everyone to use. So it again looks at multidimensional feature space. So it looks at the virtual space which contains all your training data. I'll use the same example with a virtual space of two dimensions, a red band and a green band, and a bunch of training pixels in the virtual space. What the spectral region classifier does is that it classifies your pixels in predefined spectral regions. So when you look at the input parameters of the classifier, you see that it takes a list of coordinates as a parameter. So these coordinate lists are features in this virtual space in which to classify the pixels. So if we define these regions with the coordinates, the pixels in these regions will be classified according to the region in which they are located. So all of these pixels will be classified as buildings. These pixels, including the vegetation pixel and the building pixel, according to this, will be classified as water. And all of these pixels will be classified as vegetation. So the last classifier is the support vector machine. So the idea behind the support vector machine is that it tries to find a line in feature space that separates the different classes. In a single-dimension example like this, it will try to find a line, or in this case, a point that separates the two classes. So for example, this could be a line to separate the classes, but this could also be a line to separate the classes. But the support vector machine will try to find the best line to split the data. And it will do this based on the distance to the closest observations. It will try to maximize this distance. 
So this line would have a very short distance to this point, and a long distance to this point. This line, on the other hand, will have somewhat equal distances to both observations. And the best line is the line that maximizes the distances to the closest observations. So in this case, this line. And we can also do this in a multi-dimensional space. So also in this example, we can find multiple lines to split these two classes. For example, this would be a line to split the two classes. But this would also be a line to split the two classes. So also here we have to look at the distances to the closest observations. So this line would have very short distances, while this one would have much larger distances. You can often see this represented as margins. So this line would have a very small margin, while this one would have much wider margins. And if the margins are wide, the line is much better at separating the two classes. So these are the basics of a support vector classifier. But what do we do if we have a situation like this, where the data cannot be separated by a single straight line? So wherever we draw a line, we will never be able to split the two classes perfectly. So here a simple support vector classifier will not be enough. Then we need a support vector machine. What a support vector machine tries to do is to create new dimensions based on the existing dimensions that would allow us to separate the two classes. So what we could do in this case is create a new dimension based on the value of x squared. Then our data would look like this. And in this new graph, we can find a line that separates the two classes. For example, this one. And now every time we have to classify a new pixel, we can square the x value to determine on which side of the line it ends up, to determine which class to assign to the pixel. So the support vector machine will output a class. So you determine the lines and the transformations to make based on the training data. 
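The x squared trick described above can be shown with a tiny plain-Python sketch. It is not a real support vector machine solver: the data values, the names `add_squared_dimension` and `classify`, and the class labels are invented for the illustration, and the separating line is simply assumed to sit halfway between the closest observations of the two classes in the new dimension.

```python
def add_squared_dimension(x):
    """Map a one-dimensional value x to the two-dimensional point (x, x squared)."""
    return (x, x * x)

# Hypothetical one-dimensional data: the "inner" class sits between the
# two halves of the "outer" class, so no single point on the x axis
# can separate them.
inner = [-1.0, -0.5, 0.0, 0.5, 1.0]
outer = [-3.0, -2.5, 2.5, 3.0]

# In the new x squared dimension the classes no longer overlap.
largest_inner = max(add_squared_dimension(x)[1] for x in inner)   # 1.0
smallest_outer = min(add_squared_dimension(x)[1] for x in outer)  # 6.25

# Place the separating line halfway between the closest observations,
# so the distance (margin) to both classes is equal.
threshold = (largest_inner + smallest_outer) / 2

def classify(x):
    """Square the new value and check on which side of the line it ends up."""
    return "outer" if add_squared_dimension(x)[1] > threshold else "inner"

print(threshold)
print(classify(0.2), classify(-2.8))
```

A real support vector machine finds both the transformation (through its kernel) and the line from the training data, but the classification step for a new pixel follows the same pattern: transform first, then check the side of the line.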
And when you use the classifier, it will compare each new pixel to the created transformations and separation lines. But it is also possible to perform a regression with support vector machines. So a regression could look like this. If we have data that looks like this, a traditional linear regression would give you a result like this. But you clearly see that this does not represent the original distribution of the data. So also here, we would try to create new dimensions in which we would be able to draw a line that represents the data. So for example here we can again create a new dimension based on x squared. And I'll draw this in a new graph. And then after the transformation the data might look something like this. And in this new dimension, we would be able to perform a regression that better suits the data. And this is called support vector regression. And then every time we try to classify a new pixel, we would square the x value, then look at the regression to determine which value to assign to this pixel. And then we can look at the arguments that the support vector machine classifier takes in the Google Earth Engine. And we can see that we can again specify a support vector machine type. So we saw that the support vector machine can perform both classification and regression. Here we can again choose between a C-type support vector classification or a nu-type support vector classification. We can also choose support vector regression, of which we have two different types. Then the kernel is what determines which transformation to perform on the data to look for linear splits or linear regressions. So the default type is a linear kernel. And this kernel will try to find a straight line to split the different classes. This is what we saw in the first example. Then we can also choose a polynomial kernel. And this would perform operations like x squared or x cubed. And the degree of the polynomial can be determined by this value. 
The value of degree. Then the RBF kernel stands for radial basis function. And this kernel introduces a weight based on the distance of the observations. So with this kernel type chosen, the class will be determined more by nearby training samples. And we can again choose to perform a sigmoid transformation, of which we can specify one of the parameters as a variable. That was the last classifier you can choose from. There are a few more functions. So in the previous video, I showed you a simple classification in three classes. But we have seen quite some algorithms that can also perform regression. So instead of just classifying pixels in groups like vegetation or dense vegetation or buildings, some of the classifiers we discussed can also output a number, the output of a regression. To change which classification method you want to use, you have to use the function setOutputMode. So you have three options here. By default, it is just a basic classification, as we saw in the previous video, which will result in a map that shows you the most likely classification of the pixel. If you choose regression, the output will be a number, like the example we saw with the CART classifier. You can, for example, classify the percentage of forest cover based on spectral values and this output will be a number. And then the last option is to classify probabilities. And then the result will be the probability of the class that the pixel is classified as. And then the other functions like explain, mode or schema are other ways to get information about the classifier you made. So I hope now you have somewhat of an understanding of how these classifiers work and what the principles are behind them, so that you can choose the classifier that best fits your data. If you have any questions about one of these classifiers, or if something is still unclear, please leave them in the comments. 
If you want more information about these classifiers, I'll put a link to a source of good information about each of these different classifiers in the description.

11. Clustering and explaining all clusterers: By now we have seen how to apply a classification to an image and what your options are when performing a supervised classification. The one thing we haven't discussed yet is how to do unsupervised classification. There can be cases where you do not have a large list of training data, and we want the algorithm to figure out the different land cover classes on its own. The way unsupervised classification works is by looking at a lot of pixels and determining which pixels are most similar and which are most different. By doing this, pixels can be clustered in groups of similar-looking pixels and be classified in that way. Also here, many algorithms exist for performing this process, and they approach the problem in different ways. This unsupervised classification is also often called clustering. Instead of unsupervised classification, I will refer to it as clustering from now on, since that is the name that the Google Earth Engine gives it as well. In the Google Earth Engine, a supervised classification is stored in a classifier object. This is similar for a clustering algorithm, which is stored in a clusterer object. So the way they are trained and implemented is very similar. For supervised classification, it was necessary to specify training data for the different classes you wanted to detect. And when performing the clustering algorithm, all steps are the same. You also need to create training data, but this data does not have to contain an attribute about which class it belongs to. I will start from the same setup I also used when introducing supervised classification, and I will do the same steps. So first I select a region of interest and fetch an image to classify. This clustering, just as a classification, could also be done on an entire image collection. 
But for simplicity and speed, I will just use one image. When we did the supervised classification, at this point, we started to collect data about the different classes. Here we won't do it manually, but we can use the sample function to give us 100 random points in the area. This sample function returns a feature collection with 100 points containing the spectral values of the image at their location. Ideally, you should have a lot more training points, but I will take 100 to speed things up. Then the next step is to create a clusterer object from one of the different algorithms available. I will just pick one and go over the different algorithms later in the video. This clusterer object we have now still needs to be trained. This is done by applying the train function on the clusterer and passing in the training data. And this clusterer will now group training points that are most similar, while these groups are as distinct as possible from other groups of points. When we apply this clusterer to our image, it will take all pixels of the image and determine to which of these groups each pixel is most similar. And when we do this, we get a result like so. Based on those 100 random samples, the clustering algorithm was able to identify five clusters on its own. We can then interpret these and determine what the clusters represent. If, however, you want finer control over how many clusters there are and how it is processed, we have to take a look at the different clustering algorithms and their attributes. When looking at the docs for clustering, you can see five different algorithms. The Earth Engine uses Weka to perform its clustering, and that's why you see it in every algorithm name. Weka is a free collection of tools developed in New Zealand, specifically used for data analysis and data mining. The first algorithm, and in my opinion the easiest to understand, is the k-means algorithm. I will show this with an example. 
Let's say we have an image with two bands and the pixels of this image are distributed like this. What k-means will do is randomly put k points on this distribution and call them cluster centers. It then looks at all the pixels to see which pixel would classify as each class. These are by far not the correct values. But then it recalculates the cluster centers and does the whole process again. After a certain number of iterations, the clusters will no longer change, and the k-means algorithm reaches convergence. The class values it obtained are then our best estimation of the different clusters in the image. An important aspect of this algorithm is that you have to know how many clusters you want to identify. And while it is easy to understand, it becomes very computationally expensive when you have a large dataset, which is kind of the thing with the Google Earth Engine. Luckily, the people at Google thought about this and there is a technique to improve the performance of this algorithm. And this technique is called canopy clustering. This technique creates canopies based on distance metrics. A canopy is basically a group. And when calculating distances, it will only take into account other points that are in the same canopy. There is a value that determines how far canopy centers must be from each other. And there is a distance metric that determines which points are included in that canopy. These are called the tight distance and the loose distance respectively. Since the loose distance is by definition larger than the tight distance, a point can be included in multiple canopies. And when the k-means algorithm is running, it will only compare the distances of points within the same canopy and ignore everything else. When there are a lot of points, this significantly improves the performance of the algorithm. And when you look at these canopies in the graph, it indeed looks like the canopy of a forest. 
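The basic k-means loop described above (without canopies) can be sketched in plain Python as follows. This is a minimal illustration and not the Weka implementation that the Earth Engine uses: the function name `k_means`, its parameters, and the sample points are all hypothetical, and as an assumption the random starting centers are drawn from the training points themselves unless explicit starting centers are passed in.

```python
import math
import random

def k_means(points, k, max_iterations=100, seed=0, initial_centers=None):
    """Group points into k clusters; returns (centers, clusters)."""
    rng = random.Random(seed)  # a fixed seed makes the random start reproducible
    centers = list(initial_centers) if initial_centers else rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # Step 1: assign every point to its nearest cluster center.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centers[i]))
            clusters[nearest].append(point)
        # Step 2: move every center to the mean of its assigned points
        # (a center without points stays where it is).
        new_centers = [
            tuple(sum(band) / len(cluster) for band in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        # Convergence: the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Hypothetical pixels of an image with two bands.
dark = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2)]
bright = [(0.9, 0.9), (0.8, 0.9), (0.9, 0.8)]

centers, clusters = k_means(
    dark + bright, k=2, initial_centers=[(0.1, 0.1), (0.9, 0.9)]
)
print(centers)
print(clusters)
```

Every iteration compares every point with every center, which is why the cost grows quickly on large datasets and why a shortcut such as canopy clustering helps.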
I don't know the exact math behind canopy clustering, but I'll put a link in the description to the video that helped me understand the concept. Now that we understand the workings of this algorithm, we can look at the attributes we can add to the function in the Google Earth Engine. The first attribute is the number of clusters that k-means should create. Next we have an init value that determines how these clusters are initialized. By default, the initial locations of the clusters are chosen randomly. The k-means++ option chooses the locations based on the distribution of the points and often reduces the number of iterations needed to reach convergence. You can also choose to start with the canopy centers as cluster centers, or choose the points that are furthest from each other to use as initial cluster centers. The next two attributes are whether canopies should be used and how many can maximally be retained in memory. Periodic pruning will determine how often small canopies should be removed. And the min density determines what is considered a small canopy. Next we have the values T1 and T2, which are the tight and loose distances mentioned earlier. All of these attributes might look intimidating, but they are only important when you choose to enable canopies. The distance function will determine which distance metric is used when k-means calculates distances between points, and you can choose between Euclidean distances or Manhattan distances. The max iterations attribute determines an upper limit on how many iterations the algorithm should run. If it has not yet reached convergence, it will stop after this number of iterations. I have no idea what preserve order does, so let me know if you have any idea. Then the fast option also improves performance by reducing the accuracy of the numbers it works with. But this disables the option to get error matrices, as described in the docs. And then finally we have a seed attribute. 
And this attribute is used to reproduce the results of the algorithm, even though it relies on random values. When using the same seed, the random values will stay the same when you run the algorithm again. Okay, this was a long explanation, but two of the other algorithms have the same foundation as k-means. The cascade k-means algorithm, for example, is the same as the k-means algorithm, but it iterates over different numbers of clusters and determines which cluster count has the least dispersion within a cluster while having the most dispersion between clusters. So it automatically finds the best number of clusters. This is calculated using a special criterion that you can see in the docs, which is also called the variance ratio criterion. The arguments are therefore quite self-explanatory. You can choose the minimum and maximum number of clusters it should consider, how often the algorithm should restart, what the starting positions of the cluster centers should be, and so on. Then the x-means algorithm is again the same as the k-means algorithm and also implements a technique to automatically find a more ideal number of clusters. It does this by splitting the most promising clusters and seeing if clustering with the split clusters is better compared to clustering when the clusters were not split. Therefore, in the docs, we can again choose the minimum number of clusters, which will be the initial number of clusters it will start with, and the maximum number of clusters, and it will no longer split clusters after this amount is reached. Next, you can choose the maximum number of iterations for the entire algorithm, both for when it compares the k-means clustering without splits and for the k-means clustering performed on the split clusters. A KD-tree is not a very complicated concept, but I cannot find the effect of this attribute on the x-means algorithm. I assume it improves performance at a cost of precision, but I'm not sure. 
So I'll put a link to a good resource in the description and then leave it like that. Also the cutoff factor: I played a lot with it, but I cannot make it have any effect on the clustering output. And at last we have a seed to reproduce the results. Next we have two more algorithms that are not directly related to k-means. Let's start with the Cobweb algorithm. The Cobweb algorithm organizes the different training points into a classification tree where each leaf of the tree, or node, is a class. When this tree is constructed, it can be used to classify all other pixels in the image to determine to which node or class each pixel corresponds the most. There are two properties you can enter that alter the way this classification tree is created. When you increase acuity, nodes are less likely to split and the result is a tree with fewer nodes and thus fewer classes. The cutoff value determines the minimal category utility, or category goodness. Increasing this value will require that the split of a node contributes more improvement before it accepts the split. Increasing this value also results in fewer split branches and thus fewer classes. The final algorithm that the Google Earth Engine offers is the learning vector quantization algorithm. This is a neural network-based approach, but it does have a similar strategy to the k-means algorithm. Each iteration, it loops over all training points and it pulls the closest class center towards the location of the point. The amount by which this class center is pulled depends on the learning rate. And when looking at the possible arguments, we again see the number of clusters we want and the learning rate that I just mentioned. It is possible to determine the number of training epochs. This is simply the name for iterations in a machine learning context. And it determines how many times it should perform this loop of pulling class centers. And then you can choose to normalize your inputs. 
If you have a band that ranges from 0 to 1 and another band that ranges from 0 to 100, for example, you should normalize these values so that the algorithm can compare them. I assume that since bands from satellite images mostly have the same range, this is disabled by default. So that was a quick overview of how the clustering algorithms work in the Google Earth Engine. I hope it is now a bit clearer why there are these different algorithms. Have fun and try out the different options to find out which one best suits your application.