AWS serverless analytics - Creating a data lake using S3, Glue, Athena and Lambda | Engineering Tech | Skillshare



Engineering Tech, Big Data, Cloud and AI Solution Architect


Lessons in This Class

6 Lessons (50m)
    • 1. Introduction

    • 2. Creating a Data Lake with AWS S3

    • 3. A data catalog with AWS Glue

    • 4. Querying data using Amazon Athena

    • 5. Running Spark transformation jobs using AWS Glue

    • 6. An automated data pipeline using Lambda, S3 and Glue







About This Class

In this class you will learn how to create a serverless analytics solution on Amazon Web Services (AWS). The following topics will be covered:

1. Creating a data lake using S3

2. Creating a data catalog using Glue

3. Extraction, transformation and loading data using Glue and Apache Spark

4. Viewing data using the Athena query tool.

5. Creating an automated data pipeline using Lambda

Students should have some understanding of AWS and Big Data before starting this course.

Meet Your Teacher

Engineering Tech

Big Data, Cloud and AI Solution Architect


Hello, I'm Engineering.





1. Introduction: Welcome to this AWS serverless analytics solutions course. In this course you will learn how to create a data pipeline using AWS S3, Glue, and Athena. We will create a data lake using AWS S3, then run Spark jobs on AWS Glue to extract data from the S3 bucket, apply transformations using Spark, and store the data in another bucket. We will also create a data catalog using the AWS Glue crawler service. At the end, we are going to automate the entire data pipeline using AWS Lambda. This is a completely hands-on course: you will find step-by-step instructions to create a data lake solution on AWS. Before starting this course, you should have some prior knowledge of AWS and a high-level understanding of big data solutions. So let's dive in and get started. 2. Creating a Data Lake with AWS S3: Now we'll create a data lake using AWS S3, the Simple Storage Service. Search for S3 and select it, and this takes you to the AWS S3 landing page. The way AWS S3 works is pretty simple: you create a bucket and store your files under that bucket. Let's create a bucket: click Create bucket, and we'll call it fxskill-data-lake. The default region gets selected. Let's uncheck "Block all public access", because we will be accessing this bucket from outside. We'll keep everything else as default and click Create bucket, acknowledging that we have turned off the block on public access. Now the bucket has been created and we can start uploading files to it. Let's upload a file from the machine: click Add files, select the bank-prospects file, and click Upload. The file has been uploaded; let's click on it, and there we can see the file properties. We also get an S3 object URL through which we can access it from outside. Open it in a new tab, and it says Access Denied. Let's go to Permissions and edit it so that everyone (public access) has read access to this object. Let's save the changes.
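As an aside, the same bucket setup can be scripted with boto3, the AWS SDK for Python that appears later in the course. This is a minimal sketch, not part of the video: the bucket name, file name, and region are the ones assumed from this lesson, and bucket names must be globally unique.

```python
def object_url(bucket, key, region):
    """Build the public S3 object URL shown under the object's properties."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

def create_data_lake_bucket(bucket, filename, region="us-east-1"):
    """Create the bucket and upload the sample file, as done in the console."""
    import boto3  # AWS SDK for Python; imported here so object_url works without it
    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(Bucket=bucket)  # in us-east-1 no LocationConstraint is needed
    s3.upload_file(filename, bucket, filename)

# Usage (needs AWS credentials configured):
#   create_data_lake_bucket("fxskill-data-lake", "bank-prospects.csv")
#   object_url("fxskill-data-lake", "bank-prospects.csv", "us-east-1")
```

Note that the object is still private after upload; making it publicly readable is a separate permissions step, as shown above.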
We have to accept that we are making this object public, then save the changes. Now let's go back and refresh the page, and we are able to download the file. So this is how you can create a data lake. We uploaded a single simple file, but this works for any number of files: you can store petabytes of data in a bucket and use it for your big data analytics. Let's understand AWS S3 pricing. The cost is $0.023 per GB per month for the first 50 terabytes, and you also get charged based on the number of requests made against the S3 bucket. Under the AWS free tier, you can store up to 5 GB of data in S3, within certain request limits, without incurring any charges. One additional thing: your bucket name should be globally unique, since the bucket is accessed through a URL, which has to be a globally unique address. If you try to create a bucket with a name that already exists, S3 will not allow it. 3. A data catalog with AWS Glue: In the previous lab we created a data lake on AWS S3. Now let's understand how to create a data catalog from the data stored in the data lake. Using a data catalog, we can read the schema information of data stored in any data source. AWS provides a service called AWS Glue, with which you can easily crawl a particular data store and create a data catalog. AWS S3 and Glue are both serverless services; that means you can use them without worrying about the underlying infrastructure. AWS takes care of maintenance and patching of the services, and you pay for usage time, or, in the case of AWS S3, for storage. Let's understand how to create a data catalog using AWS Glue. Before that, you need to understand another important concept: VPCs and VPC endpoints. Within AWS, customers can create their own virtual private cloud (VPC), an isolated network environment.
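The S3 gateway endpoint created through the console in the next step can also be created programmatically. A sketch, assuming the default VPC's ID and route table IDs are already known; the helper shows the fixed service-name format that S3 gateway endpoints use.

```python
def s3_endpoint_service(region):
    """S3 gateway endpoints use the service name com.amazonaws.<region>.s3."""
    return f"com.amazonaws.{region}.s3"

def create_s3_gateway_endpoint(vpc_id, route_table_ids, region="us-east-1"):
    """Create the Gateway-type VPC endpoint that the console wizard creates."""
    import boto3  # imported here so s3_endpoint_service stays usable without AWS
    ec2 = boto3.client("ec2", region_name=region)
    return ec2.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId=vpc_id,
        ServiceName=s3_endpoint_service(region),
        RouteTableIds=route_table_ids,
    )

# Usage (needs AWS credentials):
#   create_s3_gateway_endpoint("vpc-0123456789abcdef0", ["rtb-0123456789abcdef0"])
```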
For example, a user might want to secure their AWS S3 data within a virtual private cloud. In that case, let's understand how AWS Glue can access AWS S3 in the most efficient manner. AWS Glue can always access AWS S3 through the public URL, but that is not the most secure way; Glue also needs to go out of the network and come back to S3, which is not the most efficient route either. There is another option: a VPC endpoint, through which AWS Glue can talk directly to AWS S3. Using the VPC endpoint, AWS Glue can access S3 through private IP addresses, with no exposure to the internet. Let's first create a VPC endpoint for AWS S3, and then we'll jump into creating the data catalog with AWS Glue. Search for VPC, or virtual private cloud, then click Endpoints on the left-hand panel. It says you do not have any endpoints; let's create one. Now we need to select which service to create the endpoint for: search for S3, hit Enter, and select AWS S3. After that you see a virtual private cloud (VPC) dropdown. AWS has created a default VPC for our account, and we'll be using that. We can always create new VPCs, but for the time being let's use the default one. Under the VPC, we'll let it use multiple subnets, or subnetworks. Let's select this option and click Create endpoint. Now the VPC endpoint is created; let's close it. We can always click on the VPC if we want to learn more about it. Let's open the console in another window and search for Glue. Glue is a fully managed service for ETL, the extraction, transformation, and loading of data. In this lab we'll explore the Glue crawler feature, with which we can crawl a dataset and create a schema. Click the Crawlers link on the left-hand side and add a crawler. Let's give the crawler a name: call it fxcrawler. Click Next. The crawler source type can be data stores or existing catalog tables; we'll select data stores, because we are going to read data stored in AWS S3.
There is a repeat-crawl option for S3, and we'll keep the default, which is to crawl all folders. Click Next. Our data store will be S3. Here we'll specify a connection from Glue to AWS S3, leveraging the VPC endpoint that we created earlier. Click Add connection. We'll call it fxconns3. Where it says Choose VPC, we'll choose our default VPC, then choose a subnet: select any of the subnets, and in the security group, select the group ID that is displayed. Now click OK and select that connection. Next you can choose a path in your own AWS account, or an S3 path in another account. We'll use our own account. Click here and select the fxskill-data-lake bucket that we created earlier; we are going to crawl the file stored in this bucket. A crawler can crawl multiple data stores, but for now it will crawl only the data store that we specified in the previous screen, so answer No to adding another data store and click Next. Now we need to create an IAM role. IAM stands for Identity and Access Management, with which you create roles and give permissions to different users and different services. Glue needs to be given permission to read data from S3. Here we'll select Create an IAM role, and the name will be AWSGlueServiceRole-fxcrawler; you can specify any name here. Then click Next. You can run the crawler on a schedule if you want; let's run it on demand. Click Next. The crawler needs a database to store the data catalog. Let's create a new database; we'll call it fxdb and click Create. You can also give a prefix for the tables, but that is optional. Let's proceed and click Next. Now the crawler is ready: click Finish. Next, we can run the crawler. Select it and click Run crawler. You'll get a message that it's attempting to run the crawler, and the status will soon change. Now it is starting.
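The crawler setup just performed in the console can also be sketched with boto3. Treat this as an illustration rather than the course's own code; the crawler, role, database, and bucket names mirror the ones assumed above.

```python
def crawler_config(name, role, database, s3_path):
    """Assemble the arguments for glue.create_crawler, mirroring the wizard."""
    return {
        "Name": name,
        "Role": role,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }

def create_and_run_crawler(cfg, region="us-east-1"):
    """Create the crawler and start it on demand (the Run crawler button)."""
    import boto3  # imported here so crawler_config stays testable without AWS
    glue = boto3.client("glue", region_name=region)
    glue.create_crawler(**cfg)
    glue.start_crawler(Name=cfg["Name"])

# Usage (needs AWS credentials and the IAM role created above):
#   cfg = crawler_config("fxcrawler", "AWSGlueServiceRole-fxcrawler",
#                        "fxdb", "s3://fxskill-data-lake/")
#   create_and_run_crawler(cfg)
```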
It takes about a minute to crawl the data. Now the status has changed; it ran for about a minute, and here it says Tables added: 1. Let it finish, and then we'll check out the table. Now the crawler has finished, and we got a message that it completed with the following change: one table created. There is an option to see the log; let's click on that. When we click on it, it takes us to another service called CloudWatch. Using CloudWatch, you can track logs for AWS services, and by default AWS integrates CloudWatch with many services, including AWS Glue. We can see the log of our execution: it started at a certain time, then it created a table, fxskill_data_lake, in the database we specified, and then it finished. You can get more details from this log; in case of any error, you can come here to understand what went wrong. Let's go back to the Glue interface. To find the table it created, you can go to the Databases link or click directly on Tables. Databases shows you all the databases that you have; we created a database called fxdb. Let's click on it, and under it we can see all the tables. Click the "Tables in fxdb" link, or go directly to the Tables link. It created a table called fxskill_data_lake: the Glue crawler we created looked at the data bucket on AWS S3 and created a table. Let's go to AWS S3 now. Search for S3, or pick S3 from the recently visited services. We have the bucket fxskill-data-lake, into which we uploaded the bank-prospects CSV file; the Glue crawler looked at this bucket and created the table fxskill_data_lake, which you can see here. Let's click on this table for some additional information, and at the end we see the schema. So the crawler looked at the file and created this schema. Let's open the CSV file.
So this is the CSV file we had in our AWS S3 data lake. It has five fields: age, salary, gender, country, and purchased. The Glue crawler was able to crawl this dataset and create the schema; based on the data available in each column, it decided whether to give the column an int or string data type. We can see that age and salary have been given bigint type because they hold numeric data, and the other three fields have been given string type. This is how, using the AWS Glue crawler, you can create a schema or catalog automatically. Let's understand AWS Glue pricing. Glue has different features; we tried the crawler, so let's look at that. You get charged on an hourly basis based on the number of DPUs, or data processing units, used to run the crawler. Since we ran it on a very small dataset, the cost will not be more than a cent. We can also check the pricing for data catalog storage and requests: it is free for the first million objects, and after that you pay $1 per 100,000 objects per month. 4. Querying data using Amazon Athena: Next, let's understand AWS Athena, a querying tool with which we can query data stored in an AWS S3 data lake. Behind the scenes, AWS Athena uses the Glue catalog to query the data, so let's see how that works. We will first go to AWS Glue. Using the Glue crawler, we created a table, fxskill_data_lake, for the CSV file stored in our S3 bucket. Let's see how we can view the data using AWS Athena. Select the table, then Action and View data. You get a message that you will be taken to Athena to preview the data and that you will be charged separately for Athena queries; we'll come back to Athena pricing. Let's now preview the data. This is the main Athena interface: you get a text box to write queries, and on the left-hand side you can select the data source and database. The AWS Glue catalog is connected to Athena by default; we'll select fxdb, which is the database we created for the Glue crawler.
And we can see the table here. Let's write a query against that table: we'll say select * from fxskill_data_lake, and run the query. We get an error that the fxqueryoutput bucket is not available. We had earlier specified that this bucket would be used to store the query output data. If you are trying Athena for the first time, you'll be prompted to create a query results bucket; we created one earlier which has since been deleted, which is why this error is coming up. We will go ahead and recreate that bucket; you can also create a bucket with any name. We'll go to S3 and create the bucket fxqueryoutput, keep everything as default, and hit Create bucket. And we'll run the query again. This time it ran fine. Now you can write whatever query you want and get insight from the data. Let's get all customers with age greater than 25; we can see the output here. The data resides in an S3 bucket, and Athena is using the Glue catalog to know the schema, then providing an interface with which you can query the data. This is how Glue and Athena can be used to query data stored in an AWS S3 data lake. Let's check Athena pricing. AWS Athena charges you for the amount of data scanned by each query, with a minimum of 10 MB charged per query regardless of size. We ran queries over a very small file, so we will be charged for 10 MB of data scanned; the charges should not exceed a few cents. 5. Running Spark transformation jobs using AWS Glue: So far we have seen how to create a data lake using AWS S3, create a data catalog using AWS Glue, and query the data using AWS Athena. Now let's understand how we can do data transformation using Spark on AWS Glue. Spark is a popular big data technology for data cleansing, data processing, and data transformation. Spark can connect to multiple data sources, such as Hadoop, the AWS S3 file system, and NoSQL databases.
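Before the Spark lab, one aside: queries like the ones above do not have to go through the Athena console. A minimal boto3 sketch; the database and output-bucket names (fxdb, fxqueryoutput) are the ones assumed from the earlier steps.

```python
def athena_request(query, database, output_bucket):
    """Assemble the arguments for athena.start_query_execution."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": f"s3://{output_bucket}/"},
    }

def run_query(query, database="fxdb", output_bucket="fxqueryoutput"):
    """Submit the query and return its execution id."""
    import boto3  # imported here so athena_request stays testable offline
    athena = boto3.client("athena")
    req = athena_request(query, database, output_bucket)
    qid = athena.start_query_execution(**req)["QueryExecutionId"]
    # Poll athena.get_query_execution(QueryExecutionId=qid) until it finishes,
    # then fetch rows with athena.get_query_results(QueryExecutionId=qid).
    return qid

# Usage (needs AWS credentials):
#   run_query("SELECT * FROM fxskill_data_lake WHERE age > 25")
```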
In this lab we'll see how to run Spark on AWS Glue to transform data stored in AWS S3 and store the result in another AWS S3 bucket. So let's dive in. We'll first go to AWS S3 and create a bucket where we'll store the transformed file. We'll call it fxtransformed; leave everything as default and create the bucket. Next we'll open the console in another tab, search for Glue, and go to the Glue interface. Click on Crawlers. Let's create a new crawler for the bucket that we just created; we'll call it transformedcrawler. Click Next: data stores, crawl all folders, and then we'll use an existing connection. This time we'll choose the transformed bucket. Click Select, click Next, no other data store, click Next. Let's create a new IAM role, demo4, and we'll run the crawler on demand. We'll choose the existing database, click Next, and Finish. Now we'll run the crawler. The crawler has finished; if we go to Tables, we won't see anything, because there is no data in the transformed bucket yet. Let's now create a Glue ETL job to do the data transformation. Click Jobs, then Add job; we'll name it fxtransformer. We'll choose the demo4 role that we just created, but before that we need to make one change. Let's go to the IAM interface: we'll open the console in another tab and search for Identity and Access Management, or IAM. Using the IAM service, you can create users and roles for your AWS account, and give permissions to different roles and different users. In the IAM interface, on the left-hand panel, we'll click on Roles. Let's search for the new role that we just created and select demo4. We can see that it has one policy attached, the AWS Glue service role. Let's also ensure it has full access to S3 to read and write data: click Attach policies, search for S3, and give it the AmazonS3FullAccess policy. Now it has the policy.
So we can see that the demo4 role has two policies attached: S3 full access and the Glue service role. Back in the Glue interface, we'll select the demo4 IAM role. Using Glue, you can write Spark, Spark Streaming, or Python shell jobs. For this demo we will use Spark, and then you can choose either Python or Scala; let's choose Python. We'll choose Spark 2.4 with Python 3 as the language version, and we'll let Glue create a script for us, which we'll then modify. The script name is chosen by default, and Glue has decided where to store the script file and where to create a temporary directory for intermediate data. Next, click on security configuration. Here, one thing worth paying attention to is how many workers you need. For this demo, two workers should be sufficient, though you can try a higher number; with more workers the processing would be faster, but for this small dataset two should be enough. Let's leave everything else as default and click Next. Now we need to choose a data source: we'll choose fxskill_data_lake, that is our source. Then we choose a transformation type, and we'll keep the default option, Change schema. Then we need to choose a target. It asks whether to select an existing target table or create tables in our data target; let's select the option to create tables in the data target. Here you can choose a JDBC option with different data stores, or you can select S3 and store the data in an S3 bucket. We'll choose S3, and we'll choose CSV as the format; you can choose any of these formats. Then we'll use fxconns3, the connection we created earlier, and the target path will be the fxtransformed bucket that we created for the transformed data file. Click Next. At this point, Glue shows you the mapping. By default it has kept all the fields from the source data store.
For the target, it has specified the same fields. You can modify the mapping, remove fields, or change the ordering; for now let's keep everything as default, then save the job and edit the script. This is the Glue interface for modifying the ETL script. When you select a particular data source, it highlights the code that was generated for the input data source mapping, and when you click on the transform step, it shows the transformation rule; for us there is no specific transformation other than the mapping, and that is what gets highlighted. Finally, it shows the target where the data will be stored. We'll apply a very simple transformation to demonstrate how you can modify this script. Let's select the transformation code block, and just before it we'll add the new transformation logic. Glue creates a DynamicFrame, which can be converted to a normal Spark DataFrame; then you can use Spark libraries to do all kinds of data processing, or you can use Glue's DynamicFrame libraries to do the transformation. For this demo, we'll convert the DynamicFrame to a Spark DataFrame and do the processing there. We'll convert datasource0 to a DataFrame using the toDF method: a DynamicFrame has a toDF method with which you can convert it to a Spark DataFrame. Once that is done, you can apply any data transformation using Spark libraries. We are not going into the details of Apache Spark in this course; if you are interested in learning PySpark or Spark with Scala, you can check out our other courses. In this lab, we'll do a simple demo of a Spark transformation to showcase the combined capabilities of AWS Glue and Apache Spark. This is our source dataset, and we can see that there is one row where the country field has the value "unknown". Using this line, we'll filter out any rows whose country value is unknown. After that, we'll convert the DataFrame back to a DynamicFrame and use that to write to the target table.
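The added logic can be sketched as follows. This is not the full generated script, only the middle section being edited; the awsglue import resolves only inside a Glue job run, and names like datasource0 mirror Glue's generated code. The plain-Python predicate is an extra illustration of the rule, kept separate so it can be checked without Spark.

```python
def keep_row(country):
    """The transformation rule as a plain predicate: drop 'unknown' countries."""
    return country is not None and country.strip().lower() != "unknown"

def transform(glue_context, datasource0):
    """DynamicFrame -> DataFrame, filter, DataFrame -> DynamicFrame (the added lines)."""
    # Only importable inside the AWS Glue job runtime:
    from awsglue.dynamicframe import DynamicFrame

    df = datasource0.toDF()                        # DynamicFrame -> Spark DataFrame
    filtered = df.filter(df["country"] != "unknown")  # same rule as keep_row
    # Convert back so the generated mapping step can consume it:
    return DynamicFrame.fromDF(filtered, glue_context, "filtered")
```

The returned DynamicFrame then replaces datasource0 in the mapping step that follows in the generated script.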
The syntax to convert a DataFrame back to a DynamicFrame is this: use the DynamicFrame.fromDF method, pass the DataFrame, and you get a DynamicFrame back. We also need to import the DynamicFrame class from the awsglue.dynamicframe library. This new DynamicFrame needs to be passed to the next step, where the mapping is done, instead of datasource0. So these are the three lines we have added: earlier, datasource0 was used for the mapping, but now we have done some transformation, and the new DynamicFrame is what should be used in the subsequent steps. Let's save it. We could run it from here, but let's close the editor and run it from the main Jobs interface. Select the job and click Run job. While it is running, you can click on the job, go to the History tab, and check the status. It's currently running, and here you see details about the job. You can click on the script to edit it again, then come back and run again. It has succeeded. Let's go to S3 and check the transformed bucket. We can see that it has generated a file; let's click on it, then click on the object URL. It says permission denied. Let's go back and give permission to this file: click on the transformed bucket, then Permissions, and remove the Block public access setting; save and confirm the changes. After that, click on the Objects tab again, select the file, and edit its permissions to give everyone public read access. So at the bucket level we allowed public access, and then for the specific object we granted public read access. Now click on the object URL, and it downloads the file. Let's check it out: this is our transformed file, and we can see that the row with the unknown value has been removed. bank-prospects was our original file, and this is the new file after the data transformation. Let's go to the Glue crawlers and run the transformed crawler, which is pointed at the output S3 bucket.
We should see a new table generated with the metadata of the transformed data. The crawler has now finished; let's click on Tables. We can see a new table, fxtransformed. Let's click on it. This is the new catalog for the transformed data, and it is the same as the source schema, because we have not modified any fields. We can view the data for this table the way we did earlier: select the table and click View data. This takes you to the Athena interface, and you can query the table the way you did before. We can see the transformed data here. Let's check the Glue pricing for ETL jobs. Glue charges you on the basis of data processing units, and the price is $0.44 per DPU-hour; you can read more about how the pricing is calculated. For the job that we just ran, the charges should be a few cents. 6. An automated data pipeline using Lambda, S3 and Glue: We have seen how to create a data catalog using AWS S3, Glue, and Athena. These are all serverless services: you pay for the storage and the usage time, but you do not need to worry about the underlying infrastructure. Let's now look at another serverless service called AWS Lambda, with which you can write code without worrying about the underlying infrastructure. AWS Lambda can be used for multiple use cases; let's understand how to trigger the AWS Glue job as soon as a file is uploaded to AWS S3. Let's go to the AWS console and search for Lambda. As the description says, you can write and execute code with Lambda without worrying about servers or the underlying infrastructure. With Lambda, we can create functions and invoke them manually, or trigger them on an event. Let's create a function: select Author from scratch, and we'll give the function a name; let's call it fxlambdafunction. You can write a Lambda function in various languages such as Java, Python, Ruby, and Node.js. For this lab we'll select Python 3.7. Then we'll select a role.
Let's select Create a new role, and Lambda will create a role for us. We could also go to the IAM interface, create a role, and use it here; for now let Lambda create the role for us. Leave everything as default under the advanced settings and click Create function. It takes a few seconds to create the function. Now we see a message that the function has been created successfully, and if we scroll down, we'll see a section for the function code. Select the Python file that Lambda generated, and you will see the handler function that was generated; within that function we can write our code. The function does some operation and returns a status; when everything is successful, it returns status code 200, which stands for success. We can see that the default handler returns "Hello from Lambda", and this can be modified. You can test the Lambda function within the Lambda interface before triggering it from outside. Select Configure test events; we send data to the Lambda function in JSON format. For now let's remove all keys and values and send an empty JSON document. We have to give the test event a name. Once the event is created, we can select it from the dropdown at the top and click the Test button to test the Lambda function. Let's click and see what happens. We can see the execution result: status code 200 and the message "Hello from Lambda". If we scroll up on this interface, you'll find various tabs with more details about the Lambda function. Let's click on Monitoring. Here you will see all the metrics for the Lambda function: how many times the function was executed, how many times it succeeded, and how many times it failed. Since we triggered it once, it shows that here, and it was a success. It also shows the average duration of function execution, and you will find various other metrics here. You'll find a link at the top to go to CloudWatch to view the function logs; click on that.
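For reference, the default Python handler that Lambda generates looks essentially like this; the print line is a stand-in for the one added shortly, and its output lands in the CloudWatch logs described next.

```python
import json

def lambda_handler(event, context):
    """Default handler shape generated by Lambda for a Python function."""
    print("fx lambda function invoked")  # print output goes to CloudWatch Logs
    return {
        "statusCode": 200,               # 200 stands for success
        "body": json.dumps("Hello from Lambda!"),
    }
```

Invoking it with an empty JSON event, as in the console test, returns the 200 status and the "Hello from Lambda!" body shown in the execution result.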
It takes you directly to the AWS Lambda log section. You can see the function name here, and under Log streams you'll find all the log files for that function. You can select a particular log file to see more details: it shows when the function started and when it ended, and if there are any print statements, those will appear here. Back in the main Lambda interface, you can click Functions, select a particular function, and go back to the page that shows the function's source code. Now we'll make some changes to the Lambda function: we'll add a simple print statement, and then we need to deploy the function. Once deployment is done, you can click Test to test it again. Let's check the execution result: we got a 200 status code and "Hello from Lambda", and we can also see the output of our new print statement. Let's now go to CloudWatch and see the latest log. Sometimes you'll find the log in the same file, and sometimes it gets generated in a different file; that depends on the interval between two executions. We'll go to the main log URL and select the latest log stream; we can see the print statement output here. Let's now understand how to trigger this Lambda function on an S3 upload event. Back in the main Lambda user interface, you'll find a section where you can add different triggers to a Lambda function. Click Add trigger and configure it: we'll select AWS S3 as the source, then select a bucket; we'll select the fxskill-data-lake bucket. The event type will be "All object create events": whenever we upload any object to this particular bucket, the event will be triggered. Keep everything else as default, and then we can see that the S3 trigger has been added to this Lambda function. Let's now go to S3 in another window: search for S3, select it, and select the fxskill-data-lake bucket.
This bucket has the file which we uploaded earlier. Next, we'll upload another file and see if the Lambda function gets triggered when we upload a new file. Let's create a copy of the bank-prospects file; we'll call it bank-prospects-2. Now let's upload it: click Add files, select the file, and upload. Upload succeeded, and we can see the new file, bank-prospects-2. Let's now check the CloudWatch logs: select Log groups, select the log group for the Lambda function, and click on the latest log file. In the log, we can see that the function got triggered a second time, for the S3 file upload event. Let's now upload another file to the same S3 bucket; we'll call it bank-prospects-3. Click Upload files, select the target file, and upload. Upload succeeded. Back in CloudWatch, it created a new log file, and we can see that the function has been invoked. Back in the main Lambda interface, let's modify the Lambda function to trigger the Glue job when a file is uploaded. Select the function, and here we'll add code to trigger the AWS Glue job. AWS provides a Python SDK called boto3, with which you can interact with different services programmatically; you can search for "AWS SDK for Python" to learn more about it. For now, let's understand how to trigger a Glue job from the Lambda function. First, we need to create a client for AWS Glue using the boto3.client function, specifying "glue" within the parentheses. This client has a start_job_run method with which you can trigger a particular job. Let's go to the Glue interface and find our job name: it is fxtransformer. Let's copy this name and pass it as the job name. Next we'll deploy this function. So first we create a client for Glue; then, using the client's start_job_run method, we trigger the fxtransformer job from this Lambda function.
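Sketched out, the finished handler could look like this. The job name fxtransformer comes from the Glue console; the event-parsing helper is an extra illustration (the video's handler does not inspect the event) showing how the uploaded object's name can be pulled from the S3 event for logging.

```python
def s3_object_from_event(event):
    """Pull bucket and key out of the S3 put-event that invoked the function."""
    record = event["Records"][0]["s3"]
    return record["bucket"]["name"], record["object"]["key"]

def lambda_handler(event, context):
    import boto3  # available in the Lambda Python runtime by default
    glue = boto3.client("glue")                  # client for AWS Glue
    bucket, key = s3_object_from_event(event)
    print(f"New object s3://{bucket}/{key}; starting Glue job")
    glue.start_job_run(JobName="fxtransformer")  # triggers the ETL job
    return {"statusCode": 200}
```

Whether the function is invoked from a test event or an S3 upload, the same start_job_run call fires, which is why the role needs Glue permissions, as we see next.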
Now, whether we run it manually through a test event or trigger it through an S3 object upload, this function will get executed and will start the Glue job. The function has been deployed. Let's now upload another file; we'll call it bank-prospects-4. We'll go to the S3 interface and upload it. The file got uploaded; let's go to CloudWatch and see the log. This is the latest run; let's click on it. It says access denied: an error occurred while calling StartJobRun. The role attached to our Lambda function does not have permission to trigger Glue jobs, so let's check it out. We'll go to the IAM interface, then Roles, where you will see the role that Lambda created when we created the function. Click on it, and in the attached policies, search for Glue: it has no Glue service role policy. Select the AWS Glue service role policy and attach it. Now the Lambda function has access to execute Glue jobs. We'll upload another file: let's go to S3 and upload bank-prospects-5. The file got uploaded; we'll go to the Glue interface and refresh. Now you can see that the job is running: we were able to trigger a Glue job through the Lambda function. While it is running, let's check the log in CloudWatch. This is the latest invocation; the function has been triggered. Earlier we got the access denied error, but this time it ran fine, and the Glue job is running. While it runs, let's go to Tables, click on fxskill_data_lake, and view the data; this is the source table. We will not put any limit here; let's fetch all the records. This table reflects data from all the file uploads, even though we did not re-run the crawler: a crawler needs to be set up only once, and every subsequent upload to the bucket gets automatically reflected. We'll go to the Jobs interface and see whether the job finished; it has succeeded. We'll go to Tables, click on fxtransformed, and view the data. Let's query the table without any limit; we can see data from all the files.
And there is no record with "unknown" in the country field, because that is the transformation logic we added: remove any record with an unknown value in the country field. This is how we can use AWS Lambda to trigger some work based on events. Now we've built a data pipeline: AWS Lambda monitors for any incoming file in the AWS S3 bucket and triggers a Glue job; Glue reads the file, does the transformation, and stores the result in another bucket; and then we can view the data using AWS Athena. All these services are serverless, so we do not have to worry about the underlying infrastructure. This is how we can build a serverless analytics solution on AWS. Let's check the cost of AWS Lambda before we wrap up: search for AWS Lambda pricing. If you are within the free tier limits, you do not have to worry about cost for simple executions, for up to 1 million requests in a month.