Azure Data Engineering - Build Ingestion Platform

Twisted Careers, Pioneering Through Education

Lessons in This Class

43 Lessons (5h 17m)
    • 1. Introduction

      5:05
    • 2. Introduction to ADF

      3:43
    • 3. Discuss Requirement and Technical Architecture

      2:09
    • 4. Register Free Azure Account

      4:14
    • 5. Create A Data Factory Resource

      8:39
    • 6. Create Storage Account and Upload Data

      7:32
    • 7. Create Data Lake Gen 2 Storage Account

      5:51
    • 8. Download Storage Explorer

      4:28
    • 9. Create Your First Azure Pipeline

      16:28
    • 10. Closing Remarks - Creating Your First Pipeline

      6:20
    • 11. Introduction Metadata Driven Ingestion

      3:18
    • 12. Create Active Directory User

      2:31
    • 13. Assign Contributor Role to User

      3:35
    • 14. Disable Security Defaults

      1:49
    • 15. Creating the Metadata Database

      9:30
    • 16. Install Azure Data Studio

      5:54
    • 17. Creating Metadata Tables and Stored Procs

      8:14
    • 18. Reconfigure Existing Data Factory Artifacts

      7:09
    • 19. Set Up Logic Apps for Email Notification

      9:24
    • 20. Modify the Data Factory Pipeline to Send Email Notification

      10:16
    • 21. Create Linked Service for Metadata Database and Email Dataset

      4:07
    • 22. Create Utility Pipeline to Send Email to Multiple Recipients

      14:43
    • 23. Explaining the Email Recipients Table

      5:22
    • 24. Explaining the Get Email Addresses Stored Procedure

      2:30
    • 25. Modify Pipeline to Send Email using the Utility Pipeline

      4:40
    • 26. Track Pipeline Triggered Run

      12:29
    • 27. Making Email Notifications Dynamic

      16:52
    • 28. Making Logging of Pipelines Metadata Driven

      10:52
    • 29. Add a New Way to Log the Main Ingestion Pipeline

      13:28
    • 30. Change Log Pipeline to Send Failure Messages Only

      8:06
    • 31. Create Dynamic Datasets

      11:24
    • 32. Reading from Source To Target Part 1

      8:09
    • 33. Reading from Source To Target Part 2

      12:52
    • 34. Explaining the Source To Target Stored Proc

      4:48
    • 35. Add Orchestration Pipeline Part 1

      7:10
    • 36. Add Orchestration Pipeline Part 2

      9:02
    • 37. Fixing the Duplication of Batch Ingestions

      8:19
    • 38. Understanding the PipelineLog, BatchRun, Batch and SourceToTargetView

      9:33
    • 39. Understanding the Get Batch Stored Procedure

      4:59
    • 40. Understanding Set Batch Status and GetRunID

      3:33
    • 41. Setting Up an Azure DevOps Git Repository

      6:39
    • 42. Publishing the Azure Data Factory Pipelines to Azure

      8:20
    • 43. Closing Remarks - Metadata Driven Ingestion

      2:42


About This Class


The objectives of this class are to onboard you onto the Azure Data Factory platform and help you assemble your first Azure Data Factory pipeline. Once you get a good grip of the Azure Data Factory development pattern, it becomes easier to adopt the same pattern to onboard other sources and data sinks.

The class covers the following:

1. Introduction to Azure Data Factory

2. Unpack the requirements and technical architecture

3. Create an Azure Data Factory Resource

4. Create an Azure Blob Storage account

5. Create an Azure Data Lake Gen 2 Storage account

6. Learn how to use the Storage Explorer

7. Create Your First Azure Pipeline

Meet Your Teacher

Twisted Careers

Pioneering Through Education



Transcripts

1. Introduction: Have you ever wondered how data moves from one system that does something to another system that does something else with that data? Let's picture this: you are happily using your Netflix app on a shiny new mobile phone, watching your favourite shows and movies, and a day or two later Netflix starts recommending new shows and movies that are somewhat similar to the ones you have watched in the last few days. You can't help but wonder how Netflix has done this. To explain how this happens, we have to acknowledge that there are two kinds of online systems in the world of computing. The first is known as online transactional processing, or OLTP for short, and the other is known as online analytical processing, or OLAP for short.

Let's go back to our Netflix example. Every search that you make on your Netflix app is considered a transaction, and that data is stored within a database; let's call it the Netflix DB. For almost every interaction with the app, such as when you give a rating, filter by your favourite genre, or simply click and watch a show, your interactions are captured and stored in the Netflix DB. The action of storing that data is referred to as a transaction because it creates new data, and therefore the Netflix app is an online transactional system. Now let's turn to online analytical processing. OLAP systems are systems that process data from transactional systems, with the intention of applying complex algorithms and machine learning models to the data. These algorithms and models are used to analyse the data and create new information about each and every customer that uses the Netflix app, such as yourselves. Netflix then uses this information to target you with recommendations on what you should watch next, and this is how they keep you hooked on Netflix.

However, before you can analyse the data, you need to create a data storage platform known as a data lake, where all of this transactional data will be stored; let's call it the Netflix data lake. But then how does the data get into the data lake? To finally answer the question of how data flows from one system to the other, we need data integration tools. As the name data integration suggests, these are tools designed purposely to move data from one system to another. From our example, we need a data integration tool to connect to our Netflix DB, collect the required data, and move it to the Netflix data lake. There are plenty of tools out there in the market, but we are going to focus specifically on Azure Data Factory. Azure Data Factory is a cloud-based data integration tool, and we are going to work through an example to give you a good sense of how data integration is done. So if you are an aspiring data enthusiast, or looking to see whether you can pursue a career as a data engineer and perhaps specialise in Data Factory, you are in the right place to take this beginner class. My name is David. I have been a data engineer for more than 15 years and have been involved with Azure Data Factory for the last three years, working in banking, retail, government, and telecommunications, deploying various data engineering projects. So let's learn how to integrate data in Azure Data Factory.
And I shall see you in the next one.

2. Introduction to ADF: Hello, good people, and welcome back. Let's discuss what Azure Data Factory is. Going forward, I am going to refer to Azure Data Factory as just ADF. ADF allows us to create workflows for transforming and orchestrating data movement. What this means in practical terms is that in any data-driven organisation there will always be a need to move data from a source system to a central storage location, typically known as a data lake, or data that needs to be pushed to another system that will consume it for further processing. Sometimes, depending on the requirement, that data may need to be transformed, which means changed to a suitable format, before it lands in the data lake or is consumed by another system. The moving of the data usually happens through strategic events, depending on the business requirement. Some events may be timed: let's say the sales data from our e-commerce system must be moved to the data lake at 3 AM, and this needs to happen every day at 3 AM. Other events may be based on certain data triggers: let's say every new entry in the customer table that belongs to the e-commerce system triggers a data movement to consuming systems, to satisfy real-time data transfer as a requirement. The management of these strategic events is known as orchestration. Orchestration can go a level deeper by managing dependencies of workflows and pipelines, and this we will observe in the coming lectures. In a nutshell, you can think of ADF as an ETL tool, which is short for extract, transform, and load, for the Azure cloud. Within the Azure data platform, ADF is offered as software as a service, or SaaS, which means we don't need to deploy any hardware or software; we just pay for what we use. Quite often ADF is referred to as code-free ETL as a service. Now, let's go over the operations associated with ADF. The first one is ingest, which allows us to collect data and load it into the Azure data platform storage or any other target location. The second one is control flow, which allows us to design code-free extraction and loading. The third one is data flow, which allows us to design code-free data transformations. The fourth one is schedule, which allows us to schedule our ETL jobs. And the last one is monitor, which allows us to monitor our ETL jobs. Right, this is enough theory; let's get together in the next lesson and create our first Azure Data Factory job. That is it for me, and I shall catch you in the next one. Goodbye.

3. Discuss Requirement and Technical Architecture: Hello and welcome back. Let's discuss the requirement and technical architecture. To implement the requirement, we are required to source online sales data from our online web store and then ingest that data into a data lake. The web development division happens to store their online sales data within Azure Blob Storage, and our task as developers in the data engineering division is to create an Azure Data Factory pipeline that sources that data from the Azure Blob Storage. The pipeline will require a linked service that connects to the Azure Blob Storage and is able to read the online sales file, which happens to be in JSON format. The linked service will also allow us to create a JSON-based dataset within Azure Data Factory, and the dataset will serve as a data source for the pipeline.
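Before moving on to the sink side, here is a rough sketch of what the source side of this architecture looks like once ADF turns it into JSON definitions. The names match the ones created later in lesson 9, but the exact property layout shown here is indicative rather than a verbatim export, and the connection string is a placeholder.

```python
# Approximate shape of the ADF JSON for the source side of the ingestion,
# expressed as Python dicts. Names follow the conventions used in lesson 9.

# Linked service: how ADF connects to the Web Store blob storage account.
ls_abs_webstore = {
    "name": "LS_ABS_WebStore",
    "properties": {
        "type": "AzureBlobStorage",
        "description": "Connection to the Web Store blob storage",
        "typeProperties": {
            # Placeholder; in practice this comes from the storage account's access keys
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
        },
    },
}

# Dataset: the online sales JSON file inside the 'sales' container,
# read through the linked service above (file name assumed).
ds_online_sales_json = {
    "name": "DS_Online_Sales_JSON",
    "properties": {
        "type": "Json",
        "linkedServiceName": {
            "referenceName": "LS_ABS_WebStore",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "sales",
                "fileName": "onlinesales.json",
            }
        },
    },
}
```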
The pipeline will also require a sink dataset; a sink is effectively a destination data store, or where we store our output. We will store the sink dataset in Azure Data Lake Gen2 storage, and this is an object that we will need to create ourselves. In order to communicate with the Azure Data Lake Gen2 storage, we will require another linked service, and we will also use it to create our sink dataset in JSON format. Once we have established both the source and the sink, we will require a Copy Data activity that reads the source dataset and writes the source contents into the sink dataset located in the data lake. Alright, now that we have a good understanding of what we need to achieve, let's get our hands dirty and do this.

4. Register Free Azure Account: Hello and welcome back. Let's register a new Microsoft Azure free account. Go right ahead and load the URL azure.microsoft.com, and it should take you to the page that you see on screen. Microsoft Azure offers a free account, and that should be sufficient for the course. It is valid for 12 months, and in addition, Microsoft will credit you with about 200 US dollars to use within the first 30 days. So let's select Free account in the top right corner of the screen. If you do have an existing account, simply sign in at the top right corner; if you do not have an account, please select Start free. On this page you are asked to provide your sign-in credentials, which will be a registered Microsoft email account. If you don't have a Microsoft email account, please create one, then come back to this page and sign in. I am going to enter my email address, click Next, enter my password, and click Sign in. On the next page you are required to enter your personal details, and as you can see, mine have been populated automatically. If yours are not populated, please be sure to capture your details, and especially a correct mobile number, which will be used for verification. As for me, I just need to enter my mobile number; once I am done, I accept the agreement and then select Next. Now you need to do your identity verification using the supplied mobile phone number, and you can choose to do the verification via text or an automated phone call. I am going to choose Text me. A few moments later you should receive a verification code, either via text or a phone call; enter it in the text box and select Verify code. Finally, you need to do one last verification via a credit card. Microsoft will charge a tiny amount of not more than two US dollars, but will immediately return the amount to your credit card, and thereafter Microsoft will not charge your card as long as you are on the free plan. Microsoft does this form of verification to be 100% sure that you are a real person and not some malicious bot. So please supply your credit card details, and once you are done, select the Sign up button.
So let me enter my credit card details, and once I am done I am going to select Sign up. Once you have completed the sign-up process, the website should take you to the main landing page. All right, this is the end of our lesson; let's meet again for another one. Goodbye.

5. Create A Data Factory Resource: Hello, good people, and welcome back. Let's create a new Data Factory resource for our data integration activities. I would like you to visit portal.azure.com and supply the user credentials that you created during registration. To create a new resource you have a number of options: you can click Create a resource under the services section, or you can open the menu and select Create a resource, which is what I am going to do. What I like about this particular page is that the resources you can use are clearly categorised, which makes it easy to navigate to what we are looking for, and it shows other associated items that may assist with integration activities. Of course, you can use the search box if you prefer and just type Data Factory to search the marketplace, but I prefer the categories menu, so I am going to navigate to Integration and select Data Factory. On this page we are presented with the Data Factory blade, where we enter the details that create a Data Factory resource based on our requirement. First we need to select a subscription, and my Free Trial subscription is selected automatically; if you prefer to use another subscription, feel free to select it. Next, we need to create a resource group, and we will observe a recommended naming convention from Microsoft. To create a new resource group I select Create new, name it rg-de-dataingestion-dev-001, and select OK. Let's interpret the naming convention: rg stands for resource group, which is the prefix; de stands for data engineering, which refers to our hypothetical company division; dataingestion is essentially the purpose of all the resources that fall under this resource group; dev reflects the stage of development, which could also be qa for quality assurance or prod for production; and 001 refers to the instance number, since some organisations may have more than one instance of the same resource depending on their requirement. You can refer to the documentation on naming conventions that you see on screen and use it for your own projects. Now, it is always a good idea to select the region that is closest to you, and mine happens to be South Africa, so let's select South Africa North. Next, let's insert a Data Factory name, and I am going to call mine adf-dataingestion-dev-001. A Data Factory name must be unique across all Azure regions, unfortunately, so I urge you to append your initials or company name to make it unique, or simply add an instance number of two or three and so on.
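For readers who prefer to script this step, the same resource group and Data Factory can be created with the Azure SDK for Python. This is a minimal sketch, assuming the azure-identity, azure-mgmt-resource and azure-mgmt-datafactory packages are installed and that you are already signed in (for example through the Azure CLI); the subscription ID is a placeholder.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

subscription_id = "<your-subscription-id>"  # placeholder
credential = DefaultAzureCredential()

# Resource group following the rg-<division>-<purpose>-<env>-<instance> convention
rg_name = "rg-de-dataingestion-dev-001"
resource_client = ResourceManagementClient(credential, subscription_id)
resource_client.resource_groups.create_or_update(rg_name, {"location": "southafricanorth"})

# Data Factory; remember the factory name must be globally unique,
# so append your initials or an extra instance number if needed.
adf_client = DataFactoryManagementClient(credential, subscription_id)
adf_client.factories.create_or_update(
    rg_name,
    "adf-dataingestion-dev-001",
    Factory(
        location="southafricanorth",
        tags={"Division": "Data Engineering", "ProductOwner": "Mr Smith", "DataOwner": "Ms Maharaj"},
    ),
)
```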
Back on the Data Factory blade, we are working with the latest version of the Data Factory instance, so let's leave the version at V2 and select Next to continue to Git configuration. We are not going to set up any Git configuration for version control at this stage, so we can just select Configure Git later, and then select Next to move over to Networking. Under managed virtual network we don't have any established virtual networks, so we can leave the public endpoint as it is and move on to Advanced. Under Advanced we don't need any encryption at this stage, so let's select Next to go to Tags. Tags are very important when categorising resources, and also for making it clear which resources belong to whom, which makes it convenient when assessing billing. So let's make it a habit to insert tags for every resource that we create. To start with this one, I am going to set a Division tag and place this resource under data engineering. I am also going to set a product owner, which will be Mr Smith, and a data owner, which we can set to Ms Maharaj. Now let's click Review + create, which should trigger a validation. The validation has passed, so let's click Create, and the initialisation and deployment should begin; this could take a few seconds or even minutes. Now that the deployment has completed, let's go to the resource by selecting Go to resource. On the Data Factory resource properties page we can choose to pin the Data Factory resource to a dashboard as a convenient way to access it later, so let me select Pin to dashboard. I am going to create a shared dashboard: I select Create new, name my dashboard Data Engineering, leave the subscription as Free Trial, and the resource group location will be South Africa North because that is closest to me. Then I select Create and pin. All right, this is enough for one lesson; let's get back together in the next lesson to create the storage account that will hold our source data. This is it for me, and I shall catch you in the next one. Goodbye.

6. Create Storage Account and Upload Data: Hello and welcome back. Let's create a storage account and upload our data. I would like you to download the attached file for our e-commerce web store system, and once your download has completed, extract the zip file. Let me navigate to my Downloads folder, right-click on onlinesales.zip, and choose Extract All. As you can see, my onlinesales.json file has been extracted successfully. Now let's head back to the Azure portal; from the menu we can find Storage accounts, and from that page I can choose to create a storage account. From this blade we can start specifying the properties of our storage account, and as usual my subscription is set to Free Trial, which is fine by me.
Let's set a new resource group. To create one I simply select Create new, name it rg-webdevelopment-dev-001, and select OK. We have created a new resource group simply because the resource we are about to create does not belong to the data engineering division in our hypothetical company; the division it belongs to is web development, and in reality this resource would have been created by someone from the web development division. Now let's set a storage account name; I am going to call it sawebstoredata001, and ensure you select the location that is closest to you. I am going to leave the performance at Standard and the account kind at StorageV2 (general purpose v2), and the default redundant replication is also fine, so let me select Networking. Under Networking we really don't need anything, so let me select Next, and under Data protection we don't need anything either, so I will move on to Advanced. Under the security section, the secure transfer option is fine as it is, we don't need extra encryption at this stage, and we don't want public access to our blob files, so let's set that to Disabled. The Transport Layer Security version of 1.2 is also fine. Since we don't usually perform analytics directly on source systems, we don't need data analytics capabilities such as the hierarchical namespace, which gives us the ability to structure the collection of files, or objects, into a hierarchy of directories; so let's not enable Data Lake Storage Gen2. The access tier is fine as Hot, since we expect the data to be accessed frequently, and we don't need additional support for large file shares, tables, or queues, so let's select Next to go to Tags. Once more, let's set our tags, starting with the Division, which for this resource will be web development. I also want to set a product owner, which this time around will be Dr B, and a data owner, which I will set to Mrs van der Merwe. Once I am done I select Review + create, the validation should start, and once it has passed I select Create to create the resource; this could take a couple of seconds to minutes to complete. Once the deployment has completed, I select Go to resource. On this properties page I want to attach my storage account to a dashboard once more, so I select Pin to dashboard, choose the existing shared dashboard Data Engineering, and hit Pin to dashboard. Let's now create a container to store our online sales data: I select Containers, add a new container, call it sales, and click Create. What we can do from here is upload our data, so I select sales once again and choose Upload. On the pop-up blade I am going to choose the file to upload from the extracted online sales folder.
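As a side note, the container creation and upload in this lesson can also be scripted with the azure-storage-blob package. This is a minimal sketch under assumed names: the connection string is a placeholder taken from the storage account's access keys, and the file name is the extracted source file assumed in this course.

```python
from azure.core.exceptions import ResourceExistsError
from azure.storage.blob import BlobServiceClient

# Connection string from the storage account's "Access keys" blade (placeholder values)
conn_str = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=sawebstoredata001;"
    "AccountKey=<key>;"
    "EndpointSuffix=core.windows.net"
)

service = BlobServiceClient.from_connection_string(conn_str)

# Create the 'sales' container, or reuse it if it already exists
try:
    container = service.create_container("sales")
except ResourceExistsError:
    container = service.get_container_client("sales")

# Upload the extracted source file into the container
with open("onlinesales.json", "rb") as data:
    container.upload_blob(name="onlinesales.json", data=data, overwrite=True)
```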
Back in the portal, I am going to select the JSON file and click Open, and now I am going to click Upload. It is a considerably big file, so it may take a couple of minutes to complete. Once the file has been uploaded successfully, it means we have completed our lesson. Let's meet again in the next lesson to create our Gen2 Data Lake storage. That is it for me, and I shall catch you in the next one. Goodbye.

7. Create Data Lake Gen 2 Storage Account: Hello and welcome back. Let's create a Data Lake Generation 2 storage account. We will be using this storage account as a central data repository, or data lake, for our hypothetical online system. So let's open the menu, find Storage accounts, and from that page select New to create our storage account. Let's apply the resource group that we are going to use, which is the data ingestion one, and let's name our storage account sadedatalakedev001, for data engineering data lake, dev environment, instance 001. My location is fine; please ensure you select the one closest to you. The performance is fine at Standard, StorageV2 (general purpose) is also fine, and the default redundant storage is fine too, so let me select Networking. On the Networking tab I don't need to do anything, so I select Next again. Under Data protection I don't need anything either, so I move on to Advanced. Once more, we don't want this storage account to be publicly accessible, so I am going to disable public access. From here, I do want a Data Lake Storage Gen2 account this time around, so I am going to select Enabled. What this means is that we do want to enable the hierarchical namespace, since we are creating a data lake, which is also meant for big data analytics; and honestly, this is it, this is what differentiates a normal storage account from one designed for data lakes and data analytics. So let's select Next to go to Tags. I want to set the Division first, which will be data engineering; then the product owner, which will be Mr Smith; and the data owner, which I will set to Ms Maharaj. Once I have completed that, I simply select Review + create. Validation should start once again, and once it has passed I select Create. Once the deployment has completed, I choose Go to resource, and once again I want to pin this storage account to my Data Engineering dashboard, so I select Pin, choose Shared, select the Data Engineering dev dashboard, and select Pin. All right, it's time to create a container. In a typical data lake, containers are usually named after business units, followed by the source system, so let's follow that pattern and select Containers. From here I click Add container, call the container webdevelopment, and select Create. Now let's navigate inside the webdevelopment container, and from here we can create a directory that will reflect our source system. I select Add directory, call the directory webstore, which is our source system, and hit Save. Now let's add one more directory, which will hold raw data.
Raw data is pretty much data that has not been transformed in any shape or form and is true to source. So let me navigate into webstore, add a new directory, call it raw, and hit Save. Lastly, we are going to add the onlinesales directory, where the data will actually be stored; so once more I navigate into the raw directory, add onlinesales, and hit the Save button. Now that we have our storage accounts, perhaps it's time to download some tools that will help us manage them a lot more easily. This is it for me, and I shall catch you in the next one. Goodbye.

8. Download Storage Explorer: Hello and welcome back. Let's download and install the Storage Explorer. Microsoft provides a desktop solution that can help you manage your storage accounts a whole lot better than the web interface, so let's see how we can download the Azure Storage Explorer. There are at least two ways to download it, and the first way is through the Azure portal. I select the menu, go to Dashboard, and select Data Engineering. From here I can select at least one storage account, so let me pick the one for the web store. Once we are on the storage account properties page, we can select Open in Explorer and choose to download the Azure Storage Explorer; I select the link from the pop-up window, which should load the Storage Explorer web page. Since I am running on Windows, I am going to download the Windows version, but if you are running macOS or even Linux for that matter, you can choose the appropriate operating system. I leave it at Windows and hit the Download button. Now that the download has completed, I can run the Storage Explorer installer. From the installer window I select the install mode, and I will choose to install for all users; from here I accept the licence agreement, choose Install, accept the destination location, and hit Next, then Next again to start the installation. Once the installation has completed, I can launch the Microsoft Azure Storage Explorer by hitting the Finish button. Now let me show you how you can add your account: from the pane on your left, hit Add an account, select Subscription, choose Azure as the environment, and hit Next. From here you should be able to select the account under which your subscription is located, enter the password, and hit Sign in. Once you receive the confirmation message, it simply means you have been authenticated successfully, so return to Storage Explorer. What you can do from here is select the default directory, and it should load your subscription. I can now go back to the Explorer view and expand my Free Trial subscription, and it should load all the objects within my Azure account. As you can see, I have got both my storage accounts right here, so let me just open one; from here I can navigate to Blob Containers, and sales is right here. So as you can see, I have got my files right here.
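Incidentally, the container and directory layout built in lesson 7 (webdevelopment, then webstore/raw/onlinesales) can also be created in code with the azure-storage-file-datalake package. A hedged sketch, with the account name and key as placeholders:

```python
from azure.storage.filedatalake import DataLakeServiceClient

# Data Lake Gen2 account endpoint and key (placeholders)
service = DataLakeServiceClient(
    account_url="https://sadedatalakedev001.dfs.core.windows.net",
    credential="<account-key>",
)

# Container named after the business unit that owns the source system
file_system = service.create_file_system("webdevelopment")

# Source system, raw zone, and dataset directories, mirroring the lesson's steps
file_system.create_directory("webstore")
file_system.create_directory("webstore/raw")
file_system.create_directory("webstore/raw/onlinesales")
```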
Back in Storage Explorer, I can choose to upload files, download, copy, clone, or delete just about anything from here. Cool, we have completed this particular lesson. This is it for me, and I shall catch you in the next one. Goodbye.

9. Create Your First Azure Pipeline: Hello, good people, and welcome back. Let's create our first Azure Data Factory pipeline. What we can do is go to the menu, select Dashboard, and from our Data Engineering dashboard select our Azure Data Factory. From this page we select Author & Monitor to start loading the Azure Data Factory user interface. Let's create our first pipeline: on the left-hand panel we are going to find a pencil, so let's select the blue pencil to author our pipeline. To create a new pipeline, I select the three dots and select New pipeline. On this interface we are presented with the Activities panel, which contains transformations, some of which we are going to use to copy data from our source storage to the data lake. On the right-hand side, however, we have the Properties panel, and here we can insert our pipeline name; let's call it PL_Ingest_WS_Sales_To_DataLake. PL stands for pipeline, and WS stands for web store. Let's apply a description as a good practice; the description will be: ingest web store online sales data into the data lake. We will not need to supply any concurrency here, so we just leave it empty, since we are happy with one execution at a time. The next thing to do is create the sales dataset. I like to be a little organised, and therefore I am going to create a folder that reflects the source system, which is our web store. So within Datasets I select the three dots, choose New folder, name the new folder webstore, and select Create. Now let's select webstore again; when you hover over it, three dots appear on the right, so let me select those and click New dataset. Here we can select Azure Blob Storage, since our source data comes from, well, Azure Blob Storage, and I will select Continue. Within the right-hand pane we select the format, and the source format is JSON, so I select JSON and click Continue. Here I am going to name my dataset DS_Online_Sales, and as per the naming convention I have chosen, I will append JSON just to reflect the file type. Now we need to select or create a linked service, so let me click New to create a new linked service. Let's name the service LS_ABS_WebStore; LS stands for linked service and ABS stands for Azure Blob Storage. Let's enter a description, which will be: connection to the web store blob storage, just to be clear about what this linked service is for. The next thing that you see below is the integration runtime. The integration runtime is basically a process execution container that is allocated the computing resources, or infrastructure, required to connect to your datasets.
At this stage we have not created a custom integration runtime, so we are going to use the default integration runtime. As for the authentication method, the default of account key is fine at this moment; we shall explore other authentication methods later. So let's keep the connection string option, select the subscription, and from here pick the storage account name, which is the web store one. Now I can test the connection to see if everything connects successfully. Great, we got a success message, so let me select Create. Now let's set the onlinesales.json file path. To do that, I select the folder icon, navigate inside sales, and as you can see I have got my onlinesales.json, so I select it and click OK. From here I can just select OK. Just to do one final test, we can preview the data, and to me everything appears just fine, so let me close the preview window. Since we have successfully created our online sales dataset, it's time to get back to our pipeline, so let me navigate to the pipeline. Within this canvas I need to insert a transformation component: the Copy Data component is located within Move & Transform, and it will copy our data to the data lake, so drag it onto the canvas. What I want to do first is set the name for our Copy Data activity, and I am going to name it Copy Web Store Sales Data. I also want to insert a description, which will be: copy online sales data from the web store and ingest it into the data lake. Now, if we have a look at the timeout, it seems to be set to seven days. This is a very long time to wait for an activity running a single process, if you think about it; just imagine a pipeline running for a full seven days. So let's change this to ten minutes. Everything looks fine, so what we need to do next is set our source. I select the Source tab and select the source dataset, which is DS_Online_Sales_JSON. The next thing we need to do is set a sink dataset, so I select Sink, and since we haven't defined that dataset yet, let's create a new one. I select New, and this time around it will be Azure Data Lake Storage Gen2, then I click Continue, and once again we are going to choose JSON. When you are ingesting data into the raw area of the data lake, it is recommended to maintain the exact same format; the concept behind this is to stay true to source. If the source is a relational database, however, then you and your team need to decide on a default format. So let's select, as I said, JSON and click Continue. Let's set the name for our dataset to DS_DL_Online_Sales_JSON; by the way, DL just stands for data lake. So let's create a new linked service once again, to link to our Gen2 storage.
Let's set the name for our linked service to LS_ADLS_DataEngineering_DL, where ADLS stands for Azure Data Lake Storage; put together, it is essentially the data engineering data lake. As a description, it will be: connection to the data engineering data lake. I am going to leave everything else as is; I do select the subscription, however, which is my Free Trial, and I need to choose a storage account, so I select the data lake one. Once again I test the connection, and the connection is successful, so let me select Create. Just like before, all we need to do here is choose where we want to store the file, so I select the folder icon once again, and it will be under webdevelopment, webstore, raw, and onlinesales. I select OK, and once I am done I select OK once again to set our sink dataset. We don't need to specify anything further, so what we can do now is validate our pipeline: let's select Validate. It looks like my pipeline has been validated successfully, and I am going to select Close. Now let's publish our pipeline: I select Publish, and as you can see I am about to publish a few artifacts, the pipeline and also the two datasets, so I select Publish. Publishing is the act of saving our objects to our Data Factory repository. What we can do at this moment is run the pipeline to test whether we can load our data into the data lake, so I choose Add trigger and select Trigger now. The prompt says the pipeline will run using the last published configuration, which is fine, so I select OK. Now let's monitor our pipeline run: I navigate to the left-hand menu and select Monitor. As you can see, our pipeline is currently in progress, so let's wait for it to finish. It has succeeded, as we can tell right here by this particular message. What we can do now is select the link to view the details of the pipeline run, and from here we can hover over the activity name, where we get this Input link; select it, and here we get an idea of our source. You can read through this to get a good sense of the input: from the store settings we get a type that comes from Azure Blob Storage, and we also get a format type of JSON. There are a number of properties here that tell you where the input actually comes from and its type as well, so let me exit by selecting Close. Once again, I can go to the Output definition as well, and I am going to get some more information here, like the number of rows that were read and the number of rows that were written, and if we look deep enough we can tell that the output went to the data lake storage. There are a number of properties you can look at here that will give a good indication of your output, so let me close this window as well. Now what I want to do is get the finer details of our pipeline run, so let me select Details. Cool.
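For reference, the pipeline published and triggered above corresponds roughly to a JSON definition like the following, shown here as a Python dict. The activity, dataset and policy values come from this lesson; the exact source and sink type properties that ADF generates may differ slightly, so treat this as an illustrative shape rather than a verbatim export.

```python
# Approximate shape of the published pipeline definition
pl_ingest_ws_sales_to_datalake = {
    "name": "PL_Ingest_WS_Sales_To_DataLake",
    "properties": {
        "description": "Ingest web store online sales data into the data lake",
        "activities": [
            {
                "name": "Copy Web Store Sales Data",
                "type": "Copy",
                # Timeout reduced from the seven-day default to ten minutes
                "policy": {"timeout": "0.00:10:00", "retry": 0},
                "inputs": [
                    {"referenceName": "DS_Online_Sales_JSON", "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "DS_DL_Online_Sales_JSON", "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "JsonSource"},
                    "sink": {"type": "JsonSink"},
                },
            }
        ],
    },
}
```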
Back in the details view, we have read about six megabytes of data and produced the same into our data lake. To confirm this, let's go back to our storage account: let me close this window, navigate back to the dashboard, open the storage account, and find the containers right over there, under webdevelopment, webstore, raw, onlinesales. As you can see, we have successfully copied our online sales data from the web store into our data lake, and this concludes the lesson for our first Azure data pipeline. This is it for me, and I shall catch you in the next one. Goodbye.

10. Closing Remarks - Creating Your First Pipeline: Hello and welcome back. In the last couple of lessons we have gone over the idea and the intention behind data integration, and then we proceeded to work through a tutorial to demonstrate how a tool like Azure Data Factory can address data integration requirements. But before we close and call it a wrap, let's discuss some of the advantages and disadvantages of using Azure Data Factory. The first advantage is that the Azure Data Factory skill is in high demand, especially within the data engineering and analytics space; the rise of cloud-based technologies and the need for data tools helped Azure Data Factory grow very quickly. The second advantage is documentation: it has greatly improved during the past few years, and there are a lot of resources available through the Microsoft documentation and additional information provided by the community. The third advantage is that Azure Data Factory offers full integration with CI/CD, that is, continuous integration and continuous deployment, across different environments, and it is quite easy to set up. The fourth advantage is that Azure Data Factory supports both philosophies of data integration, ETL and ELT. We have discussed ETL briefly before; the key thing to understand is that the extraction, transformation, and loading of the data is primarily handled by Azure Data Factory. ELT, on the other hand, stands for extract, load, and transform: in an ELT scenario, Azure Data Factory only extracts the data, loads it into a data lake, and then transfers it to the target system, so it is the target system's responsibility to transform the data. The last advantage, well, at least according to me, is that Azure Data Factory comes with at least 85 connectors that allow you to extract data, including some generic connectors like HTTP and ODBC in case you can't find a built-in connector. Azure Data Factory is also fully integrated with services like Azure Functions, Azure Databricks, and many more. Now that we have covered some advantages, let's take a look at some of the disadvantages. The first disadvantage is that when working with advanced and complex scenarios, Azure Data Factory seems to have some limitations, and it can take some effort for a typical product team to address them. Simply put, it is not flexible enough and it works on a predetermined pattern; if you digress from that pattern, you run into limitations. The second disadvantage is that Azure Data Factory is a web-browser-only experience, and therefore, if you are looking to do things like automated unit testing, you will run into limitations trying to create that kind of tooling with Azure Data Factory.
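On that scripting gap: while the authoring experience is browser-based, published pipelines can at least be triggered and monitored from code through the management SDK, which helps if you want to wrap runs in your own test or automation harness. A minimal sketch, assuming the azure-mgmt-datafactory package and reusing the resource names from the earlier lessons:

```python
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

# Kick off a run of the published pipeline (the equivalent of "Trigger now")
run = adf_client.pipelines.create_run(
    "rg-de-dataingestion-dev-001",
    "adf-dataingestion-dev-001",
    "PL_Ingest_WS_Sales_To_DataLake",
)

# Poll the run status until it finishes (Succeeded, Failed or Cancelled)
while True:
    pipeline_run = adf_client.pipeline_runs.get(
        "rg-de-dataingestion-dev-001", "adf-dataingestion-dev-001", run.run_id
    )
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(15)

print(pipeline_run.status)
```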
The last disadvantage is that, well, at least according to me once more, Azure Data Factory is one of the more difficult services for planning ongoing cost and allocating a budget; new pricing features have been released in the past few months, but I think it is still difficult for new users to understand. All right, we could sit here the whole day discussing and debating advantages and disadvantages, because at the end of the day your experience can be different from mine, and you may later come back and disagree with some of the points I have just listed. So let me rather share my thoughts with you. What I find more important than learning a tool is the philosophy behind what addresses a business need. If the business needs a solution to store its data in a data lake, rather learn comprehensively how a data lake is structured and managed, and then later on find a tool that will help you structure and manage a data lake. Furthermore, learn the strategies behind data ingestion and integration, and once more, find a tool that can help you achieve that strategy. The point of all of this is that you need to think in a tool-agnostic way, meaning the tool that you find must fit the business need and strategy, and not the other way around; the business need and strategy should not change to fit the tool. Yes, this has happened on many occasions and has led to complex environments. There are a couple of cloud-based ETL tools other than Azure Data Factory that you can find in the market, such as those from Amazon Web Services, Google Cloud, Informatica, and Talend; just make sure to find the right one for your business case. Alright, we have reached the end of this class, and this is it for me. Good luck, stay healthy, and goodbye.

11. Introduction Metadata Driven Ingestion: Hello, good people, and welcome back. Let's unpack what we are going to do in this new section, but before we go there, let's revisit what we completed in the last section. What we did in the last section was to develop one pipeline that ingests just one source file into the data lake. So what happens if we have multiple files that need ingestion into the data lake? Let's imagine that we need to ingest ten files. Does this mean that we need to develop ten Data Factory pipelines? The answer is a definite no; the justification is that it would become a maintenance headache. Let's think about it some more: you have ten pipelines, and then you discover that you need to change something about the way the files are being ingested, which simply means you will need to modify ten pipelines. What if you have a hundred pipelines with the same problem? I am sure you see where this is going. The answer to this predicament is to build a framework, as a modern data engineering technique, where you have just one pipeline that ingests the ten datasets, or even a hundred for that matter. So let's have a closer look at what we are going to do. We have at least four files that we are going to ingest, and we want to do this using just one pipeline. To achieve this, we will have to build a metadata-driven framework to ingest the data. So what exactly is metadata? A simple definition would be that metadata is the kind of data that gives more information about other data. In this context, the metadata would be stored in a data engineering metadata database and should describe the entire process of sourcing particular files from a certain system.
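To make this less abstract before we continue, here is a purely hypothetical sketch of what a few rows of such source-to-target ingestion metadata could look like. The column names are illustrative only and are not the ones used later in the course; the point is simply that one parameterised pipeline can run once per row.

```python
# Hypothetical ingestion metadata: one entry per file to ingest.
# A single, parameterised pipeline reads rows like these and executes once per row.
source_to_target_metadata = [
    {
        "SourceSystem": "webstore",
        "SourceContainer": "sales",
        "SourceFile": "onlinesales.json",
        "TargetContainer": "webdevelopment",
        "TargetDirectory": "webstore/raw/onlinesales",
        "Enabled": True,
    },
    {
        "SourceSystem": "webstore",
        "SourceContainer": "sales",
        "SourceFile": "customers.json",  # hypothetical second dataset
        "TargetContainer": "webdevelopment",
        "TargetDirectory": "webstore/raw/customers",
        "Enabled": True,
    },
]
```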
The metadata also records which dataset the files belong to and how they should be ingested and stored in the data lake. The orchestration pipeline will have the responsibility of drawing the ingestion information from the metadata database and then passing it on to the ingestion pipeline. The ingestion pipeline will then use that information and execute at least four times to deliver the files, meaning that the information it receives is passed through dynamically, which we shall see in the coming lessons. So before we confuse ourselves even more with too much information, let's stop right here; we shall address the rest of the concepts in detail in the coming lessons. This is it for me, and I shall catch you in the next one. Goodbye.

12. Create Active Directory User: Hello, good people, and welcome back. Let's create an Active Directory user, which we are going to use for creating Azure artifacts for the rest of the course. I would like you to log onto your Azure portal and load Azure Active Directory; to do that, we can just open the menu, navigate to Azure Active Directory, and from there navigate to Users. The user that you see here is the main superuser that I used to create my Azure account and subscription. It is usually not a good idea to use this user for managing your resources: this user simply has too many privileges, and that may result in unintended consequences. So let's create another Active Directory user where we can carefully manage user permissions. For this user I am going to use my English name, which is David, and feel free to use your own names or even nicknames for that matter. I select New user, insert my name, David, and under Name I supply my name again. I do want to set an initial password, so let me enter my password. I also want to set the usage location; I am located in South Africa, so I select that, just for extra security, and you can do the same for your own user and lock it to the region where you belong. All I need to do now is hit Create to create the user, and so we have got our Active Directory user. Let's meet again in the next lesson to assign the appropriate permissions for this user. This is it for me, and I shall catch you in the next one. Goodbye.

13. Assign Contributor Role to User: Hello, good people, and welcome back. In our last lesson we created an Active Directory user, and we are now required to add permissions to this user. As it stands, this user cannot do anything on Azure, and we will have to set the permissions against our Azure subscription. So let's head back home, and from here I want to navigate to Subscriptions. Now, I am sure you can see that I no longer have a Free Trial subscription and am currently using a pay-as-you-go subscription; my free trial ran out a while ago. However, you should still have your free trial subscription intact, and that is what you are going to use for the rest of the course, so go ahead and select Free Trial; as for me, I am going to select my own subscription, which is this one. In order for us to set permissions for our Active Directory user, we will need to use the Access control function, or IAM.
IAM, by the way, stands for Identity and Access Management, which I guess is self-explanatory. Let's select Access control (IAM), which is right over here. We have a number of things that we can possibly do here, and to keep things simple, all we want to do is assign permissions for our user against this subscription; the card on the right which reads Grant access to this resource will serve our required purpose. So let's select Add role assignment, which is right here. This is where we get to choose which permissions we are going to assign our user. There are plenty of permissions you can choose from here, and most of them will make sense once you start taking an administrator course, but for now all we need is a role that is good enough for our user to create just about any Azure resource against our subscription, and that role is Contributor. So let's select Contributor and then Next. From here I can assign access to a user, group, or service principal, which is already selected, so this is fine, but I want to add a member. I select Members, and on the right-hand side I choose my new user, David, and hit Select; now we have our user assigned to the role. What I need to do next is select Next and then Review + assign, and that's it: the Contributor role has been assigned to my new user. If you want to check on the users, you can go to Role assignments right here, and there you have it, my user has been assigned the Contributor role. This is it for me, and I shall catch you in the next one. Goodbye.

14. Disable Security Defaults: Hello, good people, and welcome back. There is that temptation to start using the new user straight away; however, Microsoft provides security defaults that will force every new user to use multi-factor authentication. This is a great security feature, because to log into the Azure portal you would have to use a combination of your user password and possibly even your mobile phone. As great a security feature as it is, it will be overkill for our exercise, so let's rather disable the security defaults. To do that, I select the menu, then Azure Active Directory, and from here I go to Properties. On this blade I select Manage security defaults, and from here I simply select No. I get a message that reads: we would love to understand why you are disabling security defaults so we can make improvements. I think we are just going to select Other, write something, and hit Save, and so we have disabled the security defaults. In the next lecture we are going to sign in using our new user, and this will be our main user throughout the rest of the course. This is it for me, and I shall catch you in the next one.

15. Creating the Metadata Database: Hello, good people, and welcome back. In the last couple of lessons we have mentioned the need for, and the role of, a metadata database, and therefore, in this lecture, we are going to create an Azure SQL database that will hold the metadata that drives the management of executing pipelines in a batch. But first we need to sign in with our new user, so let's go to Active Directory by selecting the menu and then Azure Active Directory. From here I want to select Users, and now I can select the user that I created recently, which is David.
So let me select my user, and from here I can copy my user principal name, which is David at the default domain name given by Microsoft: highlight it, right-click, and copy. What I need to do next is sign out, so from the top right-hand corner I select my account and Sign out. After signing out, I can sign in with another account: select Use another account, right-click and paste the user principal name, select Next, enter the password, and select Sign in. It has asked me to change my password, so let me enter my current password and then the new password that I am going to use going forward. I'm not going to let the browser save the password, so let me untick that box, and with that we have signed in with our new user. Let's start creating objects with this user; I'll select Maybe later on the welcome prompt. What I want to do is create a new resource, so I select the menu and then SQL databases so we can start creating our metadata database, and from here select Create SQL database. On this blade, I need to select a resource group, and this database should belong to the data ingestion resource group, which is this one. I also need to insert a database name, and I am going to call it DE_metadata_db, which just stands for Data Engineering metadata database. I also need a database server, so let me select Create new and give it a name along the same lines; it looks like the server name shouldn't contain underscores, so let me take those out, and that should be fine. Now I need to insert a server admin login, which will be de_admin, for data engineering admin, and I need to supply a password that I can remember. Since I am based in South Africa, I am going to choose South Africa North as the location; just be sure to select the region that is closest to you, or where you belong, and then select OK. What I need to do next is configure the database compute and storage, because I don't want an expensive option, so I select Configure database. I am going to look for a basic setup, so I'll follow the Looking for basic option and select Basic, which is just fine: the data max size of two gigabytes is also fine, and the cost is relatively cheap, so let me select Apply. When it comes to backup storage redundancy, geo-redundant backup storage is actually just fine, so let me move on and select Networking. From here, I need to be able to access this server from my own laptop, so I am going to select Public endpoint. As for firewall rules, I do want to allow Azure services and resources to access the server, so I select Yes, and I also want to add my current client IP address to the firewall, so I select Yes there too, and then select Security.
Okay, I don't need Azure Defender for SQL right now, so I move on to Additional settings, and from here I do not require any additional settings either, so let me select Tags. Let me start with the first tag, Division, and since we are creating the database and the database server at the same time, I want to apply it to both resources, so I'll tick both. I am going to supply the Division as Data Engineering, insert a Data Owner as well, which I'll set to Mr. Smith, and lastly a Product Owner, which I'll assign to Dr. B. Java. Now I select Review + create, and since this all looks fine, let me select Create to create the resources. Let's wait a couple of seconds for the deployment to complete, and then we will pin it to our dashboard. It looks like our deployment has completed, so let's head to the resource itself by selecting Go to resource. From here, I want to pin the resource to a dashboard: select the pin icon, choose the shared Data Engineering dashboard, which is just fine, and select Pin. We have successfully pinned our resource to our dashboard. If you want to add more computers to your client IP whitelist, you can modify the firewall settings by selecting Set server firewall. Here you can add the client IPs of your other computers by following the pattern of rule name, start IP, and end IP, and once you are done you can hit the Save button. You can also delete any client IP from this blade by selecting the three dots on the client IP line, which gives you a Delete option. This particular entry is the client IP of my own laptop, so I am not going to delete it. Now you know where to add your client IPs if you need to, so let me return to my home page. With that in mind, this is it for me, and I shall catch you on the next one. Goodbye. 16. Install Azure Data Studio: Hello good people and welcome back. What we want to do in this lecture is download a desktop tool that we can install on our own laptops and PCs in order to work with our cloud Azure database. But before we do that, let's check out the Query editor in Azure SQL, which is one other way of working with the database. I am going to load my database again, DE_metadata_db, open the Query editor, and log on to the database. Let me write a simple query just to get the date, SELECT GETDATE(), and then select Run. So, if you want to, you can work with the Query editor, but I'm sure you can agree with me that it has been designed for simple tasks and perhaps some ad hoc queries. If you need to do more involved database work, you will need to install desktop tools, and luckily the Microsoft Azure team has developed a tool called Azure Data Studio. So let's download Azure Data Studio. To do that, let's go back to the Overview blade (it's fine to discard the unsaved query), and from here I want to select Connect with, which is right here at the top.
Then select Azure Data Studio. What I want to do next is download Azure Data Studio, and from this page I can find the right installer for my platform, which is Windows, so I am going to select the user installer for Windows. If you happen to be using macOS or Linux, please make sure that you download the right version for your platform. Once the download has completed, we can start installing: I'll double-click to kick off the installer. From this window, I accept the license agreement and select Next to continue. Here you can choose where to install Azure Data Studio; I am fine with the defaults, so I select Next, and Next again. I do want to register Azure Data Studio as an editor for supported file types and create a desktop icon, so let me select Next and hit the Install button to start the installation. Now that the installation has completed, I am going to leave the checkbox ticked because I want to launch Azure Data Studio, so let me select Finish. From here, what I want to do is create a new connection. Let me dismiss the messages at the bottom, select the Connections icon, and select Add connection. Now I need my server details to enter on this blade, and first I am going to need a server name. What I can do is return to my SQL database in the portal, copy the server name shown there, return to Azure Data Studio, and paste it in. Next I need to submit my logon credentials, which is my admin account: set the authentication type to SQL Login, enter de_admin as the user, insert the password, and tick Remember password. From here I can select the database, which is now loading, and that will be DE_metadata_db. Since I am done, I select Connect, and there you have it: we have successfully connected to our cloud-based metadata database using Azure Data Studio. Let's meet in the next lecture to add the metadata tables, and we're going to use this tool to do exactly that. This is it for me, and I shall catch you on the next one. Goodbye. 17. Creating Metadata Tables and Stored Procs: Hello, good people, and welcome back. Now that we have created the metadata database in our last lesson, in this lesson we are going to look at creating our metadata tables. I would like you to download the attached zip file of SQL scripts for the metadata database. Once the download has completed, extract the file: right-click and Extract all; I am going to keep the default folder and select Extract, and there you have it. Inside the extracted folder we find another folder called SQL scripts, and this is where all the SQL scripts that we are going to use to create the metadata objects live. Please open your Azure Data Studio and make sure that you are connected to your database. Let me do just that and select my account again.
In fact, let me add the account again and sign in; my account was added successfully to Azure Data Studio. Now let me return to the connection, add my current client IP, which is that one, and select OK. Now that we are connected, let's open the SQL file that is going to create the metadata tables. I select File and then Open File, navigate to my downloaded metadata SQL scripts folder, go into the SQL scripts folder, and inside I am looking for Create Metadata Tables; select Open. From here, all I need to do is run the script, and just like that, we have the tables. To confirm, we can expand the Tables folder, and as you can see, all the tables have been created. What we want to do next is create the source-to-target view, so let's do the same thing: File, Open File, grab the create views SQL file, and select Open. Just like before, I simply select Run, and then I can refresh the Views folder right here; as you can see, I've got my source-to-target view. By the way, to refresh, simply right-click and select Refresh and it should load your view. Next we need to create our stored procedures, which are effectively our ingestion engine. So File, Open File again, find the create stored procedures script, select Open, and just like before, run it. With that we have our engine, our stored procedures, created successfully; to confirm, go under the Programmability folder and you should find them contained within the Stored Procedures folder. Now that we have our table structures in place, it's time to insert initial data into the metadata tables. Once again, File, Open File, select the insert metadata script, select Open, and click Run. So we have populated the initial metadata tables. Next we want to populate the email addresses, and for the email addresses things will be a little different. File, Open File, open the insert email script, and select Open. Let's take a look at the script. As you can see, before you can run it, you need to change the placeholder values SomeName1 and SomeLastName1, along with the first placeholder Gmail address, and you also need to replace SomeName2, SomeLastName2, and foo2@gmail.com. Basically, you make up the first name, last name, and email address for each row; we are pretending that there are more people in the organization who need to receive emails, so you will be required to supply at least two of your other personal email addresses. To do this, I go to Edit and do a Replace. I start by looking for SomeName1, replace it with my first name, Tulani, and do a Replace All. Then let me find SomeName2 and replace it with my English name, David. I also want to find SomeLastName1 and replace it with my surname, and then SomeLastName2, which I also replace with my surname, doing a Replace All each time.
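For orientation, the rows you are editing follow a pattern roughly like the sketch below. This is only a sketch of the shape of the statements, not the course script itself: the real file also populates the columns that tie each recipient to a system and dataset, and the first placeholder Gmail address may be named differently, so edit the actual downloaded file rather than this sketch.

INSERT INTO dbo.EmailRecipients (FirstName, LastName, EmailAddress /* plus the system/dataset columns in the real script */)
VALUES ('SomeName1', 'SomeLastName1', 'foo1@gmail.com');   -- replace with your own first name, surname and address

INSERT INTO dbo.EmailRecipients (FirstName, LastName, EmailAddress /* plus the system/dataset columns in the real script */)
VALUES ('SomeName2', 'SomeLastName2', 'foo2@gmail.com');   -- replace with a second address you own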
Okay, now I need to work on the emails. I'll copy the first placeholder Gmail address into the search box and replace it with one of my own Outlook addresses, and I also want to replace foo2@gmail.com with another of my personal addresses. They don't have to be Microsoft addresses; any addresses of your choice are fine. Let me repeat the replace for that one, close the search-and-replace box, and double-check that everything looks fine. It does, so I'll just run the script. Now I want to check whether I have populated the email recipients successfully, so I go to the EmailRecipients table, right-click, and do a Select Top 1000; and yes, it looks like my email recipients are in successfully. So we have finished creating our metadata database: we populated the tables, views, and stored procedures, and supplied initial data as well. This is it for me, and I shall catch you on the next one. Goodbye. 18. Reconfigure Existing Data Factory Artifacts: Hello, good people, and welcome back. In this lesson, we will reconfigure our existing Data Factory and apply proper business naming conventions. We applied a naming convention before; however, we were more relaxed in the naming of the objects, so it's time to reconfigure them and ensure we apply what you would expect in business. Let's open our Data Factory: select the menu, select Dashboard, make sure you are on the Data Engineering dev dashboard, and open the Data Factory from there. From this window, we can simply select Open Azure Data Factory Studio. As you can probably see, in Data Factory Studio I am still authenticated with my previous email address. What I can do is sign out and sign back in with the new user I created recently. Your experience may vary; you may be asked to authenticate and select a subscription with your previous user and then choose to sign in again afterwards. Rather than signing out from here, what I'll actually do is close my entire browser. Once you have closed your browser, sign back in to the Azure portal, but this time with the new account. Let me do just that: supply my password, choose Sign in, and yes, I want to stay signed in. Let me return to the portal, load the Data Factory again, and I shouldn't experience the same problem. So let me open Azure Data Factory Studio, and as we can confirm, we are now logged in with the user I created recently. Let's go to linked services first: go to Manage and select Linked services. One thing to note is that we can't rename linked services; we would have to create new ones and later drop the old ones as part of a cleanup process. So let me select New, and the first service that I'm interested in is Azure Blob Storage.
So let me choose Azure Blob Storage and select Continue. This needs to link to the storage account where the online sales data is actually stored, and this time around I want to name it LS_ABLS_buyalot_webstore. Let me enter the connection details: for the description, Connection to the BuyALot web store data. Just like before, I select my subscription and the web store storage account, and let me test the connection; it appears to be successful, so I select Create. By the way, ABLS just stands for Azure Blob Storage. What I want to do next is rename the datasets, so let me return to the Author section of the Data Factory. Here I can find the datasets, and I am interested initially in the online sales JSON dataset, so I select it and open the Properties pane. From here I can change the name: I want to call it DS_ABLS_buyalot, with underscores rather than spaces, which looks just fine. I also want to rename the other dataset, but before I do that, I want to point this one at the right linked service, LS_ABLS_buyalot_webstore. To check that things still work as they should, let me do a quick preview; this is working just fine. Next, I also want to change the data engineering dataset for online sales, so open its Properties again and change it to DS_ADLS_dataengineering_json. Now I want to verify that everything is still fine, so click Validate all and click OK, and then publish my new changes by selecting Publish. Next, I want to remove the old linked service: go back to Manage, make sure Linked services is selected, go over to the old web store linked service, hit Delete, and confirm the deletion. My old linked service has been deleted and this looks just fine. We are going to rename some of the pipelines at a later stage, but for now we are done with the reconfiguration exercise. This is it for me, and I shall catch you on the next one. Goodbye. 19. Set Up Logic Apps for Email Notification: Hello, good people, and welcome back. We are going to need a notification strategy, especially for when a pipeline fails, and it makes sense for the pipeline to send an email notification. So how about we set up a Logic App for email notifications? Select the menu, select All resources, and from here create a new resource. Find the Integration category, and Logic App should be right there at the top, as you can see; select Create. Here we assign a resource group, so let's select the data ingestion resource group, and let's give the Logic App a dash-separated name that identifies it as the data ingestion email app. My region is set correctly, which is South Africa North. We will not need to link this Logic App to an integration service environment, such as the one used to run our Data Factory jobs.
We will also not need to enable Log Analytics, as that can be a costly exercise. So let's select Next to insert our tags: I want to set a Division of Data Engineering, a Data Owner of Mr. Smith, and a Product Owner as well, then Next to Review + create. Once validation has completed, we can simply select Create, and once the deployment has completed, go to the resource. As you can see, the designer has opened automatically. Let's go back one page, because I first want to add the Logic App to my dashboard: select the pin, choose the shared Data Engineering dashboard, and select Pin. Now I want to go back to the designer, so let me select Open Designer. The whole idea is that a pipeline can send an HTTP request to ask the Logic App to send an email notification, so let's select the trigger When a HTTP request is received, which is right here. When the pipeline needs to send a message, it sends an HTTP POST request; think of the POST request as a way to deliver a message. The opposite of POST is the GET request, which you can think of as retrieving a message. The request body needs a template, a set of rules that the sending pipeline must adhere to in order for the request to be processed, and in this case we create this template, or contract if you like, by generating a schema. I am going to select Use sample payload to generate schema. From here, I start creating my template in JSON format. My template needs a subject property, with a description saying it is the subject of the email message. Next I add another property, messageBody, described as the body of the email message. I want my template to have at least one more property, emailAddress, described simply as the email address. Once I am done with my template and it looks okay, I select Done, and the request body JSON schema is generated automatically. There is a note that reads Remember to include a Content-Type header; don't worry about this, we will do that when we send the actual message from the pipeline, so let me select Okay, got it. Now that the template has been generated, let's add the next step, which is to send the actual email, so select New step. Here I search for Gmail, and I am going to choose the Send email (V2) action, which is what I am looking for. You may be wondering why Gmail: Gmail will be the email server that sends the actual messages. We don't have an email administrator who has set up a more business-focused email server, so let's rather use something we can get for free. Now I insert the connection name, DataEngineeringGmail, and sign in to my Gmail account; if you don't have a Gmail account, please do create one. Let me select my Gmail account and proceed to allow Azure App Service Logic Apps to access my Gmail account.
Well, not exactly access, but at least use it as an email server. So let me click Allow to do just that. We now need to fill in the parameters that Gmail requires in order to send an email, and the template we have just put together will supply them. Let's start with the To field: select it, choose See more under dynamic content, and pick the emailAddress property from the trigger. Next I add a new parameter, so let me select Add new parameter, and the next one I am interested in is Subject; once I've got the Subject field, I populate it with the subject property from the template. I want to add one more parameter, Body, and from here I am once again going to select See more so I can pick the right property. Actually, hold on, this is incorrect: I don't want the Body token here, I want messageBody. Apologies about that. Now that this is complete, all I need to do is hit Save. What I want next is the URL that I can call to send my POST request, so let me expand the When a HTTP request is received trigger, copy the HTTP POST URL, and save it somewhere for later use. And we are done. This is it for me, and I shall see you on the next one. Goodbye. 20. Modify the Data Factory Pipeline to Send Email Notification: Hello, good people, and welcome back. In the last lesson, we set up a Logic App to send an actual email notification, so how about we develop the pipeline side that sends an email notification using that Logic App? I would like you to open your Data Factory and select Open Azure Data Factory Studio. Navigate to Author and load our pipeline. The first thing I want to do is rename my pipeline to PL_data_ingestion_json, and then publish that change. Next, I want to create a clone of my pipeline so I can keep a little history: select the three dots, select Clone, and within the clone append v2 to the pipeline name. What I need next is an activity that has the capability to call an HTTP URL, either through a POST or a GET method: after the copy activity has finished its processing, the next activity that should follow is the email notification. Under the General activities we find the Web activity, so let's drag it onto the canvas and then link the two activities. Now I rename the activity to Send Email, add the description Send email notification, and set the timeout value to ten minutes; that's not ten minutes, there we go, now it is. Then I go to Settings, and this is where I need to paste in my URL: let me go back to my Logic App, copy the HTTP POST URL again, return to my Data Factory, and paste it here. I'll set the method to POST, as we discussed before.
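Before we wire up the request in the pipeline, here is roughly what the sample payload behind the Logic App schema looks like. This is a sketch based on the three properties described in the previous lesson; the values are just placeholders, and the exact casing of the property names should match whatever you typed when generating the schema.

{
  "subject": "Subject of the email message",
  "messageBody": "Body of the email message",
  "emailAddress": "you@example.com"
}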
The next thing I need to do is set the header information. I need to tell the Logic App that I am sending information in JSON format, so I set the content type header: Content-Type, with the value application/json. Once again, the header is simply telling the Logic App that we are going to send information in JSON format. Now let's build the body, and we are going to match the JSON attributes that come from the template we built in the previous lesson, the one that contains the email address, the message body, and the subject. Back in my Data Factory, I am going to insert JSON into the Body text box; let me select Add dynamic content, which is a much better way to work. Azure Data Factory has a built-in expression language, as you can probably see, that allows you to do various types of data transformation and manipulation. You can see functions that return system variables, such as an ID for the pipeline or the pipeline trigger time, and other functions that allow some form of data manipulation. As the course goes by, we are going to unpack more of these functions. The first thing we need is a function that can convert a string to JSON, so let's search for json under the conversion functions and select it. Next I need the concat function, which allows me to construct and stitch a JSON string together: under the string functions we find concat, so let me place my cursor between the brackets of json() and select concat. Now let's insert the first string literal, and it needs to be within quotes, so I insert a single quote and then a curly bracket, just to signal the beginning of a JSON string, and press Enter to move to the second line. As part of the initial string, I insert the emailAddress property, whose value will be one of my personal email addresses, followed by a comma. I also need a subject property, and the value after the subject is going to be ADF Pipeline (Data Ingestion JSON). Then I need the message: a comma, followed by messageBody, a colon, and then Ingestion of web store data into the data lake has completed successfully. Finally, I need a closing curly bracket, since we are constructing a JSON string, and that's it. Let me double-check; everything looks fine to me, so let me hit Finish.
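Putting those steps together, the body expression ends up looking roughly like the sketch below. This is a reconstruction, not the exact text I typed: the email address is a placeholder, the line breaks are only for readability, and as it turns out I still had a small mistake in mine at this point, which gets fixed in a moment.

@json(concat('{"emailAddress": "you@example.com",
"subject": "ADF Pipeline (Data Ingestion JSON)",
"messageBody": "Ingestion of web store data into the data lake has completed successfully"}'))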
Now we can validate the pipeline before we run it: select Validate all, and everything seems fine, so hit the Close button. Let's test the pipeline and see if it does actually send an email: run Debug and head over to Output to watch the run. As you can see, the run starts with the first copy activity, still queued, so let's wait a couple of seconds. Cool, it looks like it has succeeded. The next step is to check your email inbox to see if a notification has been sent, so I am going to check mine now. It looks like my email has come through as intended, but it arrived with no subject at all, so obviously there's something we need to fix. Let me head back to my Send Email activity and open the body, and I immediately notice the problem: that should be subject without the extra colon, so let me take it away, and that should fix it. Let me select Finish again, validate one more time, do a Debug, and wait for the run to finish. It has succeeded again, so I'm going to check my email once more. Cool, this time the email also contains a subject, so now this looks just right. We have completed this exercise, and I will catch you on the next one. Goodbye. 21. Create Linked Service for Metadata Database and Email Dataset: Hello and welcome back. In the last couple of lessons, we supplied a number of email addresses to the EmailRecipients database table, and in the last lesson we hard-coded the actual email address. We therefore need to start using the email addresses that come from the EmailRecipients table, so let's get down to business. Before we go ahead, please make sure that you have published version 2 of the ingestion pipeline. First and foremost, we need a linked service that creates a connection directly to the metadata database. Select Manage, and under Linked services select New; from the pop-up window on the side, select Azure SQL Database and click Continue. I want to name my linked service LS_ASQL_dataengineering_metadata and insert the description Connection to the metadata database. From here, I select the subscription, the server name, which is the metadata server, and the database, DE_metadata_db. Then I supply the authentication type, SQL authentication, the user name, de_admin, and the password. Let me test the connection; it looks to be successful, so I select Create. Now that we have a linked service, we need a new SQL dataset that will hold the email accounts. Let's return to the Data Factory resources by selecting Author, and under Datasets create a new folder: select the three dots, select New folder, insert Metadata, and select Create. Now I want to create the actual dataset: select the three dots under Metadata and select New dataset. From the pop-up window on the right, select Azure SQL Database and click Continue. From here, I select my new linked service and the table I am interested in, which is EmailRecipients. I'll name the dataset DS_ASQL_EmailAddresses and select OK to create it. Let's give it a test: preview the data, and there you have it, we get the email accounts. This looks good; don't worry about the repeated email address at this stage, we are going to fix that in the coming lessons.
So let's publish: hit the Publish button to make the changes permanent. This is it for me, and I shall catch you on the next one. Goodbye. 22. Create Utility Pipeline to Send Email to Multiple Recipients: Hello good people and welcome back. In this lecture we will create a pipeline that is designed to manage email notifications. The pipeline will make use of the dataset we created in our previous lesson. The idea behind utility pipelines is that they are designed to be reused by other pipelines over and over again. This takes away the need to build one complex pipeline that tries to do everything; instead, a pipeline can call utility pipelines that absorb that complexity while delivering the desired results. So let's create the utility pipeline. First, within Pipelines, I create a new folder called Utility: select the three dots, select New folder, insert Utility, and select Create. Now I create a new pipeline: under Utility, select the three dots and create a new pipeline. Let's give it a name, PL_UTL_SendEmailNotifications, and insert the description This is a utility pipeline that sends email notifications; let me fix that spelling right there. Next we insert pipeline parameters: it will accept a system code and a dataset name. This is because we have a stored procedure called GetEmailAddresses, which takes in a system code and a dataset name; we use it to get the list of email addresses associated with a specific system code and dataset name. By adding parameters to the pipeline, we are effectively making it dynamic and reusable by other processes, and any pipeline that calls this utility pipeline will have to supply the parameter values. So let's insert the parameters: select the empty canvas, and under Parameters select New. I need a systemCode of type String with a default value of OWS, and another parameter called datasetName, also a String, with a default value of sales. After adding the parameters, we need an activity that will read in the list of emails from the GetEmailAddresses stored procedure, and the activity must have the ability to supply a system code and a dataset name to the stored procedure so that it returns the correct email addresses. That activity is the Lookup activity, which in principle means looking something up, and we are looking up the email addresses. I can find the Lookup activity under General, so let's drop it onto the canvas. Now I give my Lookup activity a name, Get Email Addresses, and a description, Gets the email recipients for a specific system code and dataset name. Let me set the timeout to ten minutes and not seven days, which is just perfect. Then let's open Settings, and under Settings select the source dataset, DS_ASQL_EmailAddresses. Now, as you can tell, the email addresses dataset is associated with the EmailRecipients table, which returns all the email addresses without any form of filtering.
It is therefore a good idea to override the default behaviour and rather supply a stored procedure that accepts a system code and a dataset name, and let the stored procedure filter the records. So let's select the stored procedure GetEmailAddresses: under Use query I choose Stored procedure, pick GetEmailAddresses, and click Import parameter to show its parameters. Now we need to supply the values of the parameters that will go into the stored procedure, and the stored procedure will fetch records according to the values supplied. We are going to use the Data Factory expression language for this. Select the value for the dataset name and click Add dynamic content; since this is the dataset name, I pick the pipeline parameter datasetName and click Finish. Once more, for the system code, Add dynamic content and supply the systemCode parameter, as you can see right over here, and hit Finish. As you can tell, since the parameters have default values assigned, those defaults are what will actually be passed to the stored procedure for now, simply because there are no calling pipelines passing in real values yet. Next, we need to ensure that the lookup does not return only one record, so scroll down, find First row only, and untick it. Let's do a data preview to see what's going on: select Preview data, the default values from the parameters come through automatically, so select OK, and since the system code and dataset name are supplied, let's see what happens. As you can see, we got back the results associated with the supplied system code and dataset name, so the filtering has been applied: previously we got at least four records from the EmailRecipients table, but now we've got just two. What we need to do next is read each email and then call the Logic App that sends the actual email notification, so we are going to need a ForEach activity that will do just that for each email address. Let me close the data preview; under Iteration & conditionals we find the ForEach activity, and immediately I want to link the Lookup activity to this ForEach activity. Let me go back to the ForEach activity and supply a name, For Each Email Address, and a description, Send a message to each email address. Now I select Settings. Even though we have connected the two activities, we still need to tell the ForEach activity which items it needs to iterate over; the items in this case are the two records from the Lookup activity output. So let's supply the Items property: find it here, Add dynamic content, and the activity outputs are listed right here. From the Get Email Addresses activity output I want all the values, so I append .value, and select Finish.
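Before we add the inner activity, for reference the two expressions that drive this lookup-and-loop pattern look roughly like this. This is a sketch: the activity name must match yours exactly, and the column name comes from the stored procedure output, as we will see in a moment.

Items property on the ForEach settings:
@activity('Get Email Addresses').output.value

Reference to the current record's address inside the loop:
@item().EmailAddress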
Now we need to add the activities that the ForEach will process, and it needs to send an email, so let me select Edit activities. I want to go back to my previous pipeline, PL_data_ingestion_json_v2, and all I want is to copy the Send Email activity: right-click, Copy. Then let me get back to my Send Email Notifications utility pipeline and paste the activity inside the ForEach: right-click and Paste. Now, for every iteration over the Lookup items there will be an associated unique email address, and we need to supply this email address dynamically to the JSON template. So let's head back to Settings and modify the Body attribute. All we need to do here is take away the hard-coded email address; in its place, I close the string with a single quote, insert a comma, and then reference item() with the email address value, which comes from the stored procedure output, followed by another comma and a single quote to resume the string. For clarity, let me insert a space there, and that should do it. This syntax may look a little weird at first glance, but remember, we are using the concat function to build a complete string and thereafter convert the string to JSON. In order for the expression functions to execute within the string and return a value, the literal pieces of the string are enclosed in single quotes and the function calls sit between them, separated by commas. So let's hit Finish and return to the canvas; it's time to test the pipeline. First I do a validate, which looks fine, and then run it to see if I can send an email notification to both my email addresses: select OK, go back to the canvas, select Output, and wait for the process to kick off. Okay, it looks like we've got an error, so let's investigate. If I look at the Send Email activity from the first iteration, the error says the expression cannot be evaluated because the property EmailAddresses doesn't exist. So let's fix that: if we look at the Get Email Addresses lookup and run a quick preview, the value we are supposed to pick is EmailAddress, the actual column name, and I got that wrong. Let's go back into the ForEach, open Send Email, and under Settings go to the Body; let me take away those two extra letters so it reads EmailAddress, and select Finish. Let me run it again and see what happens. Cool, it has completed successfully. Now your job is to check both your mailboxes for the notifications. Since we have completed this, let's do a Publish. We've tested the utility pipeline with default values; in the next lesson, let's see if we can call it from the data ingestion pipeline. This is it for me, and I shall catch you on the next one. Goodbye. 23. Explaining the Email Recipients Table: Hello, good people, and welcome back. In this lesson I am going to explain how the email recipients metadata actually works. Please load your Azure Data Studio, and once you find yourself inside it, query the EmailRecipients table: right-click and Select Top 1000 to start querying the data. Let's take a look at the columns from the EmailRecipients table. We have a system code and also a dataset name, and what these two fields tell us is which recipients are interested in being notified for a specific system and dataset.
So you can basically add a user to this table with the information about which system and dataset they want to be notified for. Let's now observe the column SystemInfoID: it is a foreign key that links to the SystemInfo table, so the EmailRecipients table is a child of the parent table SystemInfo. Let's query the parent table: right below the query we have just run, insert GO just to separate the two SQL statements, then do a SELECT * FROM dbo.SystemInfo, run that, and observe the results. Here we can see the systems and their related datasets. When you are onboarding a new system, this is where you start, by providing the initial system details and the dataset. If the same system has two or more datasets, you simply insert another row to indicate the new dataset, as you can see here with the BuyALot web store, which is represented twice with two different datasets, sales and sales reference. One thing to point out: you can disable a dataset that corresponds to a specific system by setting its status to disabled. Now let's work out the relationship between the SystemInfo table and the EmailRecipients table. We can do that by inserting GO again below the last statement and writing a SELECT from dbo.EmailRecipients with an alias of a: from EmailRecipients I want the first name, a.FirstName, the last name, a.LastName, and lastly the email address. Then I join the SystemInfo table with an INNER JOIN on dbo.SystemInfo, give it an alias of b, and also return all the columns that come from SystemInfo, so b.*, joining on b.ID equal to a.SystemInfoID, and insert GO. Now I want to run all three statements at once, so let me select them, run, and observe the results for a moment. What we can see here is the relationship between the two tables: by joining the SystemInfoID from EmailRecipients to the ID from SystemInfo, we can tell which email recipient is designated to which system and its associated dataset.
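For reference, the three statements dictated above come out roughly like this. This is a reconstruction from the walkthrough, so the exact schema, table, and column names may differ slightly from the course scripts.

SELECT TOP (1000) * FROM dbo.EmailRecipients;
GO

SELECT * FROM dbo.SystemInfo;
GO

-- Relate each recipient to the system and dataset they are notified for
SELECT a.FirstName,
       a.LastName,
       a.EmailAddress,
       b.*
FROM dbo.EmailRecipients AS a
INNER JOIN dbo.SystemInfo AS b
    ON b.ID = a.SystemInfoID;
GO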
Now that we have an understanding of the EmailRecipients table, we can look forward to using it in the coming lessons. This is it for me, and I shall catch you on the next one. Goodbye. 24. Explaining the Get Email Addresses Stored Procedure: Hello good people and welcome back. In the last couple of lessons, we have called the stored procedure GetEmailAddresses to get all the email recipients related to a specific system and dataset name. Perhaps the stored procedure could have been named GetEmailRecipients instead, but hey, GetEmailAddresses is just fine too. Let's observe what's going on inside the stored procedure, so please load your Azure Data Studio. Let's open the stored procedure: it is located under the Programmability folder, with Stored Procedures as an inner folder, and in there we find GetEmailAddresses. From here, we can generate the CREATE stored procedure statement: right-click, Script as Create. Now let's observe the code. What is a stored procedure, exactly? A stored procedure is a batch of statements that are grouped together to work as a logical unit and stored in the database. If you don't understand these concepts, I strongly suggest that you take a course or read a book about SQL Server and T-SQL; it should contain chapters on how to develop stored procedures. As you can see, this is a simple stored procedure that accepts two parameters, the system code and the dataset name. The stored procedure then runs a SQL query that returns the list of email recipients, and the WHERE clause on system code and dataset name ensures that the recipient list is limited to only the supplied system and dataset rather than returning everything. This was a short lesson, but it does show you that behind the scenes the so-called ingestion engine is based on stored procedures that put everything together. This is it for me, and I shall catch you on the next one. Goodbye. 25. Modify Pipeline to Send Email using the Utility Pipeline: Hello, good people, and welcome back. In the last couple of lessons, we managed to create a utility pipeline, so let's clone our ingestion pipeline and add the activity to call the email utility. To clone, I select the second version of my ingestion pipeline, select the three dots, and select Clone, and I rename the clone to version 3. Next we set pipeline parameters. Since this pipeline will gradually change towards being dynamic and catering for multiple JSON sources, not just sales, we will gradually add parameters to make it more dynamic. So let's add the parameters I'm looking for, systemCode and datasetName, with defaults of OWS and sales. Perfect. Next, let's remove the Web activity and replace it with the email utility pipeline; to do that, we need the Execute Pipeline activity. Let's start by deleting the Web activity: select it and hit the Delete button. Now let's add the Execute Pipeline activity, which we find under General; drag it in and attach the two activities together. Now I set the activity name to Send Email Notification, go to Settings, and select the invoked pipeline, PL_UTL_SendEmailNotifications. Next we supply the values of the parameters: select the value, Add dynamic content, supply the system code, select Finish, and do the same to supply the dataset name. These are values coming from my main pipeline that are then sent through to the Send Email Notifications utility pipeline. Now that we have attached the Execute Pipeline activity and are invoking the utility to send emails, let's test the pipeline. First I validate, to see that everything is fine, and everything looks fine, so let me hit the Close button.
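For reference, the two values passed down to the utility pipeline here are simply the calling pipeline's own parameters, which in the Data Factory expression language look roughly like this (a sketch, assuming the parameter names systemCode and datasetName defined above):

systemCode:  @pipeline().parameters.systemCode
datasetName: @pipeline().parameters.datasetName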
What I can do now is just run the pipeline and, if needed, supply the system code and dataset name; since they came through by default, I simply select OK. Let me select Output to observe the pipeline run, and as you can see, it has already started. The first copy activity has completed, and now we are waiting for the Send Email Notification to complete; and it has completed successfully. Please do check your emails to see if you have received the notifications. I have received mine, so what I can do now is publish the pipeline. Just like that, our pipeline no longer has to contain the logic to send an email notification itself; another ingestion pipeline, or a pipeline that does something else entirely, can use the very same email notification utility pipeline where applicable. This is it for me, and I shall catch you on the next one. Goodbye. 26. Track Pipeline Triggered Run: Hello good people and welcome back. One of the major requirements of any data engineering solution is that it must have the ability to keep track of its processes. We need to be able to track progress and generate business intelligence reports that inform the business of its activities. The metadata database has provisions to keep track of every pipeline run, and we are going to use those provisions to monitor the pipeline run that has been triggered. The first thing I'm going to do is clone my pipeline: once more, select the three dots, select Clone, and set it as version 4. The next thing we need to do is add some pipeline parameters. We are going to need a run ID that keeps track of every batch run: each batch run that calls multiple pipelines is associated with a run ID as a key identifier, and is also associated with a snapshot date. A snapshot date represents a picture of a system at a given date. So let's add the two parameters: the first one is runId and the second one is snapshotDate. Let me supply a default value for the run ID, and also a snapshot date, which I'm going to give the date 2021-07-11; that's fine. Next we need to call a stored procedure that is entirely dedicated to keeping track of every pipeline run, called InsertPipelineLog. Let's add the Stored Procedure activity, which we find under the General activities, and drag it onto the canvas. Now I want to rearrange the connections, because just after executing the copy activity I want to log the pipeline progress and thereafter send the email notification. So let me take away this connection, because we don't need it there, shift this to the left and place the new activity in the middle, create a connection to the stored procedure activity, and thereafter it should connect to the Send Email Notification step. Next, I supply a name for my stored procedure activity, Log Pipeline Success, and set the timeout to just ten minutes. Now I go over to Settings so I can pick my stored procedure: select the linked service, which is the data engineering metadata linked service, and pick the stored procedure InsertPipelineLog.
Now I need to get to the parameters, which are right here. As you can see, there is quite a number of parameters required to keep track of every triggered pipeline. Unfortunately, Data Factory lists the parameters in alphabetical order, which does not make much sense here, and there is no way to reorder them, so we will have to live with the situation. Let's supply the parameters, starting with the end date, where we need to provide a timestamp: select the value of the end date and click Add dynamic content. For the end time, we need a date that matches your specific time zone, and mine happens to be South African Standard Time. If you want to check your supported time zone, please visit the Microsoft documentation, as you can see on my screen; if you scroll down on that page, you will find a list of available time zones. For me, all I needed to do was search for South Africa, and this was the result, so let me highlight the time zone name, right-click, and copy it, because I am going to use it. Now let me return to my Azure Data Factory. I need two functions to make this happen. The first function must be able to convert a timestamp from the standard time zone into my South African time zone, and the standard time zone for Azure is UTC, which stands for Coordinated Universal Time. So let's get the function: under the date functions I am looking for one that converts a timestamp from UTC to a target time zone, and that is convertFromUtc. Next, I need the actual time, which is utcnow(), returning the current timestamp, and I want to convert this timestamp from UTC to South African Standard Time, so I paste what I copied, within quotes, as the time zone argument, and select Finish. Next, I need a pipeline ID, which I get from the built-in system variables: select the value, Add dynamic content, and under system variables pick the pipeline run ID. Just don't confuse this with the run ID that comes from the batch; this is an identifier for each pipeline run generated by Azure Data Factory. Hit Finish. Now we set the pipeline name, which once more comes from a system variable: select the value, Add dynamic content, and under system variables pick the pipeline name; once again, this value is returned by Azure Data Factory itself. Hit Finish. Next we supply the run ID that comes from our parameter: select the value, hit Add dynamic content, go to the parameters, and pick runId. For the snapshot date it's pretty much the same thing: select the value, pick the snapshotDate parameter, then hit Finish.
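Pulling the steps above together, the values set so far look roughly like the sketch below. The expressions are as described, assuming the parameter names runId and snapshotDate and the South Africa Standard Time string copied from the documentation; the left-hand labels are only approximations of the stored procedure parameter names shown in the activity, and you should adjust the time zone to your own region.

EndDate:       @convertFromUtc(utcnow(), 'South Africa Standard Time')
PipelineID:    @pipeline().RunId
PipelineName:  @pipeline().Pipeline
RunID:         @pipeline().parameters.runId
SnapshotDate:  @pipeline().parameters.snapshotDate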
So let's supply a default source to target ID, and I'm just going to give it a two, and let's hit Finish. Okay, now for the start time, let me hit the value once more, because we also need to convert this to the correct time zone. First, you will need to find the system variable that holds the start time, so I need to find something that resembles a triggered start time, which should be somewhere around here. Okay, pipeline trigger time - this is what I am looking for. But before I supply this, I will need to set up the function, because it also returns a UTC time. So let me go to the date functions, do a convertFromUtc first, then only apply the pipeline trigger time, which is right there, and once more convert that to South African Standard Time, and hit Finish. Now for the status, what I want to do is just insert a hard-coded value of Success and hit the Finish button. For UpdatedAt, well, this is quite easy: I'm just going to copy the value from the end time, hit Finish, supply it to the UpdatedAt value, and hit Finish. Now, there's one thing that I have just remembered: I want to ensure that whatever date gets written to the pipeline log information has a specific standard format. To do that, I am going to write formatDateTime at the beginning, and I think I can find it under the functions - yes, it's formatDateTime, just like that. Now what I want to do is copy this - well, actually cut it - and paste it here, and then set the format that I am looking for, which is yyyy-MM-dd, and hit the Finish button. All right, now that we have set the parameters, which look fine by the way, we can test this entire pipeline before we do some publishing. So let's test the pipeline: let's hit the Validate button first, which comes back clean, and let's hit the Debug button to kick off the pipeline. I am happy with my default parameters, so let me hit OK, and let me select Output to start keeping track of the actual pipeline. Cool, my pipeline has started, and it looks like the Log Pipeline Success activity has also completed successfully, so it's now just sending the email notification. What we can do now is load Azure Data Studio just to check whether we have populated the PipelineLog table with actual tracking data. All right, so I have loaded mine, and what I can do from here is just right-click on the server name and do a new query. From here I can just do a select statement, SELECT * FROM PipelineLog, which is that table, and hit the Run button. And as you can see, we have successfully populated our pipeline log information, and everything seems just fine. Okay, so you should have received an email by now, and since this looks perfect, this is it for me, and I shall catch you on the next one. Goodbye. 27. Making Email Notifications Dynamic: Hello, good people and welcome back. For this lecture, let's make the email notification pipeline accept a subject and a message as parameters. We have hard-coded the subject and message all along, and that is not good practice; a calling pipeline must be able to set a unique subject and message. So let's get cracking, and let's start by opening the utility pipeline, which is Send Email Notifications. From here, let's add additional parameters: to start off with, I am going to insert a Subject, and I am also going to need a Message, just like that.
Alright, so what I need to do next is modify the ForEach activity, and under Activities, let's edit the activities. Where I want to go is the Send Email activity, because I need to apply the subject and also the message. So let me go to Settings; I am interested in modifying the Body attribute right here. What I want to do is take away this hard-coded subject, close the string with a single quote, insert a comma, and between the commas is where I want to splice in my parameter value - so I insert the Subject parameter right there, for the subject. Now I want to do the same for the message body: let me take away that message as well, insert the single quotes again, then the commas, and within the commas insert the Message parameter, just like that. And to finish it off, let's just select Finish - and pretty much that's it. Now, what I want to do is test my pipeline. So let me return to the main canvas right here, do a validate, which seems fine, and hit Close. Okay, before I continue, this time around I think I want to publish, so let's publish this and select Publish, just like that. Okay, so after publishing, I want to test my pipeline, and I'm going to hit Debug. I want to supply a subject, so "Test subject", and a message, "Hello, test message", and select OK. Perfect. Alright, so let me watch the output, and it looks like the pipeline has completed successfully. Now, what you need to do is go and check your email; please ensure that you have received the right message and subject. Alright, what I want to do next is create another utility pipeline. This utility pipeline will bear the responsibility of setting a status message and then calling the Send Email Notification pipeline. The status message that I am talking about will either be a success or a failed status associated with the ingestion pipeline, which can then be used as parameters to send through to the Send Email Notification pipeline. So let's create the pipeline under the Utility folder: select the three dots under the Utility folder and select New pipeline. From here, I want to name the new pipeline, and I'm going to call it PL_Util_Set_Status_Message, just like that. Now let me enter a description: "Sets the status message from a calling pipeline and constructs a subject and a message" - let me spell that right. Perfect. All right, so let me return to the canvas. What I want to do next is set at least four parameters: I will require a calling pipeline to send through a status of Success or Failed, I will also need to capture the calling pipeline as the triggered pipeline, which we shall see, and I will also need a system code and a dataset name to filter the email addresses. Let's start with the status, so insert Status right here, and I also need a SystemCode - in fact, that should be SystemCode, I beg your pardon - a TriggeredPipeline, and a DatasetName, and that's it. All right, we are going to need an activity that will take in the status of Success or Failed and have it decide a course of action, and the perfect activity for this is the Switch activity.
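Before we move on to the Switch activity, here is roughly what the modified request body expression from earlier in this lesson ends up looking like. The JSON property names (emailAddress, subject, message) and the EmailAddress column on item() are assumptions based on how the Logic App and the email recipients stored procedure were set up in the earlier lessons, so adjust them to match your own setup:

    @concat('{"emailAddress":"', item().EmailAddress,
            '","subject":"', pipeline().parameters.Subject,
            '","message":"', pipeline().parameters.Message, '"}')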
If you've done a bit of programming with languages like Java, C++, or C#, you would have come across a switch statement; the idea is to switch to a course of action depending on a condition. So let's put this to the test. Let me insert the Switch activity - I think I should find it under Iteration and conditionals, perfect - and drag that in there. What I want to do is give the Switch activity a name, so let me call it Switch Status, just like that, and enter a description: "Constructs a message and a subject depending on the supplied status of either Success or Failed". So the Switch activity is now right here, and we need to supply an expression. But before we supply an expression, we will also need to provide at least two conditions that decide on a course of action; these conditions are known as case statements. We will need a case for a Success status and another for a Failed status, and finally we will need a default case, should we not get a Success or a Failed status. So let's start by telling the Switch activity which value to evaluate, and we can supply that within the expression box: select the value, add dynamic content, and the value will be the Status parameter, and let's hit Finish. Now let's add the first case: to do that, I am going to select this Add case button right here and enter Success. Next, I need to insert an activity that corresponds to the case, so to speak. What I want to add here is the sending of the email notification under the Success status, so let me drag this in and put it right there, which is the Send Email Notifications pipeline, and I want to rename it to Send Email Notification - Success Message, like that. Now I need to go to Settings and start populating the parameters, as you can probably see. I am going to require a system code, so let me insert the SystemCode value from the parameters, and then the DatasetName. We will also need to construct a subject now, so let's do that shortly: add dynamic content, and what we can do here is combine the triggered pipeline and also supply a message stating that the pipeline has completed successfully, and we are going to need a concat function for that. So let's go over to the functions, then to the string functions, and pick concat. Now I want to construct the first string, which is "ADF Pipeline", and it must be in quotes, and I want to supply a colon and a space. What I need to insert next is the TriggeredPipeline parameter, so let me search under the parameters - it should be right here, which is TriggeredPipeline - and hit Finish. Then let's supply the message, and I am going to need the concat function again. I also want to include the pipeline name, which will come from the parameters as well, so the TriggeredPipeline, and now I want to insert "has completed successfully", and let me select Finish, just like that. Okay, now I want to go back and set up the Failed status as well. What I can do from here is copy this, return to the main canvas, and head back to Switch Status and its activities. All right, so what I want to do now is add the Failed case: let's do Add case and insert Failed, and let me modify the activities again. Okay, let me do that again and paste right there, and I want this one to send a failed message.
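For reference, the two expressions built for the Success case read roughly like this, assuming the parameter names used above; the Failed case will follow the same pattern:

    Subject: @concat('ADF Pipeline: ', pipeline().parameters.TriggeredPipeline)
    Message: @concat(pipeline().parameters.TriggeredPipeline, ' has completed successfully')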
And I also want to carry the leading part of the name across, so to speak. Then let's go to Settings, because we need to send a failed message. The subject - there's nothing to do here. For the message, what I want to do is insert "has failed. Please check.", hit Finish, just like that, and then return to my main canvas again. Okay, so what we want to do now is add an activity for the default status, in case we don't get a Success or a Failed. So let me return to Switch Status and configure the default case: right-click and paste right over there, and then make some changes - I want to change this to say Default Message and take away the copy suffix. Let me go to Settings once more; the subject is fine as it is. Alright, I want to send a correct message here to reflect that I have not received a Failed or a Success. So the first thing is to modify this to "has no valid status", and I want to include the status right here: let me insert a colon, and the status that has been supplied will come from the Status parameter - in fact, it does not need to be in quotes, so let me take that away. And I need to tell the user that I am expecting a Failed or a Success status, so "Expecting Failed or Success." Okay, this part needs to be inside the quotes before I make another blunder right there, and that seems fine, so let me hit the Finish button. Let me return to the main canvas of PL_Util_Set_Status_Message, and it seems like we need to start testing this. Let's validate to see whether we have any problems - it does not appear so - so let me hit Close, just like that. Okay, now I want to start testing this pipeline, but let me hit Publish first, because I seem to forget this. So let me do a publish, and after deployment, we can run the test. Okay, so now we are going to test for a success: let's hit Debug and insert a status of Success, a triggered pipeline of PL_Data_Ingestion_JSON, a system code of OWS, and a dataset name of Sales. Let's hit OK and look at the output. What you need to do here is check your email address for the corresponding message. Okay, I am not going to repeat the test for Failed and also Default; please do run those tests yourself and make sure that you get all three messages. To trigger the default, you just need to enter some gibberish, and you should get the default message. Okay, this is it for me, and I shall catch you on the next one. Goodbye. 28. Making Logging of Pipelines Metadata Driven: Hello good people and welcome back. What we want to do for this lecture is make the logging of pipeline metadata its own stand-alone pipeline. Furthermore, instead of logging only pipeline successes, we want to also log failures into the database table PipelineLog. We need to do this because an activity within a particular pipeline can actually fail, as you have already seen, and therefore we need to log a failure into the PipelineLog table as well. So I would like to create a new pipeline - please do the same within the Utility folder. Select the three dots and select New pipeline, and in here I want to add a new name: I would like to call it PL_Util_Log_Pipeline_Metadata, just like that. Now I want to insert a description: "Log pipeline run information (metadata)".
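Before we wire up the new pipeline, one quick recap of the previous lesson: the default case message we put together ends up roughly as the following expression (wording as typed above, with the subject left unchanged):

    @concat(pipeline().parameters.TriggeredPipeline,
            ' has no valid status: ', pipeline().parameters.Status,
            '. Expecting Failed or Success.')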
All right, so when we reflect back to version four of our ingestion pipeline, the Log Pipeline Success activity required quite a number of parameters that had to come from the ingestion pipeline, and the values of those parameters were then supplied to the InsertPipelineLog stored procedure. So this new utility pipeline will need to accept those required parameters as well, and they will in turn be supplied to the InsertPipelineLog stored procedure. So let's add these parameters: let's start with SnapshotDate, followed by Status, then RunID, and the next parameter is SourceToTargetID. Actually, I want to follow that with PipelineTriggerTime, PipelineID, and PipelineName, and then we can have SystemCode and also DatasetName, just like that. Perfect. Now let's reflect on the parameters PipelineTriggerTime, PipelineID, and PipelineName: these values will be coming from the ingestion pipeline, so we can't use values coming from the utility pipeline's own system variables, simply because that would log incorrect information pertaining to the utility pipeline instead of the ingestion pipeline. Alright, so what we need to do next is copy the relevant activity from the ingestion pipeline, which is the fourth version, and the activity that I am referring to is Log Pipeline Success. So let's copy that - right-click, Copy - return to the utility pipeline, then right-click and paste. What I want to do here now is change the name to Log Pipeline Run Information (Metadata) - okay, it doesn't allow brackets, so let's take those away, and that's enough. Okay, so now we need to supply some values under Settings, so let's head over there; we've got some parameters that we need to supply. Let's start with the pipeline ID: let's take that away, because it should come from the PipelineID parameter, and it's the same for the pipeline name - remove that, it should come from PipelineName. The run ID is the same thing, it should come from the parameter, as should the snapshot date, and the source to target ID should also come from the pipeline parameter - let me remove that little mistake and select Finish. The start time should come from the PipelineTriggerTime parameter, and the status should come from the Status parameter, and we'll leave it at that. Notice that we have not bothered to modify the end time parameter. That's simply because there is no available system variable that we can use to pull an end time from a calling activity or pipeline, so we have to use the current timestamp to reflect the end time; basically, the point where this activity is called, or perhaps invoked, becomes the end time. All right, so after setting the parameters, we want the ability to set a status message. Let's add the utility pipeline that allows us to set a status message, which is right here, and drop it right over here. Now I want to connect the stored procedure activity to this Execute Pipeline activity. Next, let me rename it to Set Status and Send Message, just like that, and let's head over to Settings, where we still need to supply the parameters.
So let's supply the Status parameter, which will come from the Status pipeline parameter, and also the TriggeredPipeline, which is the PipelineName parameter, by the way. Then the SystemCode, which is that parameter, and the DatasetName, which is that parameter, and hit Finish. Okay, so we have now completed the development of this particular pipeline. What I want to do next is a Validate All and hit the Close button, since there are no validation errors, and then publish - so let me publish that and select Publish. Okay, cool. So let's test this particular pipeline. I would like you to load Azure Data Studio and make sure you query the pipeline log information; what I want to get from here is the format of the start time, so select a start time value, right-click, and copy. Now let's run this pipeline: let me select Debug, and I need to supply some values. The first value will be the snapshot date, which is 2021-07-12. Let's start off with a Success status, which is just fine, and let's put in an arbitrary run ID, say 100, and a source to target ID of 2. For the pipeline trigger time, right-click and paste here, then change the date to today and adjust to my current time, which is 11:26 - so 26, which is fine. Since I'm going for today, it means that I'm running this as today's snapshot. Let's insert a pipeline ID, which can be just any number for now, so let's say 2. The pipeline name is PL_Data_Ingestion_JSON, the system code is OWS, and the dataset name is Sales. Okay, this looks fine, I believe, so let me select OK to kick off the execution, and let's go over to Output to monitor it. It looks like the pipeline is now running - okay, it has succeeded. All right, to confirm this, what we can do is check the PipelineLog table and see whether we got another entry - and yes, we have, and it is successful. Another thing you can check is whether you have received a success message. Okay, so this is it for me, and I shall catch you on the next one. Goodbye. 29. Add a new way to log the main ingestion pipeline: Hello, good people and welcome back. For this lesson, let's see how we can modify the data ingestion pipeline to make use of the new way of logging the pipeline information. We will also need to deal with setting the Failed and Success statuses. Okay, so let's start by cloning the ingestion pipeline, which is the fourth version: let me select the three dots and select Clone, and from here, I just want to change the name to version five, just like that. Now what I want to do next is remove these two activities, because I no longer need them: let me select the stored procedure and hit the Delete button, and now the Send Email Notification activity should go as well, so let's hit the Delete button. Since we have removed the Log Pipeline Success stored procedure activity, we still need a way of logging the pipeline run metadata, but this time around it needs to cater for both a Success and a Failed status. We are going to do this by creating a variable that will store the status of the pipeline and then use the value of the variable to determine a course of action. So let's create the variable: to do that, we can come here to Variables, select New, and I am going to supply a name of v_status, and let's insert a default value, which is Success.
So once we have the variable established, we need to set its value accordingly. Now, almost every activity, including the copy activity, has at least four outputs, and so far we have used just one - it has always been the success output. Let's observe these four outputs that I am talking about: I would like you to select this button that looks like a plus sign with an arrow pointing to the right. Select that to reveal the four outputs. We are interested in what happens when a failure or a success event occurs, but obviously, it's also good to be well aware of the other output events, such as completion and skipped. You can use skipped if your activity has been skipped when you have multiple flows and perhaps you want to log that event, or send a message when an activity completes; maybe you want to add another flow of activities, or perhaps insert more pipeline tracking information. Okay, so let's start with the success event first. As I said before, I want to set the value of the variable, and to do that, I will need an activity which is under General, and it will be Set Variable. Okay, so let me join the success output event to the Set Variable activity. Now let me set the name first, so Set Success Status. Okay, let's select the variable, which is v_status, and insert a value of Success. Perfect. All right, let's do the same for the failed status: let me copy that, right-click and paste, set up the output right here and select Failure, and join that to the activity. Cool. Now, I want to set a status of Failed, so Set Failed Status, navigate back to Variables, and set it with a Failed value, just like that. Okay, cool. Now what I want to do next is drag in the Execute Pipeline activity for the Log Pipeline Metadata utility, which will enable me to do the actual logging that I am looking for, and let me join the Set Variable activity to it. All right, now we need to set up the parameters, but first, I need to insert a name which is more appropriate: Log Pipeline Metadata - Success. Okay, now let's start setting the parameters, so let's go to Settings, and like we have done before, we need to start setting the parameters as I have just mentioned. First and foremost, you need to set the snapshot date: let me get the SnapshotDate, which is right here, and like we have done previously, I want to guarantee the format of the snapshot date. To do that, let me call the functions again - under the date functions, it should be formatDateTime. Okay, now I need to place this inside, paste it there, insert a comma, and set the format, so 'yyyy-MM-dd'; that seems just about fine, and I'm going to select Finish. Now let's set the status, which comes from the variable - cool. The run ID should come from the parameters, and the source to target ID we are still going to hardcode as 2 for now. The pipeline trigger time can now actually come from the system variables, so that's fine: pipeline trigger time right there. Also the pipeline ID, which comes from the system variables as the pipeline run ID, and the pipeline name - under system variables, let's find the pipeline name. For the system code, let's insert the SystemCode from the parameters, and for the dataset name, let's insert the DatasetName that comes from the parameters as well.
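Before hitting Finish, here is roughly how those parameter values read as dynamic content, using the names configured above (the trigger time gets revisited in a moment):

    SnapshotDate:        @formatDateTime(pipeline().parameters.SnapshotDate, 'yyyy-MM-dd')
    Status:              @variables('v_status')
    RunID:               @pipeline().parameters.RunID
    SourceToTargetID:    2
    PipelineTriggerTime: @pipeline().TriggerTime
    PipelineID:          @pipeline().RunId
    PipelineName:        @pipeline().Pipeline
    SystemCode:          @pipeline().parameters.SystemCode
    DatasetName:         @pipeline().parameters.DatasetName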
And let's hit Finish, and that's pretty much it. Okay, so let's copy this Execute Pipeline activity again - copy, paste - because we need to do the same for the failed status. So let's change that name to Failed, and let me connect it back to the Set Failed Status variable activity. Okay, cool, this has been set correctly. It looks like we have completed what we needed to do; we can double-check whether anything else needs changing, but the only thing that needs to change here is just the status, which is fine. Alright, cool. What we can do now is a Validate All - no problems there - and let's do a publish and hit the Publish button. Now we can start testing after the deployment of the changes has completed. So let me hit the Debug button; I'm still happy with the snapshot date, let me give it a run ID of 100, and let's hit the OK button to start running the pipeline. Let me navigate to Output so we can monitor the pipeline run. Let me do a refresh, and it looks like our pipeline is in progress - at least two activities have completed now, and we are just waiting for the Log Pipeline Metadata - Success activity, which is 100% fine. Let me refresh that, because I believe it should be done - yes, it has completed successfully. Now let me go back to Azure Data Studio and query the pipeline log information once again. If I observe the results here, I am looking for the run ID of 100, and it looks like it has supplied the details as expected. If I look at the start time, it's fine, and also the end time... okay, obviously something doesn't look right here: if you look at the start time and the end time, there is a disparity of two hours, which means somewhere, somehow, we need to set the correct timestamp. So let me revisit my pipeline and start with the success activity - let me check something here. The pipeline trigger time: I should be able to convert that into my time zone. So let me go back to the functions again, call the date functions, and find convertFromUtc. I want to insert the pipeline trigger time inside there, and then insert my time zone, South Africa Standard Time - please don't forget the time zone, it's very important - and select Finish. Okay, this should solve my problem. Let me just go back here and copy this, because I need to do the same for the failed status: under the trigger time, let's remove that, paste it there, and hit the Finish button. So we need to publish again - let me hit Publish. All right, cool. What I want to do now is test for a failed status, so let's sabotage the copy activity's source: I want to modify this particular dataset here, so let me just hit Open here, and I want to take away that 's'. Cool. Now let's do a Validate All once again - nothing wrong over there - and head back to my ingestion pipeline. From here, let's do the debug and see what happens. Cool, let me set a run ID of 5 and hit OK. All right, so let's observe what actually happens, since we are setting this pipeline up for failure. Okay, let me refresh this and see what happens - and indeed, our pipeline has failed. So let's see what has happened here: we do know that it has set the Failed status, as we can observe here, and it looks like our Log Pipeline Metadata activity is in progress, basically just logging the failure, which is fine.
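While that failure run wraps up, here is the corrected trigger time expression from a moment ago, for reference (the time zone name is the one copied from the Microsoft documentation):

    @convertFromUtc(pipeline().TriggerTime, 'South Africa Standard Time')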
So let's wait for this to finish and then check it again. Cool, it has succeeded. Let me return to my pipeline log information, and we have received a Failed status right here. It also looks like we have fixed the start time and the end time, which now look correct. Okay, so another thing that you need to do is check your emails to see whether you have received both a failed message and a success message from your last tests. Oh, and before I forget, let me discard these changes right here. And this is it - I will see you on the next one. Goodbye. 30. Change Log Pipeline to Send Failure Messages Only: Hello good people and welcome back. What we want to do next is modify the Log Pipeline Metadata utility to send messages only when an ingestion pipeline actually fails. If we think about it, a batch can have multiple pipelines, and depending on the source system, this could be ten or even a hundred source files or tables, each translating to a pipeline run. Now, just imagine sending ten to a hundred messages of success: this amounts to flooding your inbox, and it can be a costly exercise, so let's rather limit it to failures. The only success message that we are going to send is when the entire batch has completed successfully. All right, so let's select the Log Pipeline Metadata pipeline. To decide whether to send a message or not, we need to evaluate the status: if the status is Success, we don't send a message; otherwise, if the status is Failed, we send a message. So let's add the If Condition activity for this exercise. We are going to find that activity under Iteration and conditionals - and here we are - so let's put it right into the canvas. Okay, next let's set the name, Test Status, and also set a description: "Test the status for success or failure". The next step is to do the actual testing of the status value, and we can do that within the expression field; to see the expression field, we can go to Activities, which is right here. The expression must always result in a boolean value of true or false. We are going to test whether the status is equal to Success: if it is, the expression will evaluate to true, and all we need to do is just log the pipeline information. However, if the expression evaluates to false, we will log the pipeline information and also send the message. So let's get cracking. First and foremost, we need to supply an expression, so let me click Add dynamic content - it should be under functions, and under the logical functions I want to find equals. Now, I need to test whether the Status parameter is equal to 'Success', in quotes, just like that, and this should return either true or false depending on the evaluation. So let's hit the Finish button. Okay, now we need to handle the outcome of the expression, and we can do that by supplying a case of true or false. For the true case, I want to log just the information about the pipeline, so right-click and copy the stored procedure activity and return to the If Condition. Okay, let's edit the true case: right-click and hit Paste. Now let's see whether we need to change anything here - let's remove the copy suffix and add Success to the name, just like that.
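So, for reference, the expression sitting on the If Condition is simply:

    @equals(pipeline().parameters.Status, 'Success')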
Let me return to my pipeline and back to my If Condition. Now, what I want to do next is supply what should happen if I get a case of false, and what I want to do basically is run the stored procedure and also send a status message. So let me copy the stored procedure and the Execute Pipeline activity: hold down the left Control key and left-click on the stored procedure and on the Execute Pipeline, then right-click and Copy. Let me return to the If Condition, select the false case, right-click, and paste it here. Okay, now I need to rearrange this. Let me select the stored procedure again, because I need to make some changes to the name here: let me add "Failed" and then "Send Message" right there, and take away the copy suffix, which is fine. Then I need to return to the Log Pipeline Metadata canvas. Now all I need to do is remove the original stored procedure and Execute Pipeline activities, and pretty much this is how our Log Pipeline Metadata pipeline will look. All right, so finally, let's validate this and select Close, and let's test it and see what happens. What I want to do first, before I start testing, is publish, so let's publish the pipeline and then hit the Debug button. Let's supply the snapshot date, so 2021-07-12, and let's start with a status of Failed. Then let's supply a run ID of 100, a source to target ID of 2, and the pipeline trigger time: once more I'm going to return to Azure Data Studio, and like I did previously, I just want to grab this datetime, copy that, and set it here. Let me look at the current time, which on my side is 14:37 at the time of recording. For the pipeline ID, let's give it a value of 1, the pipeline name is PL_Data_Ingestion_JSON, the system code is OWS, the dataset is Sales, and I'm going to hit OK. All right, so let's see how the pipeline is executing: return to the main canvas and select Output, and let's refresh to see whether the pipeline has started. Cool, so now the pipeline is in progress; let me refresh again - and it has succeeded. Cool. So now what you need to do is check your email address to see whether the failed status message has come through. Okay, so this is it for me, and I shall catch you on the next one. 31. Create Dynamic Datasets: Hello good people and welcome back. It's time to make the datasets metadata driven, or in other words, make them dynamic. At the moment we have our source and target information hard-coded, and we need to change this so that the information comes from the database. So let's start by creating a new dataset, which will be our source dataset and will replace the hard-coded JSON source; the new dataset will have to cater for multiple JSON files. What we need to do is select the source dataset that we created a while ago, and what we want to do here is clone it. Then we need to provide a more appropriate name, so let's just change that to end in _Files_JSON, just like that, and let's insert a description this time around: "JSON source dataset". All right, since we are making the dataset metadata driven, or perhaps dynamic, this means that we have to supply everything related to the file path through parameters, which means the values will only ever be known, or perhaps applied, during the runtime of a pipeline. So we are going to need parameters that will set the values for the container, the directory, and also the file name. Let's set up the parameters.
So I will navigate to Parameters right here, and I am going to need a Container first. The next parameter will be a Directory, and the next one will be a FileName, which is effectively the last one. Alright, now that we have the parameters in place, we can return to the connection information and supply the parameter values to the file path. So let's return to Connection right here and select the file path: select Add dynamic content, apply the Container parameter, and hit Finish. Now, I want to apply the parameter for the directory as well, so let's add dynamic content once again, add Directory, and click Finish. And lastly, let's add the parameter for the file name: once more, select the file name box, click Add dynamic content, insert the FileName parameter, and hit the Finish button. Since the information will only be realized during the runtime of the pipelines, what we can do is remove the hard-coded schema. To do that, we can navigate to Schema, which is right here; as you can see, the schema still reflects the old JSON file attributes and their data types. So let's clear the schema, because for every new file this runs for, the schema will change dynamically, and this hard-coded schema could potentially cause runtime problems - that's why it has to go. Let's hit the Clear button. Okay, what we can do now is preview the data, so let's return to Connection. But let's hit the Publish button first and publish our dataset. Now we can do a preview: let me select Preview data right here, and I will need to supply the values - first the container, and the next value that we need to supply is the directory, where I am going to insert a dot, which means the current folder. In this instance, the dot translates to the current folder, which is sales, because we've got no further directories within our container. Then let's supply a file name, so online_sales.json, and hit OK, and let's see whether we get anything back. And there you have it: we've got our data back as expected. So we have sorted out our source information; let's do the same for the sink dataset. Let me just close this data preview, open the left pane for Factory Resources again, and head back to the datasets - this is the dataset that I'm looking for, which I also want to clone. So let me select the three dots and clone the data engineering JSON dataset. Now we are going to insert the appropriate name: instead of Files, I am going to supply Folders, so the name ends in _Folders_JSON. When it comes to datasets that are stored in a data lake, we tend to refer to folders instead of actual file names, so when we read data from a data lake, it is normally recommended to just read the entire folder. We shall see, as time goes by, when we start adding partitions, that you will be expected to read from a folder related to a specific partition, and the partition in this case will be a snapshot date, which we shall apply in the coming lessons. Okay, so let's set the parameters once more: I am looking for a Container as my first parameter, followed by a Directory again, and, in case you insist on reading an actual file name, let's add a FileName parameter as well.
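In both the source and the sink dataset, the file path boxes simply end up bound to the dataset parameters, roughly like this (parameter names as created above):

    Container: @dataset().Container
    Directory: @dataset().Directory
    File name: @dataset().FileName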
All right, so you only supply a file name if it makes sense to do so, but always seek to read the entire directory. So let's head back to Connection once again, and from here I need to start supplying the parameters: let me hit Add dynamic content and supply the Container, then select Finish. For the directory, let's also add dynamic content and insert Directory, and for the file name, let's do the exact same thing and insert the FileName right over there. Okay, like I have done before, I want to preview my data: select Preview data, then hit Preview and insert the values of the parameters. The first one, the container, will be webdevelopment, and the directory will be webstore/raw/online_sales. Since a file name is required here, let's just insert one for the test, so online_sales.json, and hit OK - and here we go, we have loaded the data as expected. All right, let's also demonstrate reading the entire folder: let me close this data preview, and all I need to do here is just remove that file name parameter value and hit Finish. Now, to preview the data, we only have to supply the values for the container and the directory, which is enough, and hit OK - and there you have it, we've got our data from our folder. All right, so let me close the data preview; for now I am going to leave the file name as it is and hit the Finish button. Okay, so let's publish this, and that's it. Alright, next we are going to need a new dataset, and we are going to use it to outline the source to target information. This information will come from a view that has been created within the database; the view effectively tells us which source file must be delivered to which sink. All right, so under the Metadata folder, which is this one, we can select the three dots again and hit New dataset. From here, I want to select Azure SQL Database, which is somewhere around here - there you have it - and hit Continue. As for the name, I want to supply DS_ASQL_SourceToTargetDetails, and from here, I want to select the linked service, which is the data engineering metadata linked service, and hit the OK button. Okay, so let's supply the table, because we haven't exactly supplied one: select the table, and let's choose the source to target view. Let's preview the data and see what happens - and we do have our source to target information displayed right here. Okay, so let me close my data preview, and what I want to do now is just hit the Publish button to publish the dataset. As for the last dataset that we are going to need, that is the actual batch details. Once more, instead of creating a new one entirely, we can simply clone this one, and let's just change the name to end in Batch_Details. Then let's select the appropriate table, which will be BatchRun, and let's preview the data to see whether we get anything. Okay, we haven't initialized any batch yet, which is fine, but when we do, we are going to get some data back. Okay, so what we can do now is hit the Publish button. Alright, this is enough for one lesson, so this is it for me, and I shall catch you on the next one. Goodbye. 32. Reading from Source To Target Part 1: Hello good people and welcome back. Let's modify the pipeline to start using the metadata driven sources and sinks that we created in our last lesson.
So the first thing that we need to do is observe the source to target details. Let's navigate to our dataset, which is right here, and then select Preview. Okay, let's scroll and observe the data to the right... okay, let's stop right here. Now, if we take a closer look at the source to target information, we can tell that we have at least four files to ingest: the online sales JSON, the products, the currency codes, and also the promotion types. Obviously, we don't have the last three files within our Azure Blob Storage yet. What you can do, however, is download the attached zip file called salesref.zip. All right, so once you have downloaded the salesref.zip file, right-click and hit the Extract All button to extract it; the chosen folder will be salesref, and I am just going to hit Extract. Now, within the salesref folder, you'll find another folder called salesref, so navigate into salesref/salesref, and here you will find the three files that we haven't uploaded yet. All right, so what you need to do next is load your Azure Blob Storage account, which is related to our web store. Let's load the dashboard by selecting Menu and hitting Dashboard, and let's load the web store. Once we have loaded the web store storage account, we can navigate to Containers, and within Containers, I would like to create a new container: select New container right over there, call it salesref, and hit the Create button. Next, we can upload the files: let's hit Upload, select the files - the currency codes, the products, and also the promotion types - hit Open, and click Upload, just like that. Okay, it looks like we have uploaded the files successfully. What we can do now is return to our Data Factory, so let me just close this preview. Now we can clone the pipeline: let me select version five of my ingestion pipeline, and from the three dots hit the Clone button right there. What I want to do from here is create version six of my pipeline, and that's pretty much it. All right. What we need to do next is supply the metadata driven datasets to the copy data activity and then supply the correct parameters, and to get the data into the copy data activity, we are going to need a lookup. So let's find the Lookup activity - it should be somewhere around the General activities, and it is right here - and drag it into the canvas. Alright, I want to rename my lookup and call it Get Source To Target Details, and I want to set the timeout to ten minutes. Now I need to head to Settings so I can pick the dataset that I need to query from, so let me select the source to target details dataset. What I want to do from here is query from a stored procedure instead. All right, so let's supply the stored procedure name, which will be GetSourceToTargetInfo, and let's click the Import parameter button to load the parameters that we need to supply. Let's start supplying the first parameter, which is the source system code: insert the value right here, add dynamic content, get the SystemCode from the parameters, and click the Finish button. Alright, the Stage parameter will have to come from a variable that we are going to create together.
So let's create the variable. What we can do is select the canvas itself, and within Variables, select New. Let's insert v_stage and set it with a default value of SOURCE__RAW, just like that. All right, let's get back to the lookup, and within the Stage parameter, apply the value of v_stage right here and click Finish. Now, I want to make sure that I get multiple rows in return, so I am going to untick First row only, just like that. And like we normally do, let's test whether we get some data back, so let me hit the Preview button - everything seems fine right here - and click OK. Perfect. The values that have been returned are pretty much the same as when we queried the source to target dataset directly. However, the whole point of supplying the system code and stage parameters is to filter records based on the provided values. We are, after all, building a generic metadata driven ingestion platform, and it can contain multiple source systems and different kinds of unique stages. So if you ever use this idea at your workplace, I have an expectation that this source to target view will contain far more records than we see right here, so it's a good idea to filter on a specific source system. Alright, so let's close this preview right here, and let us attach this lookup to the copy data activity, at least for now. Alright, let's validate this and hit the Close button, then publish it and hit the Publish button. We have done enough for one video; we can continue our development in the next lesson. So this is it for me, and I shall catch you on the next one. Goodbye. 33. Reading from Source To Target Part 2: Hello, good people and welcome back. In the last lecture, we uploaded our new datasets and also managed to set up a lookup activity. What you see here is fundamentally wrong: it doesn't make sense for the lookup activity to flow into the copy data activity. I did this in order to ensure a successful validation and prevent making another lengthy video. What we need to do for this lecture is apply a ForEach activity and ensure that the ingestion process is executed in an iterative manner. Alright, let's add a ForEach activity so we can traverse through each piece of source to target information. We can navigate to Iteration and conditionals, find the ForEach activity from here, grab it, and drop it inside the canvas. Okay, the next thing to do is take away this relationship - let's just hit Delete. Now I need to attach my lookup activity's flow to the ForEach, so let's do that and attach it to the ForEach activity. Okay, so what we can do next is set up the ForEach activity and change the name to something that makes sense, so ForEach Source To Target Info, and I think that's just about perfect. All right, let's move over to Settings, because from here we need to supply the Items attribute. Let me select Items and Add dynamic content, then select the Get Source To Target Details activity output - and we want to grab all the values, so .value - and hit the Finish button. Alright, so what we can do now is copy the activities that you see right here and make sure to have them within the ForEach activity's own activities. To do that, let me select each and every activity right here - let me select all of them - then right-click and Copy.
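For reference, the Items box on the ForEach ends up holding this expression, using the lookup activity name from above:

    @activity('Get Source To Target Details').output.value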
Now, I want to paste them inside the activities within the ForEach, so select the ForEach, then select Activities, right-click and Paste, just like that. Okay, so now obviously I need to rearrange this and reattach the activities, because unfortunately you cannot copy the flow, which is rather, as I said, unfortunate - but anyway, let's continue. So let me get the copy data activity right there, and I want to set the success output flow to the Set Success Status activity. I want to do the same for Set Failed Status, but this time around it should be the failure output that connects to the Set Variable activity right over there. I also need to modify the names so they make sense, and let's do the same for the success activity - that should be fine. Now I need to attach the Log Pipeline Metadata activities as well: for success, I need to attach that right there, and for failed, I need to attach these two activities, just like that. Okay, once more, I need to change the names for them to make a whole lot of sense, and let's do that for the Log Pipeline Metadata - Success activity as well. All right, so what we need to do next is supply the metadata driven datasets to the copy activity and then supply the correct parameters. To do that, let me select the copy data activity, and the first order of business is to do something about the source. So let's select the correct source dataset, which should be right here, and it will be the blob Files_JSON dataset, just like that. Now we should supply the required parameters. Let's set the container first: Add dynamic content, and this should come from the ForEach Source To Target Info item, and it will be the source container, just like that - and let me hit the Finish button. The next one should be the directory, so let me select the directory and pretty much get the item's source relative path, and hit the Finish button. Now for the file name, let's get to the item again and take its source technical name, and hit the Finish button. Okay, we now need to do pretty much the same thing for our sink. Let's head over to Sink and select the correct dataset, which is the data engineering Folders_JSON dataset. Once more, let's get to the container information: from the item, it will be the target container, and let's hit Finish. Now for the directory: we also want to set a path that includes the snapshot date, which will allow you to navigate to the corresponding snapshot holding the data that was ingested on that particular day. So we are going to need a concat function, so we can combine the directory path and the snapshot date. Let's find concat right here, and within concat, I want to start with the directory details, so let's enter the current item that comes from the ForEach activity, its target relative path. Now I want to insert a comma and supply a forward slash, since we are creating another directory, or perhaps a folder. Next I want to add the snapshot date, and that will come from the parameters. But before I do that, I will need to supply a format, as I want to guarantee the date format, so under the date functions, I need to find formatDateTime once again and add it there. Now I need to supply its values: the snapshot date will come from the system variables - actually, it will come from the parameters, which is the SnapshotDate. Now I want to insert the format, so 'yyyy-MM-dd', just like that.
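Put together, the copy activity's dataset parameter values end up roughly like the following. The item() column names are a best guess from the narration, so check the exact spelling against the source to target view in your metadata database:

    Source container: @item().SourceContainer
    Source directory: @item().SourceRelativePath
    Source file name: @item().SourceTechnicalName
    Target container: @item().TargetContainer
    Target directory: @concat(item().TargetRelativePath, '/',
                      formatDateTime(pipeline().parameters.SnapshotDate, 'yyyy-MM-dd'))
    Target file name: @item().TargetTechnicalName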
And I think that looks just fine, so let's hit the Finish button. Now let's supply the file name, which is much easier: that will also come from the ForEach iterator, the item's target technical name, and let's hit the Finish button, just like that. All right, so we have hard-coded the source to target ID within the Execute Pipeline activity related to the logging of pipeline metadata, so let's solve that particular problem. Let me select the Execute Pipeline right here and move over to Settings. Within Settings, we've got our source to target ID: let's add dynamic content, and now it should come from the ForEach iterator, and it will be the item's source to target ID, just like that - hit the Finish button. In fact, I am going to do the same for the failed activity as well, so let me just copy that, hit Finish again, find the failed activity, go to Settings, and apply the same for the source to target ID: paste it there and hit the Finish button. Cool. Let's return to the main canvas, because now we need to delete what we don't need, so let's delete these four original activities. All right, so what we can do now is start testing our pipeline: let's hit the Debug button, supply a run ID of 900 and a snapshot date of the 14th of July, and hit the OK button. Now, to start tracking the output, let's select the Output tab. All right, our pipeline has started as expected, so let's wait a couple of seconds, or perhaps even minutes, for it to finish the execution. Okay, it looks like our run has almost completed - there's only one more task, but it should finish successfully. Okay, so we have completed this task. What we can do now is go back to the Azure home page on the portal, select Menu, and go to Dashboard, because we want to validate the data that we have just ingested. So please select the data lake storage account, and from here we can select Containers. Within the containers we find our webdevelopment container, so let's head inside there, head inside webstore, and then raw. As you can see, we have successfully ingested our data: what you see here is currency codes, online sales, products, and finally promotion type, as expected. Okay, so let's navigate inside one of them - let's say currency codes - and if you look inside here, this is the snapshot date that we have created in the form of a folder, which effectively is a partition. So let's get inside the folder, and as you can see, we have successfully delivered the currency codes. Okay, that's enough for one lesson; this is it for me, and I shall catch you on the next one. Goodbye. 34. Explaining the Source To Target Stored Proc: Hello, good people and welcome back. In the last lesson, we observed how we needed to use the GetSourceToTargetInfo stored procedure to get the information that the ingestion pipeline would use to ingest files into the data lake. So let's take a moment to observe this very simple stored procedure. I would like you to open Azure Data Studio, make sure to sign in, and load the stored procedure, which is under Programmability and then Stored Procedures. Right here we will find GetSourceToTargetInfo, so right-click and Script as Create. As you can probably see, this is yet another simple stored procedure that requires two parameters to serve as input: the parameters are a system code and a stage.
The system code is quite self-explanatory, but in order to address the logic behind the stage parameter, let's query the SourceToTargetMetadata table, which the source to target view is actually based on. To do that, let's navigate to Tables, find SourceToTargetMetadata, right-click, and select the top 100. As you can see, we have the source and the target IDs, which both link to the DataObjectMetadata table, which stores the source and target file objects. Our point of interest here is the Stage column. The value of this Stage column is made to reflect that this source to target ingestion endeavour is pulling data from a source system into the raw layer of a data lake. The ingestion platform can actually ingest the data into a number of different layers. A typical data lake has at least two layers: the first is the raw layer, which stores the data as raw as possible, meaning data that has not been transformed or changed in any way, and the original format, which is JSON in this instance, is kept as it is. Another layer is known as the trusted layer; this layer stores data that has been cleaned of defects and also conformed to a common standard, and it is kept in a uniform file format such as Parquet. Perhaps a third layer may be required to store data that is curated for a specific purpose, perhaps for business intelligence reporting or other business requirements such as machine learning. So you can potentially use the exact same ingestion engine, perhaps with some differences, to deliver data into all of these layers, and we can use the Stage parameter to reflect where the data is going. So let's query the DataObjectMetadata table: find it right here, right-click, and select the top 100. The table contains information about all the objects that form part of the ingestion system. Here we can see the data objects, such as the online sales and products, and which system they belong to, as clearly described within the data object info and system info columns. We can also get an indication of their file types, their container names, and even their directory paths. A few moments ago, we talked about the raw layer, and that translates to a directory in the data lake: take a look at the relative path, and as you can see, the directory is structured into a business domain, followed by the raw directory. Layers such as trusted and curated will also appear as just directories within the data lake. Alright, so let's head back to the stored procedure right here, and as mentioned before, all the stored procedure requires is a source system code and a stage, and it just returns the source to target information. Alright, that's it for me, and I shall catch you on the next one. Goodbye. 35. Add Orchestration Pipeline Part 1: Hello, good people and welcome back. It's time to add the orchestration pipeline, which will drive the batch run. The orchestration pipeline will receive information from the metadata database concerning the batch details. So let's create the pipeline, and to do that, I want to create a separate folder once more within the pipelines: select those three dots, select New Folder, insert Orchestration, and click Create. Now let's create a pipeline within the Orchestration folder: select those three dots again and hit New pipeline. We need to insert a pipeline name, and it will start with PL_Orch, for orchestration.
and let's call it daily underscore batch underscore run, just like that. Now I want to supply a description: daily batch run orchestration of ingestion pipelines, just like that. All right, since this will be a batch run that is triggered daily, we need to set a variable to indicate a daily run and the associated system code. So let's set the variables: select the Variables tab and hit New. Our first variable is the frequency, v underscore frequency, and I want to store a default value of Daily. Next up is the system code, v underscore system code, with a value of OWS, just like that. All right, before we continue any further, let's insert the batch details into the batch run table. Therefore, I would like you to open Azure Data Studio and open the script that is going to give us the batch run information: select File, Open File, and the script that will initialize our batch run is initialize run; select Open. Now all I need to do is run the script to insert the batch details. To confirm that we have the batch details, go over to Tables, right-click and Select Top 100, and there you have it: we have got our batch details. Okay, I also want to clear the pipeline log, so right-click on the server name, select New Query, and delete every value from the pipeline log: delete from dbo dot PipelineLog, and hit that Run button. Now, to double-check, select all from dbo dot PipelineLog and hit that Run button again; as you can see, we are starting on a clean slate. All right, let's go back to Azure Data Factory. What we are going to need first here is the Lookup activity, which is under General; just find the Lookup activity and drag it into the main canvas. From here, all I want to do is name it Get Batch Run Details and set the timeout to just 10 minutes, like we normally do. Under Settings, I want to select the source dataset, which is the batch details dataset right here, and I want to return more than one row, so let me unselect First row only. As usual, instead of querying the table directly, what I prefer is to query the stored procedure, so let's set the stored procedure name, which is GetBatch, just like that. Let's click Import parameter, and now we will need to supply the frequency parameter and also the system code. Select frequency first, add dynamic content, and insert the frequency variable. Lastly, let's do the system code: select the variable and click Finish, just like that. All right, what we can do now is run a preview, so let me scroll up and hit Preview. The parameters are just fine for frequency and also the system code, so let's hit the OK button and see what happens. And there you have it: we have got our batch details. Okay, what you need to notice here is that the run status has now changed to In Progress. What happens is that the second we pull the batch details, the procedure tries to find and lock batch details that have not started yet, in other words with a status of Not Started, as you can see here if we go back to the SQL. So let me find the tab where we queried the batch run, which is this one. This record here actually represents a batch that has not started, and as soon as we pulled the batch details, the status of Not Started turned into In Progress.
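For reference, the clean-up and the lookup call described above boil down to something like the following T-SQL. The table and procedure names are taken as heard (dbo.PipelineLog, dbo.GetBatch), while the parameter names are assumptions.

```sql
-- Start from a clean slate before testing the orchestration pipeline.
DELETE FROM dbo.PipelineLog;
SELECT * FROM dbo.PipelineLog;   -- should now return no rows

-- Roughly what the Lookup activity executes: fetch the next batch details
-- for a daily run of the OWS system. Parameter names are assumptions.
EXEC dbo.GetBatch
     @Frequency  = 'Daily',
     @SystemCode = 'OWS';
```

Running the procedure has the side effect described above: any matching batch run with a status of Not Started is flipped to In Progress before the rows are returned.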
So just keep that in mind. Okay, let's save this: validate all first, and then hit the Publish button to publish. All right, we are not done yet; we are going to come back in the next lesson to finish this orchestration pipeline. And this is it from me, and I shall catch you on the next one. Goodbye. 36. Add Orchestration Pipeline Part 2: Hello, good people, and welcome back. In the last lesson we managed to initialize the batch run and also created a new orchestration pipeline together with a Lookup activity. So let's continue with the development of our orchestration pipeline. What we need to do next is insert a ForEach activity so we can traverse through each and every batch detail entry that is returned by the lookup activity, which is Get Batch Run Details. Let's insert that ForEach activity, which we can find under Iteration and conditionals, and drag it onto the canvas. Now we can attach the lookup activity to the ForEach. Next, I want to give my ForEach activity a proper name: For Each Batch Run Information; I think this name should be just fine. Let's go over to Settings, where we need to supply our items. Our items will come from the lookup activity, so let's set the items and select Add dynamic content. From here, the activity output, as I have just mentioned, comes from the lookup activity, and I want all of the value, so let's add dot value and hit the Finish button. What I want to do next is add an activity within our ForEach activity, and the activity that I am talking about is the execution of the ingestion pipeline. So let's add the activities, and from here I am simply going to grab that pipeline, which is version six of our ingestion, and put it there. Let's give this a proper name, and the name will be Run Ingestion, just like that. What we can do next is start supplying the parameters to the ingestion pipeline. I am going to select Settings, and right here I am going to find a system code, a dataset name, a run ID, and a snapshot date. Let's start populating the system code: select Value, select Add dynamic content, and get the value of the system code from the variable; then hit the Finish button. As for the dataset name, this should come from the ForEach activity, so let's add dynamic content once again; select the items, and the item that I am looking for is the dataset name; hit that Finish button. Okay, we still need to provide a run ID as well, so let's hit Value, add dynamic content, and select the current item dot run ID, and hit the Finish button once again. Okay, we are going to need the snapshot date as our last parameter, so select Value, add dynamic content, get the item's snapshot date like that, and hit the Finish button. All right, once the entire batch run has completed, what we want to do is call a stored procedure that is going to set a new batch status and initialize the next run. Basically, we need to set a new snapshot date if the run went through successfully, or flag it as a restart run in the event of a failure. Furthermore, if the run is successful, it should generate a new batch altogether. Before we add that activity, here is a quick recap of the dynamic content we have just configured.
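The entries above boil down to expressions roughly like the ones below. The exact activity, variable, and property names (Get Batch Run Details, v_system_code, DatasetName, RunID, SnapshotDate) are assumptions based on what is spoken in the video, so treat this as a sketch rather than the exact pipeline JSON.

```
ForEach Items:                  @activity('Get Batch Run Details').output.value
Run Ingestion - SystemCode:     @variables('v_system_code')
Run Ingestion - DatasetName:    @item().DatasetName
Run Ingestion - RunID:          @item().RunID
Run Ingestion - SnapshotDate:   @item().SnapshotDate
```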
So let's add a new stored procedure activity; we can get that under General and drag it in. In fact, what we need to do is go back to the main canvas and add the stored procedure right over there. Okay, I want to attach the ForEach activity to the stored procedure. Let's give the stored procedure a name and call it Set Batch Status; this seems just fine. What I need to do next is go to Settings, where I need to apply the right linked service, which is the data engineering one, so select that, and then select the stored procedure, which is SetBatchStatus. From here we can import the stored procedure parameters, and now we can start adding some values. Let's set the frequency first: select Value, add dynamic content, and set the frequency variable. Once more, let's set the system code: add dynamic content, select the value from the system code variable, and hit that Finish button. Okay, let's start by validating the pipeline, so hit Validate all, and as you can see there aren't any errors, so hit that Close button. What I want to do now is publish the pipelines, so hit the Publish button and click Publish. After the publishing has completed, we can start testing our pipeline. Let me return to the main canvas right here, click Output, and run a debug; from here we can watch the progress. Okay, it looks like the process has started, and what you can do behind the scenes is open Azure Data Studio and watch the progress from the background. So let me go back to Azure Data Studio, minimize it and make it a whole lot smaller, and put it right here. What I want to do is select Tables and query the batch run, so Select Top 1000; what I am expecting here is that the batch run should be In Progress, as expected. Okay, let me move this out of the way while the run continues. Let's hit the refresh button again to see where we are. Okay, it looks like the first pipeline has completed its ingestion. What we can do is go back to Azure Data Studio again, and let me make that bigger instead. The run is still in progress, which is just fine, but perhaps we can verify the pipeline log first: right-click on PipelineLog, Select Top 100, and let's see. As you can see, we have inserted a number of values from our source to target activities. All right, cool, it looks like our batch run has completed successfully. Once more, you can navigate to the tab where we ran the batch run query and run that query again, and as you can see, we have finished our batch run, since we got a status of Completed Successfully. Now I would like you to take notice of a few things: it has generated a new run ID, a new snapshot date, and a status of Not Started. We ran for the 13th, and the new snapshot date is the 14th. Okay, so we have finished the development of our batch run orchestration, and this is it from me; I shall catch you on the next one. Goodbye. 37. Fixing the Duplicating of Batch Ingestions: Hello, good people, and welcome back. In the last lecture we completed the development of the orchestration pipeline. However, I have noticed a few problems with the pipeline log. Before we dig into them, here is a quick recap of what that final Set Batch Status activity and the verification queries in this lesson amount to.
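This is a sketch only; the parameter names are assumptions, and the queries simply mirror the checks performed on screen.

```sql
-- Roughly what the final Stored Procedure activity executes
-- once the ForEach loop has finished (parameter names assumed).
EXEC dbo.SetBatchStatus
     @Frequency  = 'Daily',
     @SystemCode = 'OWS';

-- Verify the outcome: the finished run plus the newly generated
-- Not Started batch run, and the pipeline log entries it produced.
SELECT * FROM dbo.BatchRun;
SELECT TOP 100 * FROM dbo.PipelineLog;
```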
Now, open Azure Data Studio and query the pipeline log. We can do that by selecting Tables, navigating to PipelineLog, right-clicking on the table and selecting Top 100 to get the results. What we can see here is that the ingestion process has produced duplicates, and this should not happen; in fact, we should have seen just four entries here. To confirm this, let's also query the source to target view. I can simply do a select all from the source to target view, which is that one, and run the whole SQL statement, basically the two of them, so select Run. As you can see, we should only have four entries within the pipeline log for a particular batch run, not eight. Let's also take a look at the batch run information, so let me do a select all from batch run and run the SQL statements, basically all three of them at the same time, to get the three result sets. Now, at any given point there are two batch runs that are generated, one for each dataset. The sales dataset needs to ingest just one file, which is online sales, and the sales reference dataset needs to ingest three files, as you can see on screen. So the conclusion is that the two batch runs should only ever produce four entries in the pipeline log, not eight. Let's fix this together. The first culprit here is the stored procedure GetSourceToTargetInfo, as it does not filter by the dataset name. So let's open the script that creates the stored procedures, which is create stored procs dot sql: do a File, Open File, select create stored procs dot sql and click Open, just like that. Now, I would like you to take notice of the stored procedure GetSourceToTargetInfo. As you can see, we need to add a dataset name as a parameter as well to do the filtering. So let's add the parameter: add at dataset, and it should be a varchar 50, just like that. Now we need to apply the filter to the SQL statement, which is right here: just beneath the where clause of stage equals the stage parameter, let's add "and source dataset name equals the dataset parameter", which seems to be correct. Save the script by selecting File and then Save, just like that. Next, I want to copy the whole stored procedure, so hold the left mouse button and scroll down until we find the GO at the bottom, then right-click and copy the entire stored procedure script. Now right-click on the server name, select New Query, and paste it here. But first, I want to drop the existing stored procedure, so right at the top I am going to do a drop proc dbo dot GetSourceToTargetInfo, highlight that, and run only that part of the script to drop the procedure. Okay, now it's time to recreate it, so I can just select Run to create the stored procedure. After creating the stored procedure, we can do one final confirmation: go to Programmability, Stored Procedures, right-click and Script as Create, and as you can see, we have successfully added the dataset parameter and also applied it as a filter, which is right here. Okay, now it's time to fix the ingestion pipeline, which is the second culprit. But first, let's clean out the pipeline log: right-click again, select New Query, do a delete from the pipeline log, and hit the Run button, just like that, so we can start from scratch.
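Putting that fix together, the whole change amounts to something like the sketch below. The filter column and view names are assumptions based on what is shown on screen rather than the exact course script.

```sql
-- Drop the old two-parameter version first.
DROP PROC dbo.GetSourceToTargetInfo;
GO

-- Recreate it with the extra dataset parameter and filter.
CREATE PROCEDURE dbo.GetSourceToTargetInfo
    @SourceSystemCode VARCHAR(20),
    @Stage            VARCHAR(20),
    @Dataset          VARCHAR(50)   -- newly added parameter
AS
BEGIN
    SET NOCOUNT ON;

    SELECT *
    FROM dbo.SourceToTargetView
    WHERE SourceSystemCode  = @SourceSystemCode
      AND Stage             = @Stage
      AND SourceDatasetName = @Dataset;  -- the filter that stops the duplicates
END
GO

-- Clear the log so the next test starts from scratch.
DELETE FROM dbo.PipelineLog;
```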
Now let's open Azure Data Factory right here and navigate to the data ingestion pipeline, version six. The culprit should be here under the Lookup activity, so select the Lookup activity; we now need to supply a dataset name. What we can do is just do a refresh here and import the parameters again. Okay, the dataset parameter now appears, but unfortunately we need to reapply the other parameters yet again. So let's supply the dataset: add dynamic content and give it the dataset name parameter, just like that. Now we are going to need the source system code, and that should come from the system code parameter. Lastly, we are going to need a stage, and that will come from the stage parameter; hit the Finish button. Let's validate and save that, then select Publish and publish yet again. What we can do now is start testing this, so let me navigate to the orchestration pipeline and do a debug once again, just like that, to see if we have fixed the problem. Okay, it looks like the orchestration pipeline has finished running successfully, so let's head back to Azure Data Studio. What I want to do here again is query the pipeline log: right-click on PipelineLog and Select Top 1000, and as you can see, we have fixed the problem of duplicated ingestion. All right. What I just want to stress is that when building a framework from scratch, you are going to encounter problems, and doing this kind of fix together is meant to give you a clear idea of where to find the problems and ultimately how to fix the issues. I hope this lesson was valuable, especially for when you need to extend this framework yourself at your workplace to cater for your own requirements. So this is it from me, and I shall catch you on the next one. Goodbye. 38. Understanding the PipelineLog BatchRun Batch and SourceToTargetView: Hello, good people, and welcome back. Let's take a closer look at the pipeline log table that is used to track the runs of the pipeline. If you feel like you have a thorough understanding of the relationship between the pipeline log, the batch run table, the batch table, and also the source to target view, I suggest that you skip this lecture, as it might feel like repetition. The motivation behind the pipeline log table is to have a clear audit of what has been ingested into the data lake. So please open Azure Data Studio and query the pipeline log table: right-click and Select Top 100. What we can see here is that the table consists of a run ID that is used to keep track of which batch run this ingestion pipeline actually belongs to. Each batch run has the responsibility to run and execute one or more ingestion pipelines. If we take a look at the entity relationship diagram, you can immediately tell that there is no foreign key relationship applied to the run ID against the batch run table, so physically the pipeline log is basically a standalone table. But that doesn't mean there is no relationship between the two tables, because the run ID actually comes from the batch run. The reason why it is a standalone table is that the table is meant to accept and persist multiple concurrent transactions in one go, and for smooth and efficient writing to the table we don't want the database to spend time validating the run ID against the batch run table. To make that concrete, such a table can be declared without any foreign key at all, as sketched below.
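This is a hypothetical sketch only; the column names and types are assumptions pieced together from the columns discussed in this lecture, not the actual course DDL.

```sql
-- Hypothetical PipelineLog DDL: RunID ties back to dbo.BatchRun logically,
-- but no FOREIGN KEY is declared, so concurrent inserts are not slowed down
-- by constraint validation. Column names and types are assumptions.
CREATE TABLE dbo.PipelineLog
(
    PipelineLogID    INT IDENTITY(1,1) PRIMARY KEY,
    RunID            VARCHAR(50)  NOT NULL,  -- logical link to BatchRun, no FK
    PipelineID       VARCHAR(100) NOT NULL,  -- generated by Data Factory
    PipelineName     VARCHAR(200) NOT NULL,
    SourceToTargetID INT          NOT NULL,
    Status           VARCHAR(50)  NOT NULL,
    StartTime        DATETIME2    NULL,
    EndTime          DATETIME2    NULL,
    SnapshotDate     DATE         NULL
);
```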
Now let's query the batch run table against the run ID that is linked to the most recent finished run. To do that, we can simply type select all from dbo dot BatchRun where the run ID is equal to — let's take the first one right there, so right-click, Copy and paste — and run these two statements again to select the run. We can also tell that this was the first instance of the batch run by looking at the value of the run number. If this was a restart of the batch run for the exact same snapshot date, it would be represented by a run number of two; that run number of two would also be reflected within the run ID to indicate a restart, and you would also find that the restart indicator is set to two to indicate a restart. Other important information that we are going to need is when the batch run started and when it completed its execution, and that is reflected by the start and end time. With this information we can calculate the duration of the entire batch run. We can then build business intelligence reports that inform the business against service level agreements, which indicate how long a batch should actually run for, so we can observe whether the batch run is meeting its service level agreements or not. In a typical business environment, batch runs are scheduled to run at least once a day, unless there is a restart. The ingestion of data is meant to capture a snapshot of how the data looked on a particular day, so this batch run is meant to capture the picture of the data as per the 15th of July 2021, as indicated by the snapshot date. As you can probably tell, the run ID is also made up of the snapshot date. Okay, let's now focus on the batch run that has not started, the one with a status of Not Started. What I can do is copy that piece of data and run everything to get the entire information for the batch run. So let's take a closer look at the batch run with a status of Not Started. It indicates that the batch run for the 16th of July 2021 has not started, and as explained previously, the status column has a number of possible values. These include In Progress, which basically tells you that the batch run is currently in progress; a Completed Successfully status, which is flagged when the run has completed with no problems; and lastly a failed status in the event of a pipeline failure. Now, the current indicator is used to indicate that this is the latest run, with a value of one; a previous run will hold a value of zero, and a stale, archived run will hold a value of negative one. All right, you can probably tell the batch run table is a child of the batch table, so let's observe the batch table. To do that, let's do a select all from batch where the ID is equal to one, just like that, and run the queries again. So the batch with the ID of one belongs to a system with a system code of OWS, and it is scheduled to start at 1 AM. We can use the scheduled start time information to compare it against the actual start time, and the result is referred to as latency: latency just means how late the batch run started compared against the scheduled start time. We also have the frequency, which indicates how frequently the batch should run; the frequency can be daily, weekly, or monthly. A quick sketch of the kind of duration report you could build from these columns follows below.
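This is a minimal sketch only, assuming column names like StartTime, EndTime, and Status based on what is discussed in the lecture; the actual model may differ.

```sql
-- Hypothetical SLA-style query: how long each batch run took.
-- Column names are assumptions based on the columns discussed here.
SELECT RunID,
       SnapshotDate,
       StartTime,
       EndTime,
       DATEDIFF(MINUTE, StartTime, EndTime) AS DurationMinutes
FROM dbo.BatchRun
WHERE Status = 'Completed Successfully'
ORDER BY StartTime DESC;
-- Latency would be calculated the same way, comparing the actual StartTime
-- against the scheduled start time stored on the Batch table.
```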
Now let's take a closer look at the results from the pipeline log again. The pipeline log has a pipeline ID, which is uniquely generated by Azure Data Factory, and we can use it to trace the log files associated with this particular run. The pipeline name is given by us, the developers; however, each run is also linked to a source to target ID, so let's observe the source to target view once again. What I can do is create a new query: right-click, New Query, and let me pick a source to target ID; in fact, I am going to pick number four. From here I can select all from the source to target view where the source to target ID is equal to four, and hit that Run button. This query tells me that the pipeline is ingesting the sales reference data, and more specifically the promotion type file, into the data lake at a specified directory, and we can get the value of the directory on the right, under the target relative path schema column. Okay, so looking back at the values of the pipeline log, it also has a status to determine whether the run was a success or a failure, a start time and an end time, and also the snapshot date. All right, let's end it here. Please take your time to understand the source to target view and the relationships between these tables and how they fit together. And this is it from me, and I shall catch you on the next one. Goodbye. 39. Understanding the Get Batch Stored Procedure: Hello, good people, and welcome back. For this lesson we are going to discuss how the GetBatch stored procedure actually works. The GetBatch stored procedure has at least one function, which is to initialize a new batch and have it ready for execution by the ingestion pipelines. However, it is not as simple as it seems, so let's open the GetBatch stored procedure and observe what it does. I would like you to navigate to the Programmability folder, select Stored Procedures, select GetBatch, right-click and Script as Create. The stored procedure accepts two parameters, the first being a system code and the other the frequency of the batch run, passed in when it is triggered by the orchestration pipeline. Once the stored procedure has received the parameters, its first job is to acquire the correct time zone in order to set the correct time throughout the process. The next step is to set the current timestamp variable, which is used to record the current time of execution. Before we describe the next step, note that the GetBatch stored procedure only looks at batch runs with a status set to Not Started, and the batch must be in an active state. So the next step is to find batch runs with a status of Not Started where the snapshot date is less than the current day. A common phrase used in data engineering when it comes to batch runs is to run for T minus one day: on the current day, we are running for a snapshot of events that occurred on the previous day, so in essence we are capturing a picture of a system as it looked the previous day. Once we get the batch runs that meet the criteria of Not Started and the previous day's snapshot, the process sets the status to In Progress, sets the start time with the current timestamp variable, and does the same for the updated-at column, and just like that the next batch is ready. Okay, so what about batch runs that ended in failure and are in need of a restart? Well, that is precisely the next step: to handle restarts for batch runs with a status of Completed with Errors.
So the stored procedure will search for the runs that finished with a status of Completed with Errors and a current indicator set to one, which signals the latest completed run, and then the process flags these runs as In Progress. It also sets the restart indicator to two. Furthermore, the process increments the run number by one to show the number of times that same batch has been restarted. This new batch is then inserted into the batch run table and is returned as part of the next batch. Now that the process has created new records as a restart batch from the set with a status of Completed with Errors, the next step is to update the previous batch runs that finished with a status of Completed with Errors to a current indicator of zero, so that these records will no longer be considered in the determination of a new batch in the next cycle. The last step is simply to return the new batch records to the calling orchestration pipeline with a status of In Progress. I would advise you to take the time to study the code to foster an understanding of what it is doing. I can only explain to a certain extent; the rest is on you, putting in the time and effort to seek understanding, and you may just come up with some significant improvements. This is it from me, and I shall catch you on the next one. Goodbye. 40. Understanding Set Batch Status and GetRunID: Hello, good people, and welcome back. It's time to explain yet another stored procedure, and this time around it's the SetBatchStatus stored procedure, which gets executed as the last process straight after the ingestion has completed. This time I am not going to go through every line of code like I tried to do in the previous lectures; I am going to give you the gist of what the stored procedure does, and the rest is on you to read the code, make notes, and formulate an understanding of what it actually does. I would like you to open Azure Data Studio, navigate to Programmability and then Stored Procedures, and within the Stored Procedures folder find SetBatchStatus, which is right here; right-click and then Script as Create, so you can go through the code by yourself. The idea behind the SetBatchStatus stored procedure is to set the correct status of a batch depending on how the pipeline execution has finished. The stored procedure also receives the system code and frequency as parameters. If all the pipelines have completed successfully, the stored procedure will set the status to Completed Successfully and also generate a new batch with a new run ID, a snapshot date, and a status of Not Started. However, if there is at least one pipeline failure, the status will be set to Completed with Errors. Batch runs that have completed successfully will be set with a current indicator of zero, and only the batch runs with an indicator of zero will be used to determine a new batch run, while previous runs with a status of Completed Successfully will be set with an indicator of negative one to indicate an archived run that will no longer be considered for the determination of a new batch. Okay, in the last couple of lectures I have shown you how to go through the code; please do the exact same thing here, go over the code, and derive an understanding of what is going on. A rough sketch of this status-handling logic is shown below.
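This is a hypothetical sketch of the decision described above, not the actual course code; the table, column, and status names are assumptions based on the behaviour explained in this lecture.

```sql
-- Sketch of the status decision inside a SetBatchStatus-style procedure.
DECLARE @FailedPipelines INT;

SELECT @FailedPipelines = COUNT(*)
FROM dbo.PipelineLog AS pl
JOIN dbo.BatchRun    AS br ON br.RunID = pl.RunID
WHERE br.Status = 'In Progress'
  AND pl.Status = 'Failed';

IF @FailedPipelines = 0
BEGIN
    -- Everything succeeded: close the run and demote it to a previous run.
    UPDATE dbo.BatchRun
    SET Status = 'Completed Successfully', CurrentIndicator = 0
    WHERE Status = 'In Progress';
    -- A new batch run would then be inserted here with a new RunID,
    -- the next snapshot date, and a status of 'Not Started'.
END
ELSE
BEGIN
    -- At least one pipeline failed: flag the run for a restart cycle.
    UPDATE dbo.BatchRun
    SET Status = 'Completed with Errors'
    WHERE Status = 'In Progress';
END
```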
Now, let's move on to the run ID. I am going to right-click right here on GetRunID and select Script as Create. The SetBatchStatus stored procedure has the mandate to generate a new batch run, and it needs a new run ID to do so; the GetRunID stored procedure is what it calls for that. GetRunID receives a few parameters, including a newly generated snapshot date, a batch ID, and a run number, and then uses those values to stitch together a string that formulates a run ID. So once again, please study the code and find ways to improve it wherever you can. And with that, we are actually done with explaining stored procedures. That's it from me, and I shall catch you on the next one. Goodbye. 41. Setting Up an Azure DevOps Git Repository: Hello, good people, and welcome back. All right, it's time to do some finishing touches to our metadata driven pipelines, but before we do that, let's set up a Git repository within Azure DevOps. That is where we are going to store our code to make sure we do not lose it, and also maintain several versions of it. So I would like you to visit dev.azure.com, and within this page, select Start free. Since I had already signed into the Azure portal with my Active Directory account when working with Azure Data Factory, it appears that Azure DevOps logged my user in automatically. If this doesn't happen for you, please sign in with your Active Directory account and not the account that manages your subscription. Okay, from this particular page we can simply select Continue. All right, what we need to do now is provide a unique organization name. My pretend organization is Buyalot, which sells stuff online. Now, Azure DevOps needs to be hosted in a region that is closest to you. I am not in Central US; I am somewhere in South Africa, so I am going to select a host that is closest to me, and since there is nothing in Africa, it looks like the closest one is Australia East. All right, now I need to enter the characters I see on screen as part of the security check and select Continue, and it is taking me to my organization. Hmm, it looks like this name is reserved, so let me try something else: let's say Buyalot online store, keep Australia East, enter another security check and select Continue. I hope this one is not taken. The rule of thumb here is simply to ensure you provide a unique organization name; it can be anything you can possibly think of. All right, from here I will need to provide a project name now that we have an organization. Let's call the project Buyalot Data Engineering, and I would rather keep it private, at least for now, so let's click Create project. Now that we have our data engineering project as part of the organization, we can create a repository where we are going to store our code, so navigate to Repos. From here, what I want to do is initialize a new repository. The option to add a README file has been selected, perhaps automatically. I can also choose not to add a .gitignore file at this stage, so I am going to leave that as None, and I will hit Initialize to create the repository. I also want to upload my metadata SQL scripts into the repository.
But first, I need to create a folder to hold these scripts. To do that, I can select this particular button right here, go to New and select Folder. I want to call my folder SQL scripts, like that. Now, I need to initialize this folder with some sort of file, because it won't be created without one, so let's insert a new file name; we can just call it ReadMe.txt and hit the Create button. All right, from here I want to start uploading my scripts into SQL scripts, but before I do that, let me just click Stay Here, and within the ReadMe.txt let me insert "This folder contains metadata SQL scripts" and hit Commit. The comment is just fine, the branch is main, which is also fine, so let me do the commit. Okay, so let's upload our SQL scripts. Let me select the SQL scripts folder once again, then those three dots, and Upload files. I want to click Browse because I want to upload my files, so I need to navigate to where I have downloaded my files, which should be under the metadata SQL scripts. Within SQL scripts, I am going to grab everything here with Ctrl+A to select all, and hit Open. The comment is fine as it is, we are going to commit to the main branch, and I just hit Commit, just like that. All right, cool. Now that we have our repository in place, we can make some final changes to our Data Factory and then publish our objects into this repository. This is it from me, and I shall catch you on the next one. Goodbye. 42. Publishing the Azure Data Factory Pipelines to Azure: Hello, good people, and welcome back. In the last lesson we managed to set up our repository hosted by Azure DevOps, and in this session we will clean up our Data Factory and then publish the code to Azure DevOps. So I would like you to open your Data Factory and select Author. Before we start publishing our Data Factory to the Azure DevOps Git repository, let's do a little clean-up. We have a number of ingestion Data Factory pipelines that we no longer need; we created them as a way of keeping versions as we built more features, and we no longer need them. So let's remove these pipelines; we are only going to keep the orchestration pipeline and version six of the ingestion pipeline. All right, I am going to select the first pipeline, which is this one, select the three dots, do a Delete, and hit the Delete button again. Let me select the next one, version three, and delete that one in the same way, hitting the Delete button. And lastly, I need to delete version five, which I will no longer need. Okay, from here what I want to do is validate all, just to check that everything is still fine, and then hit the Close button. Now I want to hit the Publish button to publish the deletion of these pipelines, so let me hit Publish. Okay, it looks like we have deleted our pipelines successfully. The last thing that we need to do is rename this particular pipeline. Let me do that by selecting the properties; I just want to take away the underscore V6, so it should be just PL underscore ingestion, and that should be fine. Now, what I can do next is hit the Publish button to publish these new changes.
All right, now it's time to connect this Azure Data Factory to the Azure DevOps repository. To do that, in the top left-hand corner there is this Data Factory drop-down list, which is right here, so select that and select Set up code repository. Now we need to configure the repository. What we can do here is select a repository type, and ours is Azure DevOps Git. From here it will pick up the Azure Active Directory automatically, and all you need to do is hit the Continue button. All right, from here we need to find the Azure DevOps organization name, so let me just hit that drop-down button: it is Buyalot online store, which is correct. Now it is going to load our project name, so let's select the Buyalot Data Engineering project, which is fine, and we will need to select a repository, which is also Buyalot data engineering, by the way, and that is also fine. Okay, let's move on to the collaboration branch, which should be main, just like that. The publish branch set to adf underscore publish is also fine, and the root folder is fine as it is. As you can see here, we have at least two branches: the main branch and the collaboration branch. In a nutshell, branching is a form of source code management. The main branch holds code which is intended for production deployment, so when you are developing new code or modifying existing code, you should create your own branch and work from there; this would be your collaboration branch, and it is intended that you do not work directly off the main branch. Once you are satisfied with your changes, you push the new and changed objects from your own branch to the main branch. Since we haven't created a branch, we are going to use the main branch; please note that this is not good practice, and it is always advisable to work from an isolated branch. The ADF publish branch, on the other hand, is used to manage releases: when you publish code from your branch to the main branch, you can create a release within Azure DevOps, and it will use this branch to publish the code. All right, so let's go right ahead and click the Apply button. Okay, cool. We are going to set the working branch as main, which is just fine; as I have just explained, we haven't created an isolated branch, so we are going to leave the main branch as it is and select Save. At this stage, your code has not been published to Azure DevOps yet. In order to publish the code, we need to save the Data Factory changes and then publish. So let me do one last validation, just to make sure everything is still fine, and since we really have nothing to save, we can just hit the Publish button and it should publish our changes to Azure DevOps. All right, let's navigate back to Azure DevOps. Once you have loaded Azure DevOps, return to your project, which is Buyalot Data Engineering, select it, and then head over to Repos; this is the repository where we will start confirming our files. As you can see, our files have been published, as you can tell by these new folders that have been created. I would like you to understand that Data Factory stores its objects in JSON format. To confirm this, we can go inside the datasets folder that has been created, and as you can see, it is a bunch of JSON files: for instance, the email addresses dataset that we created and the JSON files that we wanted to load for Buyalot.
And also the folders that relate to the data engineering data lake. If we navigate inside at least one of them, you should get an indication of what is going on if you read through the JSON file, just like that. So we have got the datasets, and we have got the actual Data Factory, which is right here, also as a JSON file; as you can see, it has the name, the location, and some additional data that is required. We have also got the linked services, and all three of the ones we used are here. We have also got the pipelines, which are the utility pipelines plus the orchestration and the data ingestion pipeline. And finally, we have got our scripts, which we uploaded a lesson ago. That brings us to the end of this lesson. This is it from me, and I shall catch you on the next one. Goodbye. 43. Closing Remarks - Metadata Driven Ingestion: Hello, good people, and welcome back. It has taken us quite a number of lectures to get to this point, but finally we have completed something that we can call version one of our data ingestion platform. This ingestion platform that we have developed together is very generic, and I want to assure you that it is a very common development pattern to start off in a very generic way and then gradually produce a more specific version that is tailored for your own environment and requirements. The key to the development of any complex system is to start off in the simplest way possible. Take a look at the data object metadata table, and more specifically the relative path schema column. When I was designing the data model for the metadata ingestion framework, my first thought was to design for the ingestion of multiple database tables, but then I had a change of heart: it seemed to me that was the more complex route, having to deal with databases, tables, and schemas, so I decided to choose files instead as a simpler way to start. The schema part of the relative path schema column is designed to also store a table schema, with the table name stored within the technical name column. So you can already see that you can now proceed with version two of the ingestion platform and shape it to support database tables as well, or even other file types instead of just JSON. Look at the data object type column: we had no use for it, but you can immediately tell that we could use that column to store other file types such as CSV and XML. Each file type can possibly influence the flow of the ingestion framework, and then it is back to the drawing board. Now that you have gained so much knowledge on using Data Factory in a dynamic way, think of how you can adopt and shape this framework for your own current workplace, or even design a new framework altogether using the concepts that you have learned. My name is Danny, and some know me as David. I'll catch you on the next one. Goodbye.