Transcripts
1. Welcome and Prerequisite: Hello and welcome to this session on Azure Data Factory. I'm going to be your guide as we explore Azure Data Factory and some of the other services we will use to build a Data Factory solution. We will be discussing what Azure Data Factory is, its key components like linked services, activities, and pipelines, how it works, its advantages, and how you schedule and monitor workflows. We will also go through a demo using Data Factory, in which we will move and transform data from an Azure data warehouse to Data Lake Storage Gen2. So basically we will develop a workflow that automatically takes data from a particular source, makes some transformations, and then stores the output in some other storage location from where it can be picked up for further analysis. In order to complete this course, I will make sure that you get some hands-on activities. I'm not just going to lecture you; you will also get a chance to try this for yourself. And to do that, all you need is an Azure subscription. The Azure subscription could be one you already have, or you could sign up for a free trial. It is also good to be familiar with basic ETL concepts. By the end of this course, you will have a solid foundation in Azure Data Factory. So let's begin our journey.
2. Data Life Cycle: Let's start now with the data life cycle. In the life of data, there are many stages. Data can come from many internal and external sources. It can be transactional data from applications, or maybe log data from systems, or nearly continuous streams of data from devices like IoT, the Internet of Things. All these different sources produce the same kind of data, but in different formats, because they may have different ways of communicating. So the point is: the incoming data is the same, but the format is really different. First of all, we need some kind of storage throughout the life cycle of the data, where we can store data in the different stages of its life cycle. Then we need something to connect to and grab the data from different sources; that's basically collection. Then we might need to prepare the data. If you think about getting data from different sources or systems, they might have slightly different formats, because sources may store the data differently. For example, the same column may have different formats in different sources: some of them may store it as a string, some of them as a number, some allow null values, and some may also allow blank values. So we have to clean the data before we can really start to ingest it into the actual storage. We have to do the preparation, or some initial transformation. After the initial transformation converts the data into a consistent format, we actually ingest it into the storage. Then think about processing. When we process, we actually transform the data, and transformation can mean different things, but ultimately we are transforming the raw data into a structured and normalized schema, shaped in a way that meets the format requirements of the end systems, because these end systems are really going to do the analysis or interpretation. So there are multiple stages that the data will go through, and some of these may even be cycles, where we go through processing and transformation multiple times. In fact, in today's world, very often we don't yet know what business questions we are going to ask in the future and what data we need to answer those questions. So the idea is to keep all the data so that we can come back later and use it as required. We're constantly analyzing the data, we're learning from it, we continuously retune our models and make further changes in the transformations. So this is the general process, but really the key part of what we're doing here is getting data from a source and finally sinking it into the destination, and all those bits in between enable that to happen in a manner and a format that is useful for the end system, so that we can run the analysis and shape the data to get the intelligence that our business actually needs.
3. Azure Solutions: In this lesson, I want to introduce the Azure solutions, the family of technologies that revolve around the control flow and the data flow. For the control flow, we have Azure Data Factory; this provides the orchestration and scheduling. Then we have the actual data flow solutions: these are Azure Databricks, Azure HDInsight, or any of the services or functions that process the data. At its heart, the control flow is the orchestration and scheduling of activities, and a data flow is the flow of the actual data within an activity. A control flow will be made up of at least one activity. For example, let's say we want to take blob data and send it to a data lake; this is one data flow. Now, within this activity there are potentially going to be distinct steps. For example, we have to get data from the blob, that is step one, and then we write the data to the lake, that will be step two. And in between these steps there is likely going to be a mapping to take certain values from the blob and map them over to certain files within the data lake. Similarly, there may be other activities. Maybe there's a transformation data flow that takes the data from the data lake and does a transformation, for example, maybe it does joins on the data, or orders it, or merges it with other things. The key point is that it's transforming the raw data into a format and a structure that is usable by the end analysis solution. And that's the success flow, which is represented by a green arrow. Then there will also be a failure flow, which is represented by a red arrow. So if there is some failure within the previous activity, then the orchestration, or the control flow, would move on to call, say, a send-email API. So basically we can chain these to have a complete flow of the various activities that completes the end-to-end life cycle of the data. The orchestration, or control flow, is responsible for calling those activities that each have a data flow within them. Then there are different types of data flows. For example, wrangling data flows: these typically integrate with things like Power Query, and this is all about data preparation, adding columns, splitting columns, or maybe taking a full-name column and splitting it into first name and last name. Then there are mapping data flows, for example sorts, joins, merges, inserts, aggregates, lookups, and similar transformations. So we use the different transformations at different stages of the life cycle: wrangling data flows near the start, as we prepare the data in the lake, and then mapping data flows, which actually transform the data and get it into the shape that we need for the analysis. A rough JSON sketch of this success/failure chaining follows below.
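As a rough illustration of how that success/failure chaining looks in a Data Factory pipeline definition, here is a minimal sketch. The activity names, dataset names, source/sink types, and the notification URL are hypothetical placeholders I'm assuming for illustration, and the typeProperties are trimmed down; treat it as a shape, not a complete pipeline.

```json
{
  "name": "IngestPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyBlobToDataLake",
        "type": "Copy",
        "inputs": [ { "referenceName": "BlobInput", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "LakeOutput", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "BlobSource" },
          "sink": { "type": "AzureBlobFSSink" }
        }
      },
      {
        "name": "NotifyOnFailure",
        "type": "WebActivity",
        "dependsOn": [
          { "activity": "CopyBlobToDataLake", "dependencyConditions": [ "Failed" ] }
        ],
        "typeProperties": {
          "url": "https://example.com/send-email",
          "method": "POST",
          "body": { "message": "Copy activity failed" }
        }
      }
    ]
  }
}
```

The green arrow in the designer corresponds to a "Succeeded" dependency condition, and the red arrow to "Failed"; "Completed" and "Skipped" are the other two conditions available.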
4. What is Azure Data Factory: Azure Data Factory is a managed data integration service that lives in the Azure cloud. Azure Data Factory is like SSIS or any other ETL tool in the cloud, except that Azure Data Factory does not store any data by itself, nor does it have any transformation engine of its own. If you have ever worked with SSIS, Informatica, or other ETL tools, they have their own transformation functions, but ADF simply connects sources, destinations, and transformation services. Source and destination connections could be on-premises or in the cloud, and it connects to various other transformation components and services of the cloud, like Hadoop, U-SQL, Hive, or Pig, and gets the required work done. So what does that mean? Azure provides a lot of technologies to help you derive value from your data. For example, there is Azure Blob Storage, which provides cheap cloud storage to store your data. In the previous session, we discussed Azure Data Lake Analytics, which we use for massaging and transforming your data. And there's Azure SQL Data Warehouse, which provides scalable relational data warehousing that you can use to understand your business, and there are many other Azure services available for different purposes. You can mix and match these Azure services as needed to analyze both structured and unstructured data. But there is an important aspect of data analytics that none of them addresses, which is data integration. Data integration means extracting data from a source, doing some transformation, and then loading it into a data warehouse for data analysis. And when each of these tasks has to be done manually, it makes more sense to automate them, and the Azure Data Factory service helps you automate these tasks. Think about Data Factory as a conductor in an orchestra. In an orchestra, the conductor does not play the music, but he leads the group of musicians who produce the music at various stages of the symphony. The conductor has the big picture of the entire symphony and of the way the music will be performed, but the actual music is performed by individual musicians. Similarly, Data Factory will not perform the actual work required to transform the data, but will instruct other services, like Databricks or Data Lake Analytics, to prepare and transform the data. So it would be Data Lake Analytics or Databricks that performs the actual work, not Data Factory; Data Factory merely orchestrates, or oversees, the execution of the work. So Azure Data Factory is a service in Azure that you can use to take your big data workflow and encapsulate it in something called a pipeline. That pipeline includes all the different activities that are required to copy and process the data and get it into the location where you need it. And you can schedule those activities so that your pipeline runs on a frequent, recurring basis when you have to repeatedly apply the same transformations to your data regularly. Data Factory is an ELT, or extract, load, and transform, tool. So how is it different from ETL? In SSIS, you extract the data, transform the data with built-in SSIS transformations, and load the data into the target, whereas in Data Factory you just extract the data from the source and load it into the target, and the transformation is done in the target data store.
5. Key components of Data Factory: Now that we know how it works at a high level, let's get familiar with some of the important ADF components. These components work together to provide the platform on which you can design workflows with steps to move and transform the data. Linked services: linked services are nothing but connection strings, which define the connection information that's needed for Data Factory to connect to external sources. This may include a server name, database name, file folder, credentials, etc., depending on the nature of the job. Each data flow may have one or more linked services. For example, an Azure Data Lake Storage linked service specifies a connection string to connect to the Azure Data Lake Storage account. At the time of recording this course, ADF can connect to more than 80 different data sources; basically, it has the capability to connect to almost every possible data source. Datasets: a dataset represents the structure of data. It simply points to, or references, the data you want to use in your activities as inputs or outputs. Datasets can contain a table name, file name, data structure, etc. Activities: an activity represents an action to perform on your data, a processing step in the pipeline. This could be data movement, a transformation, or a control flow action. Activity configurations contain settings like the database query, stored procedure name, output data locations, etc. An activity can take zero or more input datasets and produce one or more output datasets. ADF activities can be compared to SSIS data flow task components like the Aggregate or Script Component. For example, you might use a copy activity to copy data from one data store to another data store. Similarly, you might use a Hive activity, which runs a Hive query on an Azure HDInsight cluster, to transform or analyze data. ADF supports three types of activities: data movement activities, data transformation activities, and control activities. Pipelines: a pipeline is a logical grouping of activities performing a set of processes such as extracting data, transforming it, and loading it into some database, data warehouse, or other kind of storage. For example, a pipeline can contain a group of activities that ingest data from Amazon S3 and then run a Spark query to partition the data. A data factory can have one or more pipelines, and each pipeline contains one or more activities. Using pipelines makes it much easier to schedule and monitor logically related activities; the benefit is that the pipeline allows you to manage the activities as a set instead of managing each one individually. The activities in a pipeline can be chained together to operate sequentially, or they can operate independently in parallel. For example, a pipeline could contain a group of activities that copy data from local disk to a data lake, then use a SQL query to transform that data, and finally load the data into a data warehouse. Triggers: triggers represent the unit of processing that determines when a pipeline execution needs to be kicked off. They represent the scheduling configuration for pipelines and contain configuration settings like start and end dates and execution frequency. Note that triggers are not a mandatory part of an ADF implementation; they are required only if you need pipelines to be executed automatically on some predefined schedule. A short JSON sketch of a linked service and a dataset follows below.
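To make these components a little more concrete, here is a rough sketch of how a linked service and a dataset that uses it look in Data Factory's JSON definitions. The names, server, database, and table shown are hypothetical placeholders I'm assuming for illustration, and the credential is deliberately not a real secret.

```json
{
  "name": "SqlDataWarehouseLinkedService",
  "properties": {
    "type": "AzureSqlDW",
    "typeProperties": {
      "connectionString": "Server=tcp:myserver.database.windows.net,1433;Database=mydw;User ID=sqladmin;Password=<stored securely>"
    }
  }
}
```

```json
{
  "name": "DimProductDataset",
  "properties": {
    "type": "AzureSqlDWTable",
    "linkedServiceName": {
      "referenceName": "SqlDataWarehouseLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "tableName": "dbo.DimProduct"
    }
  }
}
```

A copy activity in a pipeline would then reference a dataset like this as its input or output, which is exactly the linked service → dataset → activity → pipeline layering described above.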
6. Pipeline workflow - 4 steps: A pipeline in Azure Data Factory typically performs the following four steps. Connect and collect: the first step is to connect to all the required sources of data and move the data as needed to a centralized location for subsequent processing. With Data Factory, you can use the copy activity in a data pipeline to move data from both on-premises and cloud source data stores to a centralized data store in the cloud for further analysis. For example, you can collect data in Azure Data Lake Storage and transform the data later by using an Azure Data Lake Analytics compute service. You can also collect data in Azure Blob Storage and transform it later by using an Azure HDInsight Hadoop cluster. The second step is transform and enrich: once data is present in a centralized data store in the cloud, it is transformed using compute services such as HDInsight Hadoop, Data Lake Analytics, and Machine Learning. Remember, Data Factory is just an orchestrator and cannot perform any transformation activity by itself. Publish: after the raw data has been transformed into a meaningful form, load the data into destination stores like Azure SQL Data Warehouse for consumption by BI and analytics tools and other applications. Monitor: after successful deployment of a pipeline, you can monitor the scheduled activities and pipelines for success and failure. You can monitor Azure Data Factory pipelines using the Azure Monitor API, PowerShell, Azure Monitor logs, and the health panels in the Azure portal.
7. Demo Introduction: Welcome to this module. In this module, we will do some hands-on activity. First, we will create an Azure Data Lake Storage Gen2 account and will use Azure Storage Explorer to access the Gen2 account. Then we will deploy a SQL Data Warehouse and a data factory, which is pretty easy and straightforward. After that, we'll go through the Data Factory UI and understand the different menu options. Then we will use the copy activity of Azure Data Factory. So we are going to look at ways of taking data from the data warehouse via a pipeline, move it, and store the output in some other storage location where we can pick it up for further analysis. We will also take a look at Data Factory monitoring options. Thank you.
8. Create Azure Data Lake Storage Gen 2 Account: In this demonstration, I'm going to show you how to create an Azure Data Lake Storage Gen2 account. Let's again go to All services, or you can directly access storage accounts from here. If you look in the Azure portal, there is a separate resource called Data Lake Storage Gen1, but to create a Storage Gen2 account we need to first create a storage account, because Gen2 is a property of the general-purpose version 2 storage account. We will see that in a minute. Microsoft nowadays is focusing on Data Lake Storage Gen2, because Gen2 has taken the place of Gen1. Let's create a storage account. Let's click on that button, and let me choose my subscription and my resource group. Let me create a new resource group; I'll name it RG-StorageGen2. Give the storage account a name; I'm just putting some number in it to make it globally unique. I'll use my home region. Something nice about Gen2 Data Lake storage is that you can take advantage of all the storage account properties and features like performance, access tiers, and replication. I'm going to select standard performance, general-purpose version 2, locally redundant storage, and the hot access tier. The next page takes us to the connectivity method; I will leave it at the default public endpoint. At this point, we'll go next to the Advanced page. Let's leave the default security settings; the enable blob soft delete option we can actually ignore, because the point here is that we're not using this storage account as a traditional blob service. We're going to turn on the hierarchical namespace property by flipping Data Lake Storage Gen2 from disabled to enabled. And as you see, when you enable Gen2, it automatically disables the data protection options. This is all you have to do to make a storage account a Data Lake Storage Gen2 account. Again, to make a normal storage account a Data Lake Storage Gen2 account, all you have to do is enable this option. That's it. Then Review and Create. It will not take too long, but still, I can pause the video until then. All right, the deployment completed. And if we now go to the resource, I want to show you a difference from what you are probably accustomed to when working with general-purpose storage. You might have noticed that under Services, instead of just a blob service, it's now called Data Lake Gen2 file system. Let me show you the difference: I have another general storage account, and you see, in the general account it is called Blob service here, but because we enabled Gen2 storage, here it is called Data Lake Gen2 file system. So from now on, this storage account is going to be Data Lake storage with HDFS-compatible file systems. At this point, if we select the service, we can create a file system, which I will do now. I'll call it gen2filesystem. Okay. And remember, what you create as a Gen2 file system is not a blob container; it's an actual folder now. And then, if we go into the file system, we get a prompt to use Azure Storage Explorer. Azure Storage Explorer is a free tool from Microsoft that is available on Windows, Mac, and Linux, and, as the name suggests, it provides a graphical environment to browse and perform actions against Azure storage accounts. If you don't have it already, please download and install it; it's very easy and straightforward. This tool basically uses an account to authenticate to Azure and then shows the subscriptions that account has access to,
and then the storage accounts in the selected subscription. Under my subscription's storage accounts, we can expand my Gen2 storage account, which is currently identified as ADLS Gen2. I would presume that eventually the Storage Explorer team is going to rename the node that we're looking at now from Blob Containers to File Systems, because that's what it should be. And when we select the file system, notice that we can now do a direct upload, or we can organize a hierarchical folder space. So I'm going to create a new folder here, called outputdata, and we'll use this folder as part of our workflow. If your view is out of date here, you'll want to refresh. We're ready to go here.
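For reference, the same Gen2 account can be described in an ARM template; this is a minimal sketch under my own assumptions (the account name, region, SKU, and API version are placeholders chosen for illustration), where the hierarchical namespace switch shown in the portal corresponds to the isHnsEnabled property:

```json
{
  "type": "Microsoft.Storage/storageAccounts",
  "apiVersion": "2019-06-01",
  "name": "mygen2storage12345",
  "location": "eastus",
  "sku": { "name": "Standard_LRS" },
  "kind": "StorageV2",
  "properties": {
    "accessTier": "Hot",
    "isHnsEnabled": true
  }
}
```

This mirrors exactly what we picked in the portal: general-purpose v2 (StorageV2), locally redundant storage, hot access tier, and the hierarchical namespace enabled.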
9. Deploy Azure SQL Data Warehouse: Let's create an Azure SQL Data Warehouse. In addition to searching for SQL Data Warehouse in the search bar, you can also find it under Databases as well as Analytics. Let's click on SQL Data Warehouse and add a new instance, and let me show you how the deployment workflow works. I will create a new resource group for the data warehouse, and I will also give a name to my data warehouse; I'm adding a random number at the end, just to make the name globally unique. Let's create a new virtual server; it's going to be a cluster. I'll give it a name ending in 108. We then need an admin login with a strong password; we get help here with advice on how strong the password needs to be. Let me pick my nearest location. We're also asked about access to the data warehouse, and this checkbox here will allow all Azure services, in other words, Microsoft IP addresses that are associated with those services, to access the server. You may also need to modify the SQL Data Warehouse firewall to allow your client IP, from wherever you are in the world, in addition to other public IP addresses or ranges; I'll show you that configuration later. The Data Warehouse Unit is the compute unit used by Azure SQL Data Warehouse. Let's select our performance level. For production workloads, we should be on Gen2 and not Gen1, and I can scale my system down here to the lowest level, because I'm just working in a test demo environment. If this were the real world, your goal would be to tune your data warehouse in such a way that you're not paying more than you need to, but you are also getting enough compute to satisfy your service level agreements. So let's apply here and then go to Next. For our data source, we can restore a backup, or we can choose None, or, actually, what we can do is choose Sample, which will give us the AdventureWorks sample data warehouse. I'll choose that, and it picks up a collation, which I will accept. We'll go to Tags; we won't use tags here, so we'll go to Next. We see our estimated cost per hour, you see the terms, and then we're ready to create. I find that it generally takes ten minutes or so for Microsoft to deploy a warehouse; we'll be back then. Once the deployment is complete, we can go to the resource. We can pause or resume the data warehouse; pausing pauses the billing. You're still charged for the storage, but at least the compute is not incurring any cost. If we look in our settings for the firewall, we can come down to Firewalls and virtual networks, and this is where, as I mentioned, you can allow access to Azure services. The portal auto-detects the public IP associated with your client device, and then you can apply IP address ranges, public IP address ranges, here to allow connectivity into that server. Let's come back to the Overview page, because we want the server name that is hosting the data warehouse; let me copy it to my clipboard. Then, how do you connect to that data warehouse? Typically, I use SQL Server Management Studio, which is available as a separate download. Let's make a connection to the SQL Server database engine, and for the server name I will paste in the fully qualified name of our server. We'll choose SQL Server authentication, and I will supply the same credentials I gave while creating the data warehouse. Uh oh, it is showing an error that our client IP address does not have access to that server.
So let's go back to our data warehouse, Firewalls and virtual networks, add our local client IP here, and retry the connection using Management Studio. So here we are in. Now, remember that SQL Data Warehouse is a hosted platform-as-a-service. So if you right-click the virtual server, there are no properties. I remember being surprised when I first saw that there were no properties, but remember, you're dealing with platform as a service here. We expand Databases, then the data warehouse database we created, and we have a list of tables that Microsoft has built into this sample data warehouse. That's all we need for now. Thank you.
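As a side note, the firewall entries we just configured in the portal correspond to child resources of the logical server. A minimal sketch in ARM JSON might look like the following; the server name, rule name, client IP, and API version are hypothetical placeholders, not values from this demo.

```json
{
  "type": "Microsoft.Sql/servers/firewallRules",
  "apiVersion": "2015-05-01-preview",
  "name": "mydwserver108/AllowMyClientIP",
  "properties": {
    "startIpAddress": "203.0.113.25",
    "endIpAddress": "203.0.113.25"
  }
}
```

The "Allow access to Azure services" checkbox we ticked earlier corresponds to a special rule whose start and end address are both 0.0.0.0, which is how Azure-hosted services such as Data Factory reach the server.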
10. Deploy Data Factory: Let's create our data factory now. By this time you must be familiar with the process; it is pretty easy and straightforward. You can either search for Data Factory or select it from the portal categories and click Add. Give it a name, and select the subscription and resource group; under the subscription, let's create a simple resource group for it. Okay. I will use version 2 of Data Factory to take advantage of the latest features. Version 2 is the new version of Data Factory; one of its biggest features is the integration of SSIS and control flow functionality. Monitoring is also an added enhancement in version 2, making it integrate with Azure Monitor. Okay, let me choose the closest match for my location. Now here we have the option to integrate source control with Azure DevOps; I'm not going to do that in this case. And then we'll click Create to deploy the factory.
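The equivalent resource definition in an ARM template is tiny; here is a minimal sketch, where the factory name and region are placeholders I'm assuming for illustration. It also shows the system-assigned managed identity that we will rely on later in the permissions lesson.

```json
{
  "type": "Microsoft.DataFactory/factories",
  "apiVersion": "2018-06-01",
  "name": "my-demo-adf-v2",
  "location": "eastus",
  "identity": {
    "type": "SystemAssigned"
  },
  "properties": {}
}
```

The factory itself holds no data and has almost no properties of its own; the pipelines, datasets, linked services, and triggers we create next all live as child resources inside it.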
11. Getting familiar with Azure Data Factory UI: Let's go into the data factory, and what we have here is just the control plane for the service. If I hide the Essentials pane, we see a link to the Author & Monitor experience. This is where we actually perform our work in Azure Data Factory. Let's click on this, and if you notice, it has opened a separate window and browsed to adf.azure.com. The product defaults to this overview page, which has many quick links or shortcuts. We are going to do the Copy Data operation to copy our table out of the SQL Data Warehouse later; we will go into more detail then. Meanwhile, this page also has help videos and a bunch of tutorials. Let's begin by getting familiar with the UI. The main menu contains three options: Data Factory, which is sort of the main screen and contains a few interesting shortcuts; the Author panel, which is where we create new pipelines and spend most of our time; and finally the Monitor panel, where we can monitor pipeline executions. All right, on the main screen we have five options. Let's go through them one by one. Create Pipeline is merely a shortcut that takes us to the Author panel and creates an empty pipeline for us automatically. Create Pipeline from Template opens the template gallery; there, we can create a pipeline from a collection of predefined templates that can help us get started with specific scenarios quickly. The Copy Data tool provides an interface that streamlines the process of ingesting data into a data store; we'll learn more about the Copy Data tool in upcoming lessons. Configure SSIS Integration can provision an Azure-SSIS integration runtime. And finally we have Set Up Code Repository; it allows us to set up a code repository for your data factory and have an integrated, end-to-end development and release experience. Now let's spend some time learning about the Author panel. When we're creating a new pipeline, this is the panel we spend most of our time in. In the Factory Resources section, we can find a list of pipelines, datasets, and data flows in the data factory. By clicking on the three dots next to the Pipelines section, we can access a menu that allows us to either create a new pipeline from the ground up or from a template. We find similar menu options on both the Datasets and Data Flows sections. We will click on this little plus sign here, and this will create a new pipeline. Notice that there's an option for a pipeline from template, and over here to the right, you can import ARM templates. This can make pipelines easier to use and reuse over time. Again, Azure Data Factory is a code-free orchestration tool. So, for instance, from under Move & Transform, we can simply drag and drop this Copy Data operation onto the control surface, and when you select this copy operation, down below you will see different fields related to the selected activity. In this case, for Copy Data, since this is a copy operation, we have Source, in other words, where is your data coming from, then Sink, where is the data going to, and we have other settings and mapping options. If you click View Source Code, you'll see that all of your Data Factory pipelines are written in JSON. The idea of a pipeline is that we can link different activities; we can create an entire workflow here using these connection links shown in green. By the way, I can right-click a link and choose Delete, and it will go away, and if I click this, I can see the output options.
We can add an activity to run on success, failure, completion, or skipped, so we can choose the next activity to run if this activity completes. Let's choose a next activity; if you're not exactly sure where it is, we can do a search here. In this case, let's add an Azure Function, again, just drag and drop. You can see how easy that works, and you can validate the work, you can look at the entire pipeline code, and then you can debug it, or run it as a triggered operation, and so on. As mentioned earlier, we can set up a code repository to enable continuous integration and delivery of your data factory. And here we have our Template menu option; it simply allows us to import and export ARM templates with pipeline definitions. The Connections panel hosts a list of previously created linked services and integration runtimes, and here is the default AutoResolve integration runtime that we saw earlier. And finally, let me point your attention to the Triggers section, the place for pipeline execution triggers. We don't have any trigger at the moment, so this list is empty. I'm going to close all the tabs, discarding my changes, and go to the next item. Let's take a minute to review the Monitor panel and all the features it offers. The Dashboards panel contains information regarding pipeline and activity executions. In the Pipeline Runs panel, we can monitor pipeline executions, and it's very useful to make sure the pipeline is actually operating correctly. The Trigger Runs panel displays a list of pipeline executions that were triggered automatically. The Integration Runtimes panel displays the list of integration runtimes available to us; AutoResolve is the default integration runtime. And finally, in the Alerts & Metrics panel, we can create alert rules to monitor the data factory pipelines proactively. Now that we are a bit more familiar with the Data Factory UI, let's create our first pipeline.
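Since triggers came up, here is a rough sketch of what a simple schedule trigger definition looks like in JSON once you do create one. The trigger name, pipeline name, start time, and time zone are hypothetical placeholders; a real trigger would reference one of your published pipelines.

```json
{
  "name": "DailyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2020-01-01T06:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "CopyDimProductPipeline",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```

Once a trigger like this is published and started, its executions are what show up later in the Trigger Runs panel of the Monitor experience.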
12. ETL Operation: In this demonstration, I would like to show you how to use Azure Data Factory to perform a copy activity. In this case, we are going to copy the data of a table from the data warehouse sample database to our Data Lake Storage Gen2 account. So let's come back to the overview page and click on Copy Data, and it kicks off this Copy Data wizard. Let's give this task a name. Now, here we can choose to run the task only once, or we can create a reusable pipeline by choosing this option and selecting a schedule. Even if you choose to run it only once, it is not going to be deleted after the run finishes; we can manually rerun it as many times in the future as we want, or create a trigger, and so on and so forth. If you look on the left, we have six steps, and the wizard walks us right through them without our having to understand anything about the underlying API or security. There's a huge library of connectors here, organized into groups like Azure, Database, File, etc. Let us create a new connection. In the New Linked Service blade, I'll type "data warehouse" to filter, and we will select this Azure SQL Data Warehouse linked service object. The linked service is just what it sounds like: a reusable object that creates an authorized connection to an asset. That asset can be an Azure resource, a non-Azure resource, or maybe an on-premises resource. Let me give it a name. The integration runtime is the compute; for now, let's leave it on the default setting of AutoResolve. This becomes more of a consideration in a hybrid scenario, where you may have to deploy the runtime on-premises; that's called the self-hosted option. As far as authentication, we can supply credentials directly or pull them from Key Vault, but before that, first we have to choose how we're going to get to this resource. Let's select the subscription and the data warehouse server we created; the database is the sample data warehouse. The authentication type options are service principal, managed identity, or SQL. I'm going to go ahead and just choose SQL authentication, using the same account we created while creating the data warehouse. There are a bunch of optional connection parameters you can add, which we will not use in this case. We're going to test the connection: connection successful. Finish. So we have created our source data store, and again, this is a reusable object that we can use in other pipelines. So let's go Next, and we have tapped into the data warehouse, and here we see that we need to choose a table we want to copy. We can select one or more tables here; we want to choose the DimProduct table, so let's do a search for "product". I'll select this table, and if we bring up the split bar, we can see a preview of the data, and in the Schema tab, we can see the columns and the data types. Let's click Next. One more thing, let me go to Previous: besides just picking existing tables, notice that you can also write a query to gather the specific data that you want to use in the job. But we are just going to grab everything from the table, so again we'll click Next. The destination data store we have not created yet, so let's choose Create New Connection again, and this time let's do a search for Data Lake Storage. Now remember, there are Gen1 and Gen2; we're not concerned with Gen1 here, as we have already created a Gen2 storage account. So let's select that, and once again we make a selection from our subscription. For the authentication method, once again, we are going to choose account key.
We have selected our storage account name; let's also give a name to this linked service. Okay, and now test connection: connection successful. Good deal. So let's click Finish. So far, so good. Click Next. We are going to tap into our Data Lake Storage Gen2 account and browse to pick out which folder we are writing to. So first we get the list of file systems; let's click it, then select the output data folder and click Choose. We have to specify an output file name; I'll simply call it product.csv. For maximum concurrent connections, let's put two. Let's click Next. Here we can customize the file format; it's defaulting to comma-separated values. I'm going to leave all of this at the defaults, but the bottom line is that Data Factory gives you a lot of flexibility here. Click Next. Fault tolerance: what do you want to happen if there's an issue while moving data? Do you want to abort the activity as soon as there's an incompatible row? Or do you want to skip the row? Or do you want to skip and log the rows so that you can track them later? I'm going to leave the default options here. I will also leave the performance settings at the defaults at this point and click Next. Here's our summary screen, and when we click Next, it actually goes out and runs the pipeline. As you can see, if I click on Monitor Data Factory, it takes us to the monitoring view; let's click Refresh, and we can see that the status is Succeeded. Good. Let's come back to Storage Explorer, come into the output data folder, and, sure enough, we see our product.csv. Let me right-click and download this to my desktop. One more thing, in case you have not noticed: all of our activities in Storage Explorer are listed down at the bottom. Okay, let's open Excel just to do a quick sanity check. Oh, it seems like I forgot to pick up the column headers. No worries, we can still add that. What is nice about Data Factory is that we can go back and edit the job properties to include the column header; we can change our pipeline and rerun it. I will do that in a minute. Meanwhile, let's take a quick look at the data; we can see there are a little over 500 rows in the data set. Now let's go back to the Data Factory interface, and if you go back to the Author experience, here is our pipeline; it's right here, so we can make the required changes. Let's go to the destination connection, and here we see the option First Row as Header. Enable it, and publish the changes; publish means save. Then again, we can go to the Monitor tab and rerun the job if required. We can refresh, and we see that our job has completed again. Let's go back to Storage Explorer. Okay, we see by the time stamp that the new file has overwritten the old file. Let's download it and open it. Now we see that we have the header row. Thank you.
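Behind the wizard, the header option we just changed ends up in the sink dataset's JSON. Here is a minimal sketch of what such a definition might look like; the dataset name, linked service name, file system, folder, and file names are the placeholders I've been assuming for this demo, so treat them as illustrative only. The "First row as header" checkbox maps to the firstRowAsHeader property.

```json
{
  "name": "ProductCsvOutput",
  "properties": {
    "type": "DelimitedText",
    "linkedServiceName": {
      "referenceName": "DataLakeGen2LinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "gen2filesystem",
        "folderPath": "outputdata",
        "fileName": "product.csv"
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```

Editing the destination in the Author experience and publishing is simply updating this JSON, which is why the rerun produced a file with the header row included.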
13. Permissions: During our last ETL operation, we were able to authenticate to the Data Lake storage seamlessly. But if you run into a problem with that connectivity, you may need to give Data Factory explicit permissions to that data repository. Let's see what we mean by that. Let's go back to the portal, the general Azure portal, and back to our data factory. If you go to the Properties, you can see here the Managed Identity Object ID. In simple words, managed identities are used by Azure services to authenticate to other Azure services that support Azure Active Directory authentication. The managed identity is registered in your Azure Active Directory and represents this very specific data factory. We can directly use this managed identity for Data Lake Storage Gen2 authentication; it will allow this specific factory to access and copy data to and from your Data Lake Storage Gen2. We can add this managed identity to role-based access control or access control lists. So I'm going to copy this to my clipboard, and then, back in Storage Explorer, we can give the data factory permission at the file system level as well as the folder level. To do that, we can right-click on the file system and go to Manage Access. In the access control list, down at the bottom, we can paste in that object ID of the data factory and then give the account whatever degree of access, directly at the file system. And sometimes I have found that you have to do it in two places: at the file system level as well as at the folder level. So you can right-click the folder and do the exact same thing here: we paste in the object ID of the Azure resource, in this particular case the data factory, and give it whatever level of access you want to give. This will help you to connect the data factory seamlessly to other sources, in this case Storage Gen2. Thank you.
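As an aside, once the factory's managed identity has been granted access this way, the ADLS Gen2 linked service can be defined without any account key at all. Here is a minimal sketch, assuming managed identity authentication is what you want; the linked service name and storage URL are placeholders, not values created earlier.

```json
{
  "name": "DataLakeGen2ManagedIdentity",
  "properties": {
    "type": "AzureBlobFS",
    "typeProperties": {
      "url": "https://mygen2storage12345.dfs.core.windows.net"
    }
  }
}
```

Because no credential is specified, Data Factory authenticates as itself using the managed identity we just authorized, which avoids storing the account key in the linked service.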
14. Data Factory Monitoring: In this lesson, I want to walk through a basic demonstration of how we would perform monitoring with Azure Data Factory. So what exactly do I need to monitor? Primarily, for monitoring, we think about the trigger runs, the activity runs, the pipeline runs, and then those integration runtimes; they are going to be the focus of our Data Factory monitoring solution. And there are two ways to monitor: monitoring in Data Factory, and monitoring with Azure Monitor; we'll discuss both. So, firstly, we actually have monitoring within the Data Factory user interface. While developing these pipelines, the debugger can be very useful; it enables me to run my pipeline, I can have breakpoints, I can see exactly what's happening, but that's not really monitoring, it's part of the developer experience. This debug functionality is helpful to see exactly what's happening inside, and that can be very useful as a developer, but the bulk of our monitoring will come through the monitoring interface. It actually starts with a dashboard. This dashboard will show, by default, the last 24 hours, but I can change that time window. It's going to show me the pipeline executions; I will see failures, I'm going to see those percentages, I'll also see details about the activities, and I will also see data about the triggers. This just gives me a very quick overview of what's happening in the environment. I can see details about any trigger runs, if I have triggers; I'll see the detail about those executions, and an execution may be based on a schedule or maybe on an event. But what I really care about are the pipeline runs. Here I can see the name of the pipeline. If I have different pipelines in my environment, I can click this little filter button and change it to only see particular pipelines that I have available; I can search based on them. I can see the various actions available for the pipeline: I can view the activity runs, I can actually rerun it, I can see when it was started, the duration it took, how it was triggered, the status, and whether there were any parameters or annotations, and once again, I can filter on those as well. You will also see I have an overall filter available here; I can change some of the details that it's showing me. If I only want to see the latest runs, I can modify that. So if there is one I care about in particular, I can dive into the details. If I select the pipeline, I can view the activity runs; I'm going to select that now. I can see more of the detail: the activity name, the type of activity, in this case a copy. So this overall pipeline is what I'm looking at now; I can see all the activities that were part of it. In this case, it was just one activity within the pipeline, but I would see every single activity that made up that pipeline. Depending on its type, I might see, for example, the exact inputs, I can see the exact output, the amount of data read, the amount of data written, the duration, again the start time and the duration of this particular activity within the pipeline. Then I see where it was running from, for example the integration runtime, in this case the default. And I might also have this little pair of glasses; this shows me the details. So here I can see it was coming from Azure SQL Data Warehouse, it was going to Azure Storage Gen2, and I can see the exact details: the number of rows read and the number of rows written.
I can see the throughput, the copy duration, and the details associated with it, so I can get great insight into what happened around this particular activity run. So this is a key type of monitoring we're going to perform. I can also see the status of my integration runtimes here; I'm just using the automatic Azure runtime. If I had additional runtimes, maybe on-premises, maybe those SSIS ones, I would see those as well. And once again, I can see details about the runtime, details about the activity runs that are actually using that integration runtime; in my case, it will be all of them that I have in my environment. If I go back up, I can also look at the details: I can see the status, it is of type Azure, and the region is going to auto-resolve, and once again I can go and dive into the activities, etc. So these are the key types of monitoring I am going to do within Data Factory. This gives me insight into my pipeline runs, my trigger runs, the activity runs, and then the integration runtimes. Now, there are metrics as well. When I dive into these, if I were to click this Metrics button, it's just going to jump me straight back to the Azure portal; that's where the metrics are actually surfaced. So that leads us to part two of the demonstration: monitoring with Azure Monitor. Here I am looking at my Azure Data Factory; the Author & Monitor link is how I get to the Data Factory UX. If I scroll down, I'll see my metrics, and from here you can see a number of key things we care about: failed activity runs, failed pipeline runs, failed trigger runs, integration runtime available memory, CPU utilization, maximum allowed entity count. And for all of these different metrics, you see I can actually create an alert, so that might be very useful. If I'm seeing a fair number of failures, I can put an alert on that, and in case of failure, I can go to the Data Factory UX, see the details, and investigate. So I can use these things to get the metrics, I might use them to trigger an alert, which will then drive me to go and dive into the interface and see exactly what's happening. And then we have our diagnostic settings, so I can send all of this data, the metrics and the details about the various activity runs, pipeline runs, or trigger runs, to one of three destinations: a storage account, an event hub, or Log Analytics. And when I send it to Log Analytics, that's where I can customize how long I want to keep these logs for; maybe I want to keep them for a certain duration. I can do that through the Log Analytics configuration. Thank you.
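For reference, a diagnostic setting that routes the Data Factory logs to a Log Analytics workspace can be sketched roughly as below. The setting name and workspace resource ID are placeholders, and I'm assuming the standard ADF log categories (ActivityRuns, PipelineRuns, TriggerRuns); check the category list in your portal before relying on it.

```json
{
  "name": "adf-to-log-analytics",
  "properties": {
    "workspaceId": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<workspace>",
    "logs": [
      { "category": "ActivityRuns", "enabled": true },
      { "category": "PipelineRuns", "enabled": true },
      { "category": "TriggerRuns", "enabled": true }
    ],
    "metrics": [
      { "category": "AllMetrics", "enabled": true }
    ]
  }
}
```

Sending the runs to Log Analytics is also what lets you keep the history beyond the retention of the built-in monitoring views and query it alongside your other operational logs.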