Transcripts
1. SSIS Introduction: Welcome to my course on SQL Server Integration Services. My name is Zach good Ishi in I'll be your instructor throughout this course. What is SSIS? Ssis is an ETL tool which is used to extract data from multiple heterogeneous sources. It has the ability to transform the data. Also, it can load data to multiple different destinations. So what are we learning in the course? So we are going to cover different topics. So this is the GUI interface of SSIS tool. We'll start with this Dataflow section. If you look at this data flow section, on your left, you're going to see multiple different transformations. We are going to cover all major important transformations like aggregate, Merge, Join, multicast, derived column data conversion conditionals split will cover all these different transformations. Then if I'm going to go down, I will see sourcing destination steps. So step is used to extract data from source and destination steps are used to load data to different destinations. So we'll cover this Dataflow section, then we'll cover the control flow section. The control flow section. If you look at here, you're going to see different tasks. These are important tasks that we are going to cover. We have variety of different tasks like FTP, task, script, task, XML tasks send mail to us, which is used to send mail. Then we have the execute SQL task, which we are going to use to write SQL queries. So this is going to be a comprehensive course on SSIS which will cover all important topics. So what are the prerequisites or unique expertise are required in order to take this course. So you should have very basic knowledge of SQL. You don't need to be an expert of a squirrel. Very basic knowledge of any tool because if you know one ETL to learning any other ETL tool is not very difficult. No programming language knowledge is required. The tools that we are going to use Visual Studio 2019 in SQL Server Data Tools. These are the two tools that we are going to use in our course. Now. Who are my target students? Etl developers, those who are working as an ETL developers day can take this course. Those who are looking for a career in the field of data warehousing or BID can also take this course. Professionals who want to learn new ETL tool to enhance their career, Let's look at educational challenges one can face. So if you're learning SSIS tool, you should have good system in place. 64 bit Windows 7 or new virgin should be installed. At least you should have 12 GB hard disk space free. Four GB RAM should be available. Now, let's look at the career opportunities. So this course is going to build your foundation. Once you are done with the course, you can apply to different fields like data warehousing, retail consultant, business intelligence. So this course is going to open multiple avenues for you. So thank you so much and have a great learning.
2. SSIS Components,Advantages and DisAdvantages: Hello everyone. In this particular lecture, we are going to cover SSIS introduction, SSIS components, and then we'll cover the SSIS advantages and disadvantages. So first, we are going to start with SSIS introduction. What is SSIS? Ssis stands for SQL Server Integration Services. It is a dirty integration platform. It has the capabilities of ETL. Etl stands for extract, transform, and load. So using this SSIS tool, we can extract data from multiple different sources. And then SSIS tool has built-in transformations such as aggregation, export and import column, SCD, data conversion lookup, join, etc. So using these different conformations, we can transform complex dataset. Once the data set is transform, we can load data to multiple different destinations. So SSIS tool also performed some additional operations such as sending email, FTP operations. It has web service task, bulk insert task. So we can do a variety of different operations using SSIS tool. So SSIS components, so there are five different components of SSIS. The first one is control flow. It contains containers in tasks. The second one is dataflow. It contains sourcing destination steps. Also, it contains transformation steps. So using this data flow section, we can extract data from different sources. Then we can load data to different destinations. Also, we can use different conformations. Event handler component. Using this event handler component, we can same messages and e-mails. Package explorer component. It offered a single view for all packages. The last one is the parameter. We can use parameters inside a package, outside the package. So this is what we have. Srs components are concerned. Now let's look at the advantages and disadvantages of SSIS tool first. Advantages. Using SSIS tool, we can extract data from heterogeneous sources. By heterogeneous means different sources. Using SSIS tool, we can do variety of different transformations. It is simple, very easy to use. Also, we can parallely load data to multiple different destinations. And let's look at the disadvantages. When multiple packages are executed. It consumes memory and system resources. So you have to take these things into consideration while executing multiple packages. The second disadvantages, integration with other products. So it is very difficult to integrate SSIS with products like Salesforce, workday, etc. So now let's look at the competitors of SSIS tools or Talend, Informatica, IBM Data stage print out these are the competitors. So if you learn one ETL tool, for example, if your SSIS tool, learning an ETL tool is not very difficult. So this course is going to build your foundation. So thank you so much and have a great learning.
3. SSIS Tool Comparison: Hello everyone. In this particular lecture, we're going to draw a comparison between SSIS tool and Informatica power center. Informatica power center is a popular ETL tool available in the market. So we are going to look at how SSIS tool is better or which areas it is better compared to Informatica power center. Now, first of all, let me show you some popular data integration tools that are available in the market. We have talent. Talent is in open source data integration tool, very popular tool. Then we have Informatica. Informatica is costly. It is used by big organizations. Then we have IBM Data stage, which is hosted on a cloud server. Then we have Pentaho data integration tool. Just like telling when tau is an open source data integration tool. Now, let's look at the comparison between SSIS and Informatica power center. Why I'm drawing a comparison between SSIS and Informatica, because Informatica is very popular detailed tool, it is high demand. So that is the reason why I'm drawing a comparison with this particular tool. So SSIS tool, it is a free ETL tool. Whereas when it comes to Informatica power center, it is costly. It has a license fee which you have to be eagerly. So you have to take distinct into concentration. What do I mean by free ETL tool? So by free I mean debt. You can easily download this SSIS tool from Microsoft website and you can install it on your system. Then the second is limited connectors are available to connect with third party applications. So we need connectors to connect with different applications. So when we, when it comes to SSIS tool, it has limited connectors compared to Informatica power center, informatic up our center live number of connectors we have in Informatica to connect with third party application. What are third-party applications? Third party applications, I mean, tails for snowflake Cloud data warehouse assumed Google Cloud desert third party applications, that SQL Server Integration Services is not suitable for global large-scale organizations? Yes. Whereas when it comes to Informatica power center, it is for big enterprises performance-wise. It is Good. Come back with bigger integrations. Performance is affected. When it comes to Informatica power center. It is robust and fast and performance is not affected by bigger integrations. When it comes to reproducibility, you can use a third party vendor which is expensive. So when it comes to Informatica power center reusable feature is great in Informatica powers and two, what do I mean by re-usability? So reusability means that once the process is created, we can use debt particular processing and other process. This is reusability. Whereas when it comes to Informatica powers centered, this feature is great in Informatica weekend. Reuse one component in another component. Whereas when it comes to SSIS, you can use t third party tool for reusability. Okay? Scalability. When it comes to scalability, scalability feature is expensive, but great. Scalability mean. What do I mean by scalability? For example, if you have huge volume of data, then in Informatica power center, what it does, it distributes your data in multiple servers. So this is scalability. The more data the more the so-what, this is scalability. Whereas when it comes to SSI is it tests were piled patron for scalability. So we have the scalability feature in both in SSIS as well as in Informatica. Ssis has huge ecosystem, whereas when it comes to Informatica power center, it is not as huge as societies. So this was the comparison between SSIS and Informatica power center. Hope you understood the concept. Thank you so much and have a great learning.
4. SSIS Installation: Hello everyone. In this particular lecture, we are going to install SSIS on your computer system. In order to install SSIS, you first need to download and install Visual Studio 2019. Community Williams. So in the Google search bar right, download, we used to do 2019. Then you have to click this link. We use to do your dot Microsoft.com slash downloads. So click this link. Now, then I will click this Free Download. Community William free download. Click this, check this installer. Okay, now I'm going to open this installer. Click this install. The yields told you installer click Yes. Then continue. So this is going to take some time to complete. So in the course of the have a youth D, We used to do propellant 17, William. This is the latest, were younger. I am showing you how to install. Okay. No. This is the wool cloth section. Now from here I'm going to select this particular option. After that, I will select this, these two tools installed our location. So I want to install in my directory since I don't have enough space in my C directory, let me just change the directory to E. The yields to deal with either 19 community where you can eat directive fine. So total space required, 12.45. Distill installing in my E directory. Okay, fine. Installed, writing down an ordering. So I have two options. Del Norte all been installed. I want to install by downloading, I will choose this particular option, click Install. So the process had been started. This is going to download the vehicles to their 2019 community where you're in air at the same time. This will install it in my system. Does so you take some time to 0.39 GBs decides. So this is going to take some time. So the installation process has been completed. Next, I'm going to click the Launch button. So the yields Studio 2019 is starting. So next I will click Continue without cord. After that, I will go in the extension section and I will click Manage extensions. Then here on line section I will click the online section. We use Studio Marketplace. No scroll down and select and download this SQL Server Integration Services project. You have to download and install this in your system in the US to deal. This really takes six minutes to complete. So let me check if I have this in my system or not. I will go in the downloads. In my directory. Downloads, I have this in my system. Let me just cancel it. So if you do not have it, you have to download it. Now cancel it. Next I will click this Microsoft Data Tools Integration Services. Click OK. Meanwhile, Okay, click Okay. Then click Next. Install. Okay, please close the following processes before continue the setup. Select me first. Close all the windows. Microsoft, he has to do a half to close all these windows. No, close it. Close all the windows. Now. Now I'm going to install the Microsoft Data Tools Integration Services. Click Next. Install. Close the following processes, okay? Yes. The installation process has been completed. Click clause after deck in this search bar or search Visual Studio 2019. So I'm opening the Visual Studio 2019. No. Create a new project. In this search for templates. In this section I will search integration services. Integration, search integration. In this selective it has been selected now, click next. Project name will be first project, DW edge product. Let me just change a product where I'm going to create a project here in my d delta T e directory source. Select a folder. Okay, create. So this will create your first project in SSIS, creating product Microsoft, we use to deal. Look, the first project has been created and then you just protect, you have this package. From here, you can create multiple packages. This is how you create a new SSIS package. The package one has been created. Kaye, hope you understood the installation process. Thank you so much and have a great learning.
5. SSIS First Transformation: Hello everyone. In this particular lecture, we are going to create our first project in SSI. Yes. So what we are going to do, we're going to load this dataset source underscore product. Let me just open this file. So I'm going to read this dataset in, then we are going to apply some transformations on this data set. So I have Product ID, product name. So product ID, product name should be converted to uppercase country. So in my country field, I have different values like country value. So United States, USA's, I've warned country to be represented in the form of abbreviation for United States. It should be USC, for United Kingdom, which should be UK, and for Australia should be EUS. So we have this 60 feet in the city field. We have some blank values. I want to replace blank values with n slushy. Okay, so let's create our first project in SSAS. So I'm going to go into File and then I'm going to create a new project. I will click this in, then I'm going to name this project. So name of project will be filed two, dB loading. Location is deceased, the location where this project is going to be saved. Okay, click OK. a new project has been created. So in order to read data from this file, from this CSV file I'm going to use. So I'm going to select the dataflow option. Then I'm going to select D flat file source option. So where I'm going to get the Flat File source option. So let me search the Flat File source option. So look flat file source option. So let me just make this connection manager, I have to create a new connection manager knew. Okay, fine, fine limb. I have to provide a part where this file is located. So this file is located here. So I'm going to provide the part here. Browse, select all. I will select this file. Okay, this file is comma delimited file. Columns D's are the columns of this particular file, Product ID, product name, city, sales profit, okay, a Guan's. And then previously if, if you have to preview diameter, you will select this Preview option. Ok, click OK. Now these are the columns, product ID, name, country, city, fields, and Profit. Click. Okay, so this step is done industry, but next time we're going to use d and address step. So this is, in this step I'm going to use this option. Derived column option. So using this option, I'm going to transform my dataset. So I have to first, I have to first connect this option with this option. Let me just joined this option. Okay, now this has been connected. Double-click desorption, ok, columns, these are the columns. So now what I want, I want to use TK's Statement. So the reason I'm going to use the case statement because here in this particular country field, I want to replace United States were USA, United Kingdom with UK and Australia with EUS. So I want to represent Country field in the form of abbreviation. So I'm going to use this case statement, country equals to United States. Question mark is den, then value should be usa. If Country equal equal to United Kingdom, then value should be juke. If country value is Australia, then the value should be EUS. Otherwise, the valley should be Australia. Okay, so I'm going to just copy, paste this in. I will paste copy, and I will paste this in the expression. Okay, derived columns or I'm going to replace the Country field. Okay, now, next I'm going to use this. So what I want, I have this Citi Field, Citi Field. So where there is a blank value, I want to replace a blank value with n fleshy. So for debt, I need to use this expression. If city value equals, equals two, blank, then value should be replaced with n slash shape. Otherwise really should be City. So I'm going to copy this here in the expression. Let me just copy this here. Derived column should be city, okay, so I'm going to use another expression this time around. I'm going to use the string function. So which string function I'm going to use, I'm going to use the uppercase string function. So I want to present the product name in the form of uppercase. So this has to be an uppercase. Okay, Derived column should be product name, replace product name with uppercase product name value. So this is my Derived Column expression. Next I'm going to use d step for database. So which database step I'm going to use, I'm going to use d database connection step. So let me just search the step here. So I will use the OLED destination step. So let me just connected Derived Column step with this. Let me just open this step. So let me just close this source underscore product dot CSC file. I have to close this file first. Okay, don't save. Db connections, I have to create a new DB connection. New. Click new, Okay, server name, search the server name here. So the server name is, it is taking some time to display the server name. Services. I have to search here the services server name as this. Sql Server 2020 is the server name, okay, this if the server name. So if I have to search here, SQL Server 20-20, SQL Server 2020 server is up and running. Okay, now next step is select or enter a database name. Database name is dice. So I have already created this database in MySQL Server. Click OK. Okay, table or view table or view fine. Name of table should be I have to create a new table. So this is going to be the name of table DB destination. Let me just copy paste the table name here. So this is going to automatically create this table, okay? Right now this table is not created, so dysfunctionality is going to automatically create a table in my database. This is the name of table and database name is dice in order to create a database, okay? The data is already an object called this. Ok, fine. So I have to write 01, click OK. OK. I have to write desk 0-1. Now this data basis, this table name is not present, so this is going to create this table. Okay, fine. Click ok. This table name mapping. So this just says mapping product id is mapped with product ID, product name, country. So these are the fields off this destination table. This one is d. This one. These are the fields off of this particular file source underscore product. So click OK. This step is done industry. Now you have to save this package. In order to save, You have to press Control S. So this package has been saved. Now, if you have to execute this packet, you have to either click, you have to right-click or n, then this execute task, or you can directly execute this package from here, start. So the package has been executed. Now, look, the package has been executed successfully. In order to confirm it, I have to execute this query. Look, Product Name is in uppercase, country. Fields. Look, a country has been represented in the form of abbreviation, city where there is a blank value, it has been replaced with N slushy. Okay? So this is how you insert data in a database. And this is how you read data from a local computer system. So with the help of, with the help of this Derived column, fish, I have applied a different transformations on my dataset. So hope you understood the concept. Thank you so much and have a great learning. So from here I'm going to going into edit mode.
6. Multicast Transformation Explanation: Hello everyone. In this particular lecture, we are going to learn the multicast transformation in SSI. Yes. So first of all, I'm going to create a new package. In order to create a new package, you have to click this SSAS package, then right-click then new package. So these are two packages that are already there in my project. This is the name of the project. Now I'm going to create a new package, new SSAS package. So I will call this, let me just rename this package. So I'm going to name this package as fixed underscore. Underscore loading. Named this package. Okay, so this is the name of package. Deals are two packages. Tetr already dated my project now. So this one is to control flow section, dishonesty Dataflow section. So when I'm going to click the Control Flow section, I'm going to get different tasks. So these are the tasks that I can perform. This one is the Dataflow section. So on my left, so when I'm going to click the Dataflow section on my left, I'm going to get sources in destination. So these are the sources from where I can read data from. So let's suppose if I have to read data from an Excel file, I'm going to use this Excel source. So if I have to read data from an external source, I will use this step. Ok. Then if I have to load data in my flat file, I'm going to use this flag file destination. So if I have to load data in my database, I'm going to use this OLED DB destination, okay? This one is the data flow step. Now what I'm going to do, I'm going to read this flat fixed fight employees dot TXT. So this is the file which I am going to read. So to, in order to read this file, I'm going to use this step. I will select source, source, then I'm going to select this Flat File source step. Let me just drag and drop here. Click, Double-click. Then I have to create a new connection, new. So this is how you create a new connection. Okay, General, here I'm going to provide depart. So this is the path where this file is present. I'm going to select this file format. I'm going to select the read, write, then column. So now I have to mock my columns. So the first column is ID, second column is the firstName, lastName. Then the disown is gender, gender, age. Gender. Defined is the country last column, m-th, advanced. Id, firstName, lastName, gender, country in age, preview. So this is my data. Ok. Click OK. So you can also preview your dataset. Click OK. Columns. These are the columns. Click okay, so the first step is done under strict. Now what I'm going to do next, I'm going to load data in my database first. So I have to use, let me use this multicast transformation. So the purpose of using the multicast transformation is debt. With the help of this transformation, I can send data to multiple destinations. You will have a single input. Single input value can be, it can be sent to multiple destinations without any condition. Ok, so now let me just connect the first step to this step. So this, this file is going to be sent to multiple destinations. So first my first destination will be my database. In my second destination will be D, Excel destination. So for Excel destination, I'm going to use this step, Excel destination. Now. Next I'm going to use the OLED DB destination. Since I want to send this data to my database to a table, I want to load data in my table. So I will use d OLED DB destination. Now, I have to create a new OLA DB connection, new, new, new. I have to server name, I have to refresh it. So this is going to display media server name. So just wait for a second. It will display you deserve a name. So you have to select your server name. Then you have to select the database name. So this is my server name, SQL Server 20-20, select or enter a database name. So I will select dice as my database name. Okay, so this dice databases already created. So in order to create a database, you have to write this command create database. Let say project is the database name, semicolon at the end. So this is how you create a database. Database has been created successfully. In this case, I have already created the dice database. So I'm going to click, okay, click OK. Then table or view. Now new. So this is the name of table. Now I have to change the name of table. So my table name will be DRG, target, DRG employee, EMP underscored destination table. So this is going to be the name of my table. Id, FirstName, ID, okay? The name should be 2020, gender, okay? 23. It has to be int. Jenga. Each should be int. Click OK. OK. Mappings, defense my source file, and this is my destination. Fine. Ok. Click OK. Save this file. I have to save this file first. Now, you have to execute this first. So this, this is my first destination. I'm going to create another destination which will be Excel file, Excel file, Excel destination. But first of all, I'm going to execute this. In order to execute, I will click start. Okay, all steps have been executed successfully. Now let me just check. Look progress. From here you can see the progress Dataflow. Fine. I have to first database dice. I have to refresh this list, refresh table. This one is the table name select Top 1000. So data has been loaded successfully. 50 rows have been loaded successfully. Id first name, last name, gender, country in eight. Now, next, what I am going to do, I'm going to create another destination. So with the help of multicast, I can send my data to multiple destinations without a condition. So this time around I'm going to use the Excel destination. So I'm going to connect this with this Excel destination. Now, next step is to configure the excel destination. So I will create a new connection, new browse. So this is the directory where I will create a new file. Let me just create a file, new Excel file. So this is the file where I'm going to load my data. The underscore employee dot XLS file. It has to be T, G. Open. Click OK. Now name of the Excel sheet. New, click, new click. Okay. So now I have to create a sheet here. So I will use this command create table. Let me just use this create table, Excel underscore file, ID integer, firstname, N bar chart, which is 10, first M 20, lastName, debt data type is n bar chart. This is gendered nine, end-to-end. T3. Age is integer, so click OK. Click OK. Now select the Excel sheet. So Excel underscore file. So this is T sheet which we have created. I'm going to select this sheet mappings. So now OK, click OK. So firstName, lastName. So this is the data which I am, I am getting from source. This one is country in H. So now, so what will happen? It will give me an error. Let me just save it. It will give me an error. Look, column id cannot convert between Junie, ct, n non unicode string data type. The reason it is giving me error because deta type of Excel in data type, which is coming from data, which is coming from source, have different data types. So this Excel destination in the source destination both have different data types. So first, I have to convert the data type. So in Excel, data types for string is N var char, whereas data which we are getting from source is of different data types. So in order to convert the data type, I have to use this Data Conversion transformation step. So before this step, before Excel destination, I'm going to use the data conversion that step, transformation step. Now, with the help of Data Convergence step, I'm going to convert the data type of my data. Now, combine this. So now OK, double-click OK, Select All. So firstName, the firstname will be string, it will be unique code string. Now this data type is compatible with Excel data type. Now, lastName will be Unicode string, Unicode string. This data type is compatible with N var char data type, which we are using for strength. Country, Unicode string. Each for age, I'm going to use t, two bytes signed integer ID for I am going to use D to bike signed integer. So let me just search two byte signed integer range. So this is the range of two bytes signing teacher. Now, okay, fine. Copy of IID. Look, this is the data type which we are getting from source. And this is copy off datatype, copy of this data type. So we are converting it to two byte shining teacher, we are converting this firstName. We are creating a copy and the data type is Unicode string. Now here, Excel destination. In the mappings. Instead of this, I am going to use the copy off IED, which is two bytes signing TJ, copy of first_name, which is a Unicode string. Copy of lastName, copy of gender, end copy of country. Last one is copy of H. Now, these two data types, source and destination, both a compatible OK. Click OK. Now look, there is no data. Save this package now executed. First, I have to truncate this table. Truncate table. I have to truncate this table. No truncate this table and execute. Table has been truncated. Now execute this. So now execute this. Look, I'm getting an error. Let me look. Let me just check what error I'm getting. So I'm getting this error. Excel connection managers failed with error exception. There may be, there may be error message posted before this with more information. Okay, I know what is the error. Okay? So what I'm going to do in order to fix this type of error. I'm going to go to the project, then, I have to go to the properties. Then, and I have to go to the debugging section. Here. I'm going to change the runtime value, run 64 bit, runtime value to false. Okay? Specify whether the project should start. 64-bit. Assess IS runtime, is 64 bit, assess IS runtime is not installed. This second is ignored. Okay, now apply click OK. Safe. Now I will not get this error, okay? Because 64 bit runtime was not installed, so I have to ignore this error. Start. Now I will not get this error. So my project is successfully executed. All steps have been executed successfully. Now I need to open this file. Target underscore employee, I need to open this file and name of this cheat if this Excel underscore file which we have created look ID, first name, last name, gender, country H. Data has been inserted. Total 50 rows have been inserted. Now I need to check my debt. I need to check my table. Look. Data has been inserted, 50 rows have been inserted. Okay, in order to go to this edit mode, you will click this. Now execution result. This one is the execution result. Project Package Explorer. Now look, vf Grey created three connections. Flat File Manager connection, Excel file connection, these two connection in the database connection, these three connections. And then we have this data flow task. In the data flow task, let me just open this Flatfile. D's are the components flat file source, OLED DB destination, ND, multicast component. These are three components that we have used. Parameters, variables we haven't used. Okay, now dataflow, right now, we have used this section dataflow, we haven't used this section control flow. So this one is the name of my packet. So hope you understood the concept. Thank you so much and have a great learning.
7. Union All & Merge Join Transformation in SSIS: Hello everyone. In this particular lecture, we are going to discuss union all end Merge, Join transformation in SSAS tool. So first of all, I'm going to create a new package, right-click and create a new package. So my package name will be, let me just change the name. Merge, join, underscore transformation. So let me first show you my dataset, which I'm going to read. Either two datasets, employee 0-1 dot csv. This particular site, in this particular CSV file, I have total 50 records. So I have ID, firstName, lastName, gender, country H in department id. I'll have an undefined, which is employee 0-2 dot CSC. In this particular file, I have gotten nine records. Okay? So these are the columns. So first of all, I'm going to explain the concept of junior nod. So when you have to combine the records you are going to use T union all transformation. Look in these two records, in these two files, we have how many columns we have? And number of columns seven. So when we're performing union all transformation, number of columns in data type should be same look id, id which is present in, in my another file, should have same data type, firstName, and the first name of implies 01 file should have same data type, ok? So number of columns in data type should be same. Okay, let me just close this. So with the help of union all transformation, I am going to combine these two files. So first of all, let me just use the Dataflow section. So click. Okay, now I'm going to select the source. I will choose the Flat File source since I'm reading a CSV file. Okay, to Flat File source, click new, I have to select, Create a new connection, connection manager. Okay? Select a file. I will select a file here. It is present in my directory. Okay, let me just provide the part of my file. This one, all all files. So myfile limb first file is in play 0-1 form a comma delimited file. I need to show you right-click properties. Look Microsoft Excel comma separated file. So I'm going to select this particular option, this former delimited columns. These are the columns. Ok, fine. Click OK. These are the columns. Likewise, I'm going to configure this step. Flat File source. It should have another connection. This time around, I'm going to select the second file, employee, 0-2 dark CSV file. This one, it has comma delimited file. Okay, fine. Click OK. Let me just preview my dataset. Okay, this is my phi is total nine records and myfile columns. These are the columns. Now let me just change the name. Employee 50 records. The second one, let me just change the name of this step. Employee. Nine records. Find not next, I'm going to perform the union all transformation. So I will go here. I will select the union all transformations though various union all transformation discipline is D union all transformation. So when we are performing the union all transformation, we don't need to sort our data. Okay, so this is how you're going to combine two records, two files using the union all transformation. Now, firstname, okay, this is going to be the firstname, last name, gender, country, each and department ID. Ignore. Click OK. Let me just click again. Output column name. This one is the output in. This one is D union on, in port one, union all input to input one and input two. And this one is going to be d, final output. Id, first name, last name, gender, country, each end department. So since we are going to use the merge join transformation as l, So I'm going to select one more source step, which is flat file source step. So let me show you the flat file source step, this one, I will select it. So next what I am going to do, I'm going to, I want to read this file, department dot csv. In this file, I have department id and department name in these two files as well. I have department id. So what I want first, I want to perform the union all I want to combine these two files. Then I want to have a joint I wanted to have a joint on the basis of department id. Okay. Next I'm going to read this department id file. So click this new transformation. Okay, fine. Then I'm going to select the department file, department dot csv file delimited. Okay, fine columns. These are the columns, advanced preview, department id and department name. Okay, fine. Let me just edit. Let me just change the name of this step. This one department is D named. Next I'm going to use deep transformation, which is known as Merge join transformation. So before using this transformation, I need to sort my data first. So with the help of Merge Join confirmation, I can do inner join, I can do a left outer join or I can do full outer join. But first, I need to sort my data first, either in ascending order or in descending order. So these are two inputs. Department engineering, these are two inputs. So first, I need to sort my data. So in order to sort the data, I will use this sort transformation. So let me just drag it. So ivy first sort my data. So click this. So I will sort on the basis of department id, find ascending order here also, I am going to sort my data on the basis of department id. Department id, ascending order, it should mean ascending order. Now I am going to have a joint. So select OK, input, Merge, Join Left inputs. It means debt. Decide. This input, employ 50 records. Decide is going to come on my left side and this side is going to come on my right side. Let me just select, okay, fine. Now, select this, n joined this. Now click this. Now I'm having a joint on the basis of department id. Now, I need to select my columns, which I want to include in my final output. So I won't ID first name, last name, gender, country sine h. No, I don't want department id, rather I want department names. I will select the department name. So I'm joining on the basis of department id. So if you want to have an inner join, you can select inner join, left outer join. You can have left outer join or you can have full outer join. So I'm going to have a left outer join. So look, input columns, output Alias, okay. Now click OK. You have to save this at the end. What I'm going to do, I'm going to load my data in my target table, in my database table. So for database, I'm going to select this transom, this step, this destination step where Li DB destination. Let me just combine this. Okay? New nu, I have to create a new. Let me just create a new. Okay. Dice is the name. Okay, fine. Click ok. This is the name of connection. Okay. Table name I'm going to select. No, I have to create a new table first. I have to create a new table in the name of table will be, let me just create a new table. Drg underscore employee, dst, table, ID, var, char, firstName, lastName, gender, country, each department name, OK. Click OK. Mappings. Fine. Click OK. No, save this. Okay, now I have to execute this. Execute. So this is how you execute this. So look, total 69 records had been loaded in my table. Let me just refresh my table. Dice. Refresh this database. There is my table. Let me just check my table. A new table has been created. So the name of table is this. Select top 1000 rows. Look total 59 records have been loaded in my table. Look. So since we have performed in null, left outer join, so we are getting the null value 0. Okay? So hope you understood the concept of Merge, Join and union all transformation. Thank you so much and have a great learning.
8. ConditionalSplit Transformation: Hello everyone. In this particular lecture, we are going to look at an example of a conditional split in SSI. Yes. So conditional split is just like if condition. So based on the condition data is routed. Ok, so in order to better understand the concept, let me just create my package first. So I'm going to create a new package. So I am going to call this package conditional split example, CFS example. Ok, I will select the data flow section. Click this, then I'm going to select the source. So first I need to select my source. So I'm going to select the database as my source, OLED, OLED DB source. So what I want, I want to read data from a database table. So for that, you need to use this source. So let me show you my table from where I'm going to read data. So this is my table. Merge underscore table. In this table, IF ID, first name, last name, gender, country, agent, department id. So what I'm going to do next, I'm going to configure this. Click this. So what I am doing right now, I'm creating a new connection. I'm creating a new OLED DB connection. I'm creating a new DB connection. So what I want, I don't want to create this new connection again and again. Okay, so what I can do, I can create a new connection here in deep connection manager. So I will create a new connection manager, new connection. Now from here, I'm going to select the database connection. So where is my database connection? I need to select from here. So I will select this OK, click OK. So I have selected this database connection. Now if I'm going to select this option, I don't need to create a new connection. So from dropdown menu I'm going to select, okay, now table name of table of overview, table. I'm going to select this one. Merge underscore table. So this table, I will select columns, error output columns. Now these are the columns of this table. So this step is done in domestic. Next I'm going to use the conditional split transformation. So conditional split drag and drop here based on the condition I'm going to route my data. So combine the first step with d, second step with the conditional split step. Now here I'm going to specify the condition. So I will specify the condition on this particular column, country. If country equals two equals two US, united states, let me write a condition equal equals to United States. The output will be USE. So this will be usaid employees. Okay? Now, if Country equal equals to, sorry, country, France. Then output will be France. This is the default, default output name. So this is going to be default, default output. Ok? So if country equals, equals two United States, this will be the output. If Country equal equals to France, this will be the output name. Different will be the default output. So, okay, now, next I'm going to select D Wally DB destination, LE DB destination. Okay? Mab, this the first one, this one is going to be output. The first one is going to be usa. Usa will go to this destination. The second one is going to go to D. The second one destination doesn't France. The third one. I'm going to have another connection database. This will go to the default. This will be the default. So I'm going to have three destinations. The first one based on the condition. So this is the condition, united states, this will go to this destination, France will go to this destination, and the default output will go to d, this destination. In the default output, I'm going to get data other then USA in France. Now let me just configured this. I don't need to create a new connection. Okay, it is already there. So table name will be. So i'm going to create a new table for this. Table will be USE. So this will be the name of table mappings. Fine. Second one I'm going to create new. This one is going to be France. Click OK. The third one is going to be default table. There. I'm going to get data other than France END USE, default. Employees, okay? Default. So in my first table I'm going to get data of USA employees. Second one will get France employees. Third one, we'll get data off. Employees other than Usain France. I have to save this. Now. After saving this, I am going to execute this package. It seems there is an error. Ok, what is the error? France, find, dishonest the name of table default. Whatever I am getting. There was error during execution Vernor more component field. Which one disk component is third, table or view Mapping columns on the mapping. Okay, fine. Mappings. Ok, fine. Saved this. No execute this. Okay. The processes successfully executed. Now I need to check my tables. So three more tables will be created. Refresh. And table names are, table names are decent USE. This will have USA employees. Look country USA. Usa implies the second one will be France. Decimal have france employees. Older France. The other one, default. This will have data other than France and USC. Okay? So this is the concept of conditional sprig based on the condition it is going to split your data. So hope you understood the concept. Thank you so much and have a great learning.
9. Lookup Transformation: Hello everyone. In this particular lecture, I'm going to explain the concept of lookup transformation in SSIs. So in order to understand the concept of lookup transformation, first of all, let me show you my lookup table. So what I want to achieve, Let me show you what I want to achieve. So in my lookup table, I have this Country column or in the country code. So first of all, I'm going to read data from my source. In my source, I have this field coil, contrary, let me just open source. Look, I have this weak countries, so I'm going to search this country fee value in my lookup table. And against the country field, I'm going to get the country code. So let me first read the source file here. So first of all, I'm going to use the flat file sources. Let me just configure this. I will create a new connection manager browse and CSV files, employee dot CSV or k columns. And advanced preview. So this is going to be my data set. Okay? This country look columns. This is going to be my source. Okay, next I'm going to use the lookup transformation. Lookup. Configured the lookup transformation. Double-click the lookup transformation, ok. So in lookup transformation, I have three, cache mode, full cache, partial cash, no cash. So when I'm going to select the cash more, so this is going to, this is going to select all the data from my reference or look-up table. This is my lookup table. It is going to select all data from, from lookup table and it will store data in the cache. Okay, total how many records? Let me just count the number of records. So total there are 13 rows in my table, so it will store all cocaine rules in D, cache, full cache. So let me just give you a scenario. Let me just explain how the process is going to work. Let me just open this file. So let me just apply a filter here. Currently, lead select the United States. So look, this value will be searched. So I have to first configure the full cache connection type, connection type. I can select dicker cache connection manager, or I can select the only DB connection manager since I'm reading data from my database table. So I'm going to select the oily DB connection manager. So if I'm reading data from a cache file, so I'm going to select this particular option since in this case we are reading data from a table database table I will select this, specify how to endl Ruth. With no matching entities. If there is no match. For example, this is my reference table or lookup table. If there is no match, what will happen? Fail component, ignore failure or redirect rules to no match output. Ok, so I'm going to have two outputs. One will be the mashed output, the other one will be the unmatched output, okay, connection. So I'm going to select my cable here, look-up table. Here. The lookup table is this lookup table columns. So what options I have? So I want to have a lookup on the basis of countries. Select this against country. I'm going to get the country court lookup column, country code. Okay, fine. Right now this Advanced Tab, I don't see anything in my advanced step. Click OK. It has configured, it is selected as full cash. More, click OK, save this. Now, next I'm going to select, I'm not inserting their time in my database. I will select since I'm testing. So I will select the multicast option. I will select to multicast. One is for mashed output, the other one is for unmatched. This is going to be output one is match. Click OK. The other one is going to be the unmatched lookup, no match output, okay, save this. Now execute the process. So when the package is executed, first data will be stored in first, this data will be stored in the cache, then the lookup will be performed. It seems there is an error. I have to see, OK, this file is open. Now execute the package again. Save this and execute the package again. So this is going to work. Now, look, 53 rules are matched, only six rows are unmatched. So let me just see. The resident's unable Data Viewer. Data we're all columns are meshed. Okay. Country Code. I went to see her. Country code. Okay. I don't want to see my unmatched growth. Look. Country against country of God, the country code, okay. Japan for Japan, 81 is the country code. Okay, now, what I am trying to explain right now, fully cache mode is configured. So when full cache mode is configured for, so what will happen? So let me first open this file. Apply a filter here. So let's say I'm selecting Australia and United States, or maybe Japan. Okay, let me just, okay, so let's suppose this is my first row. So what will happen in case of full cache mode? So this value will be searched in the full, in the cache memory. And I'm going to get the results from my cache, full cache memory, okay, so we use the full cache mode option when my lookup table is, is small in case my lookup table is huge. So this is, this is going to hit the system performance. The reason it is going to hit the system performance because if we have to copy the data from this lookup table two to attack, it is going to consume so much system resources. Imagine millions of records had been copied in the cache. So obviously definitely it is going to take so much system resources, okay? That is the reason why this full cache option is used when this lookup table is small, is not huge. Okay? What about the partial cache? How this partial cache, the concept of partial cache, is going to work. Let me select this option connection. Here you have to specify the cache size. Let's suppose 25 MB, OK, click OK. Now how the process is going to work. In this case, now, not all the data will be copied in the cache. So what will happen? Let me select, okay, so this is my first row. This United States is going to be searched right now the cache is empty. I haven't executed the package right now. D, d partial cash is empty. There is no data in d partial cash. So what will happen? United states will be searched in the, in the partial cache memory. And since the cache is empty, so it will not get the data. Now, what if the next process, it will. Next it will search the data in my lookup table, okay, from this lookup table, guess equal getter deta United States against the United States. It will, it will get this one country code. So in the cache, in default, in the partial cache memory, after getting the data, it will store United States in country code one. Okay? What about the next time? When this next time this row is executed? What will happen? United states will be searched in D cache, partial cash. Now this time around, in the partial Charybdis time around, we'll get that at time d, partial cash, okay, so against this country codes, United States, I will get one value from D partial cash. Okay, now let's suppose when the third time, when the third row is executed, let suppose the third row is this. Let me just search different Brazil. Let's suppose the third row is this Brazil. Now what will happen? This Brazil is not stored in the partial cache. So first, this value will be searched in d, partial cache memory. From the partial cache memory, it will not get the data. Now what's the next step? Next step is to search the data in the reference table, in the look-up table, in the lookup table, it will get the data. This is from against Brazil. I will get 55. Ok. Now this, these two values will be stored in d here in the partial cache memory. So this is how the partial cash more is going to work. Now when to use the partial cache option. So we use the partial cache option. Wendy, lookup table is large when now configured, OK, click OK, save this. Now I'm executing this package with partial cash. Execute. Okay, I have to close this file. Okay? I've got the result, ok, this time around, this is executed with partial cash. Now, next time I'm going to select the no cash. In case of no cash, what will happen when the first time this package is executed? No cache memory is empty. So what will happen? Let me just open the file here. Let suppose this is the first row. Let suppose this one is the first row. Japan is the first row. Does Japan value will be such in the no cache memory, so it will not get the data ok, then it will search data in the lookup table. From the lookup table, it will get the data, okay, then value will be stored in the value will be stored in D, no cache memory. Ok, this value, ok. Next time, let's say Japan. The next value is Japan. Next row is Japan, will be searched AND NO cache memory. From the no cache memory, it will get the data okay, because they take already saved. Now, what about this early? Let suppose the next value is this, Instead of Japan, the next value is Brazil. So the Brazi value is going to be searched in D, no cache memory. Okay? In the no cache memory, it will not get the data now this time around equal search in D look-up table, in the lookup table, it will get the data, okay, Brazil against Brazil, I will get this. So now what will happen? It will overwrite the previous value, Brazil. Now in the no cache memory, Brazil in 55, these two values will be stored. The previous value, which was Japan, it will be overwritten, okay, only single value will be stored. Whereas when it comes to full cache, when it comes to partial cash, all values will be stored. Ok. So in case of no cash, the latest value, the latest value will be stored ok, when the next time, when the new value comes, the previous value will be overwritten in the cache. So this is no cash. So connection columns. So when we use denote cash, so we used the no cache memory when we have memory-constrained, when we don't want to save all our data in the cache. In case of memory constraint, we used the no cache option. Now click OK. Save this. This time around, I'm running the package with cache memory option. Look, data has been executed, okay, a DD package has been executed. So hope you understood the concept. Thank you so much and have a great learning.
10. Cache Transformation: Hello everyone. In this particular lecture, I'm going to explain the concept of cash transformation in SSAS. We will also look at the purpose of using the cache transformation. So first of all, let me show you my lookup table, which is in SQL Server database. So this is my lookup table. So in the previous lecture, what we were doing, we were reading this data from the database table. Okay, so this is the database table. Now, what I want, instead of reading data from a database table, I want to read data from a cash. So how I can read data from a cash, I will use the cash transformation to first store data in a cache memory, then read it from, from, from a cash. So how I can do it? I can do it using the cache transformation. So there are two possibilities of doing this. The first possibility is either you save the data in cache file, didn't debt cache file will be stored in the cache memory. The second possibility is tagged. You store all the data in the cache memory. Ok, so with cache file, the benefit is tag, you can use debt cache file in multiple packages, in, in, in any package, OK, but with the first possibility debt, you copy all the data in the cache memory. With this case, the drawback is stack. You can only read data from it cash in a single package. No other package can read the data from the cache. So let me just give you a practical demonstration of this concept. So first of all, my first, first of all, what I'm going to do, I'm going to read this data, this lookup data, and I'm going to store it in, in a cache file. So for debt, I have to use t Wally DB source. Let me just use this OLED be source configured this step. So I will select the table. I'm going to select my lookup table. Okay, fine columns, fine. Click OK. Then I'm going to select the cache transformation here, gash transformation, this one. Ok, double-click. Okay, cache connection manager. I will create a new cache connection manager here. So this is going to be my cash connection manager. So what I'm going to do, I'm going to create a cache file. Here. I'm going to create, create a cache file that will be stored in a cache memory. So the format of this cache file will be this dark-field w. Okay, so this is going to be my formwork gash. Phi is the name open. So this is going to be the name off cache file. Click OK. Did must be at least one index indicates specify any index on that column tab, okay? Fine. Columns. You have to specify the index. In this case against country, I'm, I'm getting the country code. So country is going to be my index. So I'm going to specify a value one. So this is going to be my index. Find mappings, country, country, country chord. Okay, fine. Save this. So what will happen? It will create a cash fight with extension. With dark CHW extension, I have to execute this package. Now, what will happen? Look at the package has been executed successfully. So the benefit of creating the cache file is tat. Now, I can create multiple package packages and I can use this cache file there. Now, what I'm going to do next, I am going to create a new package here, new package. Now data flow. Then I'm going to select, or I can, I can select the existing package from here. So let's say lookup is my existing package, data flow. I have to select Dataflow section. This package, I'm going to cancel this package. Okay? So this one is the cash transformation package, and this one is the Lookup package. Double-click this. Let me just delete this. I don't want this ok. Deleted this multicast. So instead of multicast null, I'm going to use another destination, okay? No cash more fully cash more partial mode, no cash more. I'm going to select the full cache mode. With full cache mode, I'm going to get two connection types, option, gash connection manager and oily DB manager, connection manager. That this option, I'm going to read data from my database table. Okay, now what I want, I want to read data from my cache file. So I will select this option connection in here. I'm going to select the connection manager. So in this case, this is the name of connection manager. Let me just say the name of connection manager which I used here. So the name of connection manager is this. Gash connection manager is the name. So let me just select the connection manager name. Cache connection manager is the name. So I've selected did write connection manager ok. Columns. These are the columns against country. I'm going to get the country code now. Now, I'm not going to get the data from my database table. So I will get the data from my cache file, which is stored here, which is created here. Okay? D Ss. So this cache file will be stored in the cache memory. Okay, decent cashflow. Let me just open it. I cannot open this. I can open it, but the format, the Lord, being readable, This is not a readable format. Again, five. Now, here what I can do it cache transformation, it is done and dusted. So next time we're going to select the Flat File Destination, drag File Destination. Let's select this. So I'm going to store it in, in look-up match output. So I will select only matter, but let me just configure this. Okay, new, I will create a new OK. Click. Ok. Let's create a file here, CSV file here. So name of file will be simple. Okay? Open columns, this will be the column names. Preview, okay, fine mappings, fine. Country called, fine. So this is going to be the name. So let me just check the error here. Okay? Decode Pij value is 65001. Okay, let me just change update. Okay, let me just change it. The score page 65001. So I'm going to select 65001 option discipline. Okay. Now fine. Mappings find. Now execute. The package. Package has been successfully executed. Look simple dot CSV is the name of file which is generated here. Okay, so this is the file which is generated. If I have to include the headers, I can also do that. Now. So this is an example of reading data through a cache file. Now next I'm going to give you an example of reading data from a cache memory. So this was the cache transform. So now let me just edit the, instead of choosing this particular option, what I can do, I can use this option, okay? I can uncheck this option. With this option, it is going to read data from a cache memory naught from a cache file. But the problem with this option is that I can read data from the cache memory only in a single package. This cannot be done in multiple packages. But with this option, I can read this cache file in multiple packages. Okay? Like this one is another package. This one is another package. I'm reading data, I am reading this cache file from another package. I can create some more packages and read this particular psi. But with this option, this option will limit me only four for this particular package. Okay, now, what I can do, let me just edit this. Instead of this, I am going to choose this option, okay? Now what I can do to disguise my single disease, my single package, okay? The single package where first-time reading the source data, I'm reading the Lookup Data and then I'm placing that I in my cache memory. After this, I can create another data flow task. I can create another data flow task from air. And I can do this. So let me just configured this data flow task. Okay, now, in this data flow task, I'm going to build the same look-up logic. This lookup logic, okay, flat file, reading from flat file, then look up here in this lookup, I'm going to read from a cash. So I will select this then this connection type. I'm going to select the connection cache manager. Okay? So let's do it. Once again. Flat File source. Then let me just configure the flat file. Let me just select new browse. Csv, employee dot csv, ok, columns. Okay, find previews of this is going to be my data. Okay, fine. Next time we're going to use the lookup. Ok. This time around, I'm going to read data from fully cashed, select this connection here. I'm going to select the cache connection manager. Okay, so in the cache connection manager, I have configured. Let me first combine this fine glue cookie in the cache connection manager. I have to show you this in my first data flow task, cash. So cash connection manager, I select, I have uncheck this option, okay? Now this time around I'm going to read data from it cash. Now Data Flow task to look up, find the misconfigured. Next time we'll select the multicast just for testing purpose. Okay? Combine this with Multicast option. I'm going to only select the match output right now, no need to give an adder output. Save this. Okay, execute the task, execute the package. This is giving me an error, worst error y, I'm getting this error. Csv has ended. Okay, has allowed result, okay, fine. What is the error exactly? Let me just check the error. Y, I'm getting this error. Ok. Redirect rules to match, ignore failure. Redirect rules to match output. Now, it is good, fine. Ok. So this is how you read data from a cash soda. Hope you understood the concept of cash transformation in SSAS. Thank you so much and have a great learning.
11. Fuzzy Grouping: Hello everyone. In this particular lecture, we are going to discuss the fuzzy lookup and fuzzy grouping transformation in SSI is 2D. So first of all, let me just explain the scenario. So this is my Excel sheet. In my sheet I have this data, country item type, units or unit price, unit costs, total revenue, total profit, etc. So if you look at this country feet in this country field, I have some incorrect value. So for example, I have this country called India. So I have look at the value of India is incorrect, spelling is incorrect. End IEEE, this is incorrect. Look IND I0, I0. So what I want to do, I want to group by country Field. And I'm going to take some of unit price, unit cost, total revenue, total cost, and total profit. But if I'm going to do this, I will get the incorrect value because this country field value is incorrect, some values are incorrect. So I'm not going to get the correct group by result. Okay? So first of all, what I am going to do, I'm going to transform this data set, transformed this country field. So there are some other fields that are incorrect. For example, the United States. The spelling is incorrect. Okay. What about the United Kingdom? So look, United Kingdom. Okay, look, k i n, g, d am spelling is incorrect, so using the fuzzy grouping confirmation, I'm going to correct this field, I'm going this country field. Ok? So what it is going to do, it is going to take the majority of values. Look, look this fairly India, then it is going to go correct, the incorrect value. For example, India is going to compare with all the values, majority values, correct values. And then it is going to correct this value k, then this indium. So there are some other fields, let me just Canada majority values are this modality values are correct. There are some values that are incorrect. Diesel is are incorrect. Look at the spelling. Okay. Likewise, Australia, there are only 22223 values, ok. This value d disvalue, LLC. The correct spelling is this an okay, now, I'm going to read this file, Excel file. So I will use the flat file transformation, Flat File source, sorry not transformation does source step, double-click new browse. So I have to select the file first. Sales. Also, I am going to close this file. It has already closed. Okay, columns. Find advanced preview. So preview. So what, since I'm performing the GroupBy operations or I have to convert the data type for unit price. It should be, let say float. Unit costs should be floored. I have to convert this. Total cost, float, total profit, float, okay, preview. Find columns. Deeds are the columns which are part of my source file. Now, next, I'm going to use this fuzzy groping transformation. I am going to connect this with this fuzzy grouping double-click. Okay. Connection manager, specify the connection manager to be used for storing Temporary Tables. Data transformation uses. So fuzzy grouping is going to create this fuzzy grouping transformation is going to create the temporary tables. So debt is the reason why you have to create the database connection here. So I've already created, I've already selected this one. Columns. These are the columns. So now, which column you want? Which column you want to correct? I want to correct this country field input column. This input column is the column data is coming from the source to s2 column alias and grope output alias. So country clean is going to be d, corrected name, okay? Now type, there are two types. Exact match when there is a 100% match. First, molarity minimum similarity should be 0. Okay? Now numerals, so there are different options, so I have to explain these multiple options. So first, needed, leading and trailing new numerals are not significant. I have to explain the meaning of this sentence. So you have to look at this example, specify the significance of leading and trailing numerals in comparing the column data. For example, if leading numerals are significant, okay? If leading only leading numerals are significant, look at these two values, then these two values will not be grouped, okay? Heating value is 123 and trailing value is Street. Okay? So here in this case, I'm going to select needed in case of needed, leading and trailing numerals are not significant, then in this case, these two values will be grouped because I have selected in neither option. Now. Okay? Then I will use the multicast option. Enabled a data viewer option. A nibble Data View. Select the fields. Currently varies the country foods, country clean feed, country clean field is going to come here. Where is the country for eager design? Is the Country field? Let me just place the country field inside air. Ok, click OK. Now save this transformation in execute. Now you're going to see one thing is very important before executing it. I need to show you one more step, which is very important. Double-click advance. This one is done industry, I have to click this advance input key, output t and the similarity, similarity. So if there is, there is an exact match, then the similarity would be a 100%, almost a 100%. Ok. And now I want to set the similarity count to 62%, okay? Token delimiters, it means stead these daily meters will be considered while matching. Okay, space tab, carriage, return, line feed. In case you want to include some more. You can do that. You can add these delimiters here, okay? Now, 62%, okay, fine. Click OK. Save this. Now, execute. Now you're going to see the result, India. India. Okay, fine. India. This S1 is the incorrect valley. This one is the corrected value. India IIA incorrect value, indifferent is the corrected value. Okay. And what about the similarity? Similarity between these two values are 8379%, percent similarity is 79%. So let's say if I increase the similarity threshold, so what is going to happen? Let me just increase the similarity to eighty-five percent. Let me just show you the result here. Then. This and this integral, this India value is going to be treated as separate value. Ok? This will not be corrected. Okay, now let me just because in this case, similarity count is above 82%. Now let me just double-click advance and let me just decrease the similarity count to 62%. Click OK. So this is how you correct your data set, which is coming from source. Now, if you look at the United States value, look, this earlier has been corrected. Majority of values will be will be considered. This value is in minority, okay, state's United Kingdom, Canada. What about Canada? Majority values are Canada which is corrected values, some values are incorrect, which will be corrected by this fuzzy grouping algorithm. And look at the similarity count eighty-five percent. Look, this has been corrected. This will be compared with the with the corrected values and this will be corrected by the fuzzy. Grouping consummation Australia, Australia value has been corrected. Look, Australia, this is the correct one. Australia majority values are correct, ok. There are some values which are incorrect. The similarity threshold. Now. And now if I am going to perform the groupBy operation on the Country field, I will get the correct result. Instead of multicast, I will use the transformation. A group biotransformation here varies the groupBy aggregate transformation. Ok, now double-click. I will group by country field. And let's group by total revenue and total cost profit. Okay? No, country clean, I will take the corrected value. Okay, let me just take the corrected value. Country clean, this country is the incorrect value. Country clean is the corrected value. Total revenue, total cost, total profit. So some country I'm not going to consider this country. This one is the incorrect values in this field. There are some incorrect values. All values in this field are correct. Okay? Groupby, total revenue, total cost, and total profit. Now click OK. Then I'm going to use the multicast operation, sodium multicasts transformation here. After this aggregate enabled data view. Okay, country clean total revenue find. Saved this in execute. Look. Country, Canada, India, Australia, United States, United Kingdom. Okay, now let me just, instead of using the country clean fee, now, let me just use T. Let me just include the other field country fees. Now. Let me just exclude this country clean fluid is concretely, let me use the country freer discipline. Okay? Now grouped by country. Now you're going to see values will be incorrect. Look, which contains D incorrect values. Look, Canada deceives repeating twice, multiple times, I am getting the incorrect results. Okay, so this is how you transform your data source data set using the fuzzy grouping transformation. Let me just change it to country clean, which is the correct one. Okay. And now it has been corrected. And now you're going to see the correct data. Okay? So now next, we are going to discuss Tea for the lookup.
12. TermExtract Transformation: Hello everyone. In this particular lecture, we're going to discuss two term extraction transformation in SSRI. Yes. So first of all, I'm going to read the e-mail dot CSV file. So now first of all, I will use the frac file source in order to read the CSV file new browse, then I will choose this particular file, files and have to select the CSC file. This an e-mail dot csv columns. These are the columns. E-mail ID, email subject, email, description, email. Okay, So now what I'm going to explain e-mail ID, email subject, email should be 1000. Okay. Click Okay, columns desert or columns. Fine. Next, I'm going to use the term extraction transformation. So term extraction transformation is used to extract frequently used English words. Okay, so which words it is going to extract? It is going to extract noun audit is going to extract noun phrases. So I'm going to select this one, okay, Now I have to connect this on basically the purpose of using the term extraction transformation is to use, is that it is going to extract, it is going to give me the count off nouns data, data present in my email description. So since In this example, I'm using, I'm going to use the email description as a field. So it is going to count the noun words in the email description. So this is time extraction. Transformation is very helpful when it comes to SEO, search engine optimization. So Tom extraction here, what I'm going to do, I'm going to select the e-mail term, ND score. So email is the term and the score value is going to be the number of nouns. Okay? There is a problem is the input column can only have this data types. So first of all, I need to convert the datatype. So for debt, I need to use the conversion data conversion transformation. Let me just remove this arrow sign from here. First ivory convert, this data type. Okay, then I'm going to combine this fine data conversion. Email subject, email. Okay, datatype should be Unicode, text stream copy or female is going to be d new column. Okay, click Okay. Now term extraction, e-mail ID, email subject, email, term in the score. I have to select the copy of email column. Okay? Now configured error, output error, redirect rules. In case of failure, redirect in case of error, ignore failure. Click okay, now next I'm going to select the advanced. Right now I am not discussing this advance. Term type noun, noun phrase, noun, a noun phrase, I'm going to select noun. I have to show you the Excel CSV file, which I'm going to read email dot csv. So this is the file which I'm going to read. So look, this is going to be the email, okay? So in this email column it is going to extract the noun phrases, Okay, then it will give me the count. So the term value will be the noun phrases and the score value will be the count of noun phrases. Fine, Don't Save. Then. Now, score type is frequency. This is the enteric type, score type, okay, right now I'm going to select the frequency. Frequency threshold is two, maximum length. This is disabled. Okay? Now exclusion right now I'm not discussing this, Okay? Click Okay. At the end I'm going to use the multicast. So this threshold means Tet. The noun phrase should occur minimum two times. Now, enable the Data View. Enabled data, we're saved this and execute this transformation. So you are going to get d, two columns, term and SCO. Course. This is a noun, marketing is a noun fee. Ds are occurring four times, three times, two times. Now, next, I'm going to change something, okay? Let say three times, four times. So then this time around it is going to give me did only one row, which is occurring only four times. Look only because the threshold is set as for now, let suppose. Now there is another option. Let me just decrease the threshold. Look, you can also extract in noun phrase. Noun phrase is the combination of nouns. Okay? Then you can select the noun, a noun phrase, okay? Frequency threshold is two. So with these two options, you will get the maximum length of a term, frequency threshold and maximum length. So let's keep it simple. Exclusion. So let's say, if you, let say, I want to exclude any noun, any noun. So what I'm going to do, I'm going to select the ODBC OLED be connection. Then I'm going to select my table where the information is stored. The information is stored in this table. Exclude word. I have already created desk table and I have already inserted the word marketing. Now, let me just execute this query. Okay, This is in dice database, so now I have to execute this marketing. So I want to exclude the marketing work. So I will select the table limb, exclude word and words. Click, Okay, save this. It tastes going to exclude the marketing course n phi. This is occurring four times and this is a K2 times.
13. TermLookup Transformation: Hello everyone. In this particular lecture, we are going to discuss the term lookup transformation in SSIS. So for term lookup transformation, you need to have the lookup table. So I have this lookup table, term underscore look up. Let me just execute this. So in this table I have ID in the word search term lookup transformation in SSIS is going to determine how frequently these words are occurring in, in my description. So this is the e-mail description email, which I'm going to take this column, I'm going to take in this term lookup transformation is going to determine how frequently these words are occurring here in this description, okay? For example, liquid just give you an example. Marketing. Marketing is occurring how many times here? No, 0 times. What about this marketing is occurring twice? This marketing, in this marketing, triumph, 1, 2, 3, three times. What about this? 0 times, okay, so it is not going to consider if it is occurring 0 times. So minimum threshold value is one. Now, next I'm going to read this particular file, especially I want to take this email description. So for debt, I need to use D flat file source. Where is the flat file source? I need to use this thread file source. Okay, then double-click new browse, CSV files and email file dot csv columns. These are the columns advanced preview. I have to increase the length. This is going to be, let say 1000 columns. These are the columns. So another important thing is to read this email is taken as a data type of this email is string. In order to four term lookup transformation, you need to have Unicode string data type for debt, I need to use the data conversion. So where is data conversion? So I have to use this because term lookup only accepts Unicode email. Here instead of this data type, I will select this Unicode. Let me just select this unique code, this one Unicode string, okay? Output Alias copy of female Lent. I'm going to specify 1000. Okay, click, Okay. This is done industry. Next I'm going to use the term lookup transformation. So where is this system lookup transformation? So let me just configure the term lookup transformation here. Double-click. Now, first-time going to specify the reference table. So this is going to be my reference table. From this reference table, I'm going to extract the words and it is going to search in the email. So here I'm going to specify the term lookup table, which is my reference table. Term lookup. Okay, Then Tom lookup, okay, fine. So email, let me just select this email and this words. So these words will be, these little words will be searched in the email. Okay? At the end, I'm going to have two extra columns. One is term in the other one is the count of these words, okay, For example, this marketing is occurring three times, so I'm going to get Km value AND count earlier. Fine. I want this e-mail ID email subject. No, I don't want this email. I want this copy of female. Then I want to configure this section, ignore failure. F1's use gave sensitive lookup them. Okay. Now this is done in gastric. Next I will use the multicast enabled execute. Okay. Double-click. Okay, fine. Let me just edit a GWAS. I have to increase the size of this column. Now save this. I have to refresh the columns here. Okay, columns. Find now execute. Okay, Look, I've got two extra columns. Term, which is the word in the frequency Teradata is occurring two times. In here in this email.
14. Aggregate Transformation: Hello everyone. In this particular lecture, we are going to discuss the aggregate transformation in SSI. Yes. But first, we are going to discuss the aggregate functions in SQL. So this is my table, employee table. In my table I have ID, first name, last name, gender, country, each, department name and salary. So what I'm going to do next, I am going to find out did number of employees in each department. I have this department name field. So in order to find out the number of employees in each department, I will use the aggregate function count. So from this ID, I'm going to, I'm going to take account of this ID count. Then ID account is an aggregate function. So this is going to give me the count. Then I'm going to use the department name. Then I'm going to use the department name. This will give me the department wise count. Ok. Department name count, ID, groupby. I will grow by on the basis of department name, groped by department name. I'm going to use another average EVD, average salary. Then I'm going to use d min salary. Average salary of each department, minimum salary in each department. I'm going to use the salary here. Then I'm going to use the min maximum salary. So maximum is an aggregate function. Salary grouped by department name count. So this will give me the number of employees in each department, average salary in each department. Okay. Then the minimum salary in each department and the maximum salary grouped by department name. Let me just execute this query. So this is going to, Okay, this one is average salary. As EVD underscore salary. I need to execute this first average salary. This is going to be the minimum salary, F min underscore Salary. This one is going to be my f maximum salary. At the end, I'm going to find out the total salary in each department. Salary as Door tell underscore salary. So this is my complete SQL query. Every salary, minimum salary, maximum salary, total salary. Every salary cast as decimal. I need to convert every charity to two decimal points. Then I'm going to have ten. Execute the query. Look two decimal points. Now I am going to use the aggregate transformation in my, in my SSI AS2. This one is going to be number of employees. As number of employees. Number of employees. Ba department has six employees. At our department has 20 employees. Tech department has ten, implies. These are average salaries. Minimum salary, NBA department is 16 thousand. Maximum salary, NBA department 211000. Total salary is this. Now I'm going to use the aggregate transformation in SSI. Yes. So first of all, I'm going to create a new package. I will select the Dataflow section. So first of all, I'm going to read my data from this employee dot csv file. Okay, this is my file from where I'm going to read my data. First name, last name, gender, country, I-T, department name and salary. I will use the Flat File source. So from here I'm going to select the Flat File source, flat file dishonest destination, Flat File source. I need to click this 1 first, then select Flat File source. Okay, new browse. Then CSV files. I will select employee dot csv file code page. I'm going to select this 112 phi one. This one, okay, formatters delimited column name in the first data or find columns. These are the columns. Preview, dishonors, my dataset. Okay, advanced ID, string, find firstName, lastName, gender, gender, country. Each. It should be, should be a numeric value. Let me just select a numeric value. District BY a numeric value, department name, salary. Salary should be a numeric value. It has to be a numeric value, cannot be a string. I'm going to select salary as a numeric data type. Is this okay? With gender, fine. Id. Let me just select the numeric or I can have, I can, I can select IDS string, OK. Click OK. You preview my dataset. This one is my. Dataset. Click OK. Now, next step is to use the aggregate function varies the aggregate, aggregate transformation. Where does the aggregate transformation decent? I'm going to select the aggregate transformation, connect this. Now. I'm going to select first, I'm going to select the department name. On the basis of department name. I'm going to group my data. So I will groupby I will perform the groupBy operation on the department name. Then I'm going to select id. So this is going to give me the number of employees. So each employee has a unique ID, okay, now, next I will select the salary. Okay? So I want to calculate first, I will calculate the total salary sum of salary in each department group by Dan, I'm going to have I'm going to take a count. Count. Okay. This one is number of employees. Okay? The second one is the total salary. Salary, this one is going to be this one is going to be average salary, the average salary. Then I will select salary and average minimum salary, min salary. Salary, maximum salary. So first count, sum average, minimum and maximum salary. I'm grouping on the basis of department name. Click ok. This step is done in district. Next, I'm going to use t multicast effort, testing purpose. So what I want. So multicasts step multicast transformation can also be used for testing purpose. Let me just connect this with this one. Then edit, review metadata, then data viewer, and enabled data viewer. So it is going to destroy multicasts. Transformation is going to display me disk data set. Okay? So this is, this is going to be used for testing purpose, department name, number of employees, total salary, average salary min, max. These are the columns. Jet will be displayed by this multicast transformation step. Now save this. You have to save this step package five. Let me execute this. So this is going to display different display midi, dis, okay? Output. Now department name, number of employees, six employees, 20 employees in HR department, total salary. This is every salary, minimum salary and the maximum salary look just needs to be displayed in a proper format. It has to be displayed into decimal points right now, look. Now in order to display this in two decimal points, I need to use the data conversion step. Close. Before. Before the multicast step, I need to use the data conversion step. So where is the data convergence step? Let me just discipline. Delete this. Now, I have to display a 102 decimal points. No data conversion. Every chair Larry copy of every Salary. This has to be in decimal points. I'm going to convert this to decimal, scale and precision. So scale is two. Procedures are not going to select the precision. So what is precision in scale? What is the difference between these two? Preceding is something like let me just give you a number a, let say two to three dot for, for precision will be 1-2-3, 3-5. Precision value will be five, and scale value will be two. It is two decimal points. If it is three, if it is like this, then the precision value will be six into scale value will be three. So that's the difference between precision and scale. Now it will be converted into two decimal point. Click OK. Multicast. Now, edit metadata. This one is the metadata decimal to scale is to precision is not cert okay, Data View. So these are the, these are the columns or fields will get will be displayed by this. No. Department name, number of employees, total salary, average salary. Okay. Every salary I am going to put here, okay? Minimum salary, maximum salary, okay, after this average salary, then minimum salary. Then maximum salary. Okay? I'm going to change this name. This has to be changed. So from here I'm going to change this name, average salary. I don't want to use the copy of average energy. It should be EVG. Salary. Edit. Then get a viewer. Okay. Average salary. Let me just select this one. Total salary, every salary min salary and the maximum salary. Click OK, save this. Now execute the package. Now it will display the average salary into decimal points. Maximize this look, dishonors the average salary which is being displayed in two decimal points. Look minimum salary maxims LED. Minimum salary in ITER department is just 10 thousand. Maximum salary is 29 thousand. Average salary in marketing department has this total salary is this, this one is the average salary. Then intake department, average salary is this minimum and maximum salary, minimum salaries deaths and the maximum salary is this. So this is the concept of aggregate function. Okay, I hope you understood the concept. Thank you so much and have a great learning.
15. Row Count Transformation: Hello everyone. In this particular lecture, we are going to look at an example of row count transformation in SSI. Yes. So with the help of row count transformation, we are going to record two tanks. So we're going to record, record count any duplicate rows. So Ricard count will be the number of rows in my number of unique rows in my, in my source file. Duplicate rows will be the duplicate rows count of duplicate rows. Other then these two attributes, I'm going to save task ID. I'm going to record task ID, task name, Package ID, package name. Then I'm going to save the carb count, duplicate rows, username, load time. So I'm going to record disinformation these attributes, okay? All these attributes in my table, database table. So now I'm going to give you an example of how to find out duplicate rows in SQL table. So this is my same table which I am going to use for this purpose. Select, execute this query. Now in order to find out the duplicate rows, I'm going to use the aggregate function called count, count coma. Then I'm going to use as duplicate. Okay, then I'm going to group by DZ attributes groupby. I will use the group by clause, group by all these attributes. Okay, so now execute this SQL query. So this is going to give me destroy is occurring only Valentine destroys a drink twice. This row is a ring three times. Next I'm going to filter this result. In order to filter the group result, I have to use this having clause having count greater than one. So this is going to give me do duplicate rows. So I have three rows which are duplicate rows. So destroys occurring twice, destroys occurring twice. The last row is occurring three times. Okay? There are three rules which are duplicates, which are duplicate rows. So now what I'm going to do, I'm going to create a package. I have to create a package, new package. So I will name this package. I will name this package a record, a row count. I will name it row count. Ok, so this is going to be the name of my packet. Then I will select the dataflow. Click this toolbox. So I'm going to read data from my Flat File source. So let me just show you the file which I'm going to read. So this is going to be my file. Okay. So this file has total how many records? This file has 13 records. Genuine records are 556789. There are nine genuine records. Okay. Now, let me just Click this, configured this. Ok. Now select, this implies 02 dot csv columns. Ok, fine. Preview. I need to change the data type as well. For ID, I'm going to save it. Not, I'm going to save it as integer. Firstname, lastName, string, gender, string, country h. For h, I'm going to save it as department id. I am going to save it as an integer value. Now preview, preview. So if I have to select this option, it means that if there is a null value in my, in my source, retained null values from the source as null values in the, so it is going to retain the null values. Okay, I'm not going to check this option. Ok, click OK. Since I haven't, There is no null value in my source table, source file. Now after this, what I'm going to do, I'm going to use the aggregate function. The reason I'm using the aggregate function because I want to separate, duplicate rows in my genuine rules in MI Okay, rows. So I'm going to separate these three duplicate rows for debt. I need to use the aggregate function, then I have to use the conditional split function. So aggregate. So what I'm doing, I'm, I'm going to group by ID, firstName, lastName, gender. So with the help of aggregate function, I'm going to do this. Okay? Then with the help of conditional split, I'm going to apply the condition. This condition. Now, I will use the aggregate function. Double-click. I will select all rows. This is going to count all, okay, grouped by ID. Fine. Then let me just expand this group by ID, group by name grew by lastname, group by gender, country H. I have to group by this, group by group by this, i will take account. Okay? Now, click OK. After this, I'm going to use the conditional split, various conditional split, this one. Now, conditional split. So now here I'm going to use the column, this count all, I'm going to use this column count all in I will apply the condition greater than one. This is going to be my, okay. This will be my heroes. Default output will be the duplicate rows. Duplicate rows. Now click okay. So this is going to give me an error that data type DTU O18, ADT, E14 are incompatible. I have to make this compatible, so I will use this. Let me just use this. So this is going to convert this data type, ok, now, greater than to, less than two, less than two, this is going to be less than two. Click OK. After this. So look, output, okay, less than two. If count is less than two, then it is going to be non duplicate rows count, Let me just less than count. So d less than two. So 123456, diva, OK, root, ok. Now, click OK. Now next time we're going to use the after-death step. I'm going to use the row count step, a row count. With this row count step, I have to select the row count step is row count steps. So before using the row count step, I need to define two variables. That is, I have to search this. This is the row count step, okay? Control c and control v. Okay? Duplicate rows. This one is going to be OK. Let me first alert this. Rename this list. Now, save this. Now in order to configure these two row counts, Ds to row counts, I have to define a variable. In order to define a variable, I have to click this option. So I will define two variables. And so the first variable will be for the fun, and the second variable will be for this one. So Ricard count will be my first variable, Ricard underscore count. Data type will be integer. The second variable will be duplicate. Row count will be my second variable. Enter data type will be entitled to default value will be 0. Now I have to configure this. The first one will be the record count. The second one will be the duplicate rows. So in this case, duplicate rows, they'll be three. These three rules will be duplicate. Now. Next, after this step, I have to go, I have to select the control flow section. So this one is the data flow task. So if I'm going to click the data flow task, this is going to take me to this Dataflow section. Let me just click this, not click this. This is going to take me to this Dataflow section. So once this data flow task is executed, after this, I'm going to use another task. This time around, I'm going to use the execute SQL task. Okay? So connect this task with distrust Execute SQL task. So what next I'm going to do? I'm going to create a table. So I will create this table first. Let me just create this table. New, new query. I will create this table, create table. Table logging. I'm going to name this table logging task ID. In this table I am going to save task ID data type is varchar 50, task name, Package ID, package name, record count, duplicate rows count, duplicate row count, username in the load time. Default valley with gedit. So if no values inserted, this is going to give D. Let us datetime, okay, now, in order to execute this, I have to select this. Then I have to click execute. This has been created, table has been created, refresh the database. Tables. This has been created with name table underscore logging. Ok, fine. Now next step is to use this insert into table logging. I have to give this name, table logging. Ok, I have to give this name. Select this table logging, insert into table logging task ID, task name desert variable, system variables, task ID, task name Package ID package name, desert system variables, record count, any duplicate record count. These two are, these two are user variables which we have created. Now let me just paste here. Table logging task ID, task name, Package ID, package name, record count. Here I'm going to use this. Whereas this record count, fine. This one record count and could duplicate record count, duplicate record count, duplicate row count. Ok, I'm going to use this. So now first, I have to select this SQL statement connection. I'm going to select this connection which I have already created. Next step is to select, to provide the SQL statement here. Here I'm going to provide the SQL statement first, let me just use this duplicate row count. Here I'm going to use to duplicate row count. Okay, fine. In the user name. Fine. Now, control C. And then I have to pay as chair in my statement section, SQL statements section I have to paste this insert statement, insert into table logging. Task ID is the system variable Task Name, package ID, package name. This one is d record count, which is this record. Let me just disregard count is deep user variable which we have created. Then the duplicate row count is also the user variable which we have created. Last ptsd system variable, which is user name. Ok, click OK. This one is configured. Next I have to use this parameter mapping. Add. The first one is looked system variable, variable name. So the first one is task ID. I'm going to select the task id from here. I'm going to select the task id. Then I'm going to select Task Name, task name. Third one, I'm going to select the package ID, package name. Package ID, which is depicted ID package name. Okay, after this, I'm going to select the package named in the record count ended duplicate record count, record count. Where does the record count? Record count? This is the user variable. Last one is dy, duplicate, row count, duplicate row count. This one is the username. So I'm not giving value of this load time because when data is, when data will be inserted in this table, automatically, the default DateTime value will be loaded. Ok, so therefore, I'm not giving the load time value, not duplicate row count. After this, I'm going to use the username. The last one is d username, whereas the username, this one is the user name. So task ID input data type will be varchar. I have to select the varchar task name data type will be varchar, Beckett package ID, VARCHAR versus VARCHAR. Package name will be whiter. Okay, record count is the long, which is the data type, ok, duplicate row count long, username will be varchar. Ok. Next is this parameter. Okay. Task ID first index 0123, okay. So 00, then Task Name 123456. The seventh value which is diluted time. This value it is going to be, this value will be populated automatically. Okay? Click okay. This has been configured. So next time we're going to execute this package. So click this button to execute the package. To package has been successfully executed. Now I'm going to check my log table. So this is the table which I am going to query. Refresh this. Okay, let me just select this table. Select Top 1000 rules. So table has been populated. Task name, this is the task name, Package ID, package name, row count six unique rows, three duplicate rows. Username, this is the user name and this is the time when this row is inserted in my table. Now click Dataflow section. Now, click here. Now what I'm going to do, I'm going to insert this six rows in my database table. So after this step I'm going to use the OLED DB destination step. I have to use this step will only DB destination step. Now, configure this. I'm going to select this database, this connection manager, table or view I will select this table name will be I have to create a new table. So table name will be DRG, underscore, employee. Underscore. Finally table. So this is the name of table. Let me just remove this count all I have to remove this count all numeric. I have to remove this. Okay, fine. Int, first name, last name, gender, country, each department ID and OK. Click OK. Mapping ID, first name, last, firstName, lastName, gender, country, eeh, OK. You find Click OK. saved this package. Now I am going to execute this. Again. Click this to execute this. Okay? Six rows have been inserted. Now in order to check my table, I'm going to first to refresh it. Let me just check if data is inserted or not. Table name is DRG underscored this EMP underscore final table, I have to query this look. All six records have been inserted, 12346. Okay, these records have been inserted. These are unique rules. Ok. And now let me just query this table. Another row is inserted, a record count shakes duplicate rows three, this is the username. This is the load time at which this row is inserted. This is the package ID. Package name is this task name as this. And worst of package name. This one is the package name. I need to show you the package name first, row count, this one is d, package name. Okay? So with the help of row count step, we have recorded two important attributes. Record count in the duplicate row count. D's are d's are system variables, okay? These are system variables. These two variables we have created. We will call these two variables as user variables. So hope you understood the concept. Thank you so much and have a great learning.
16. Pivot Transformation in SSIS: Hello everyone. In this particular lecture, we are going to discuss the pivot transformation in SSAS tool. So let me first show you the data set. So this is my dataset, student id, student name course intmax. So with the help of pivot transformation, I'm going to present my data set like this, okay? So pivot transformation is going to convert the raw data into column data. So this is the row data networking Java C Plus Plus desert data, which is in the form of rules. And it is converted into columns, C plus plus and java marks in the networking marks. So lx has got a 100 marks in C plus plus a. Let says God to 30 months in Java, NC, 200 marks in networking. So I'm going to present this table like this with the help of pivot transformation. So first of all, I'm going to create a new package. I will name this package pivot transformation, pivot package 0-6. I will limit pivot pivot transformation. Now first of all, I'm going to select the data flow task. So I will select data flow. I'm going to click here. So the first one is the data flow task. Click data flow. Then I'm going to, first, what I'm going to do, I'm going to insert data in a table. So first of all, I will create a table in SQL Server databases. Dice at this table will be created in dice database. Create this table and then execute the insert statement. Insert into, okay. Now select star from source underscores Student. So now I'm going to convert this. I'm going to convert this table like this, okay? Like this. Now, first of all, let me just use the source. Source, DB, source, DB, DB source. Okay, now double-click. So this is the connection I've already is selected. Now data access more table. So I'm going to select this table which I have just created in SQL Server source underscore Student table. Here I'm going to select source underscore student table. This is the name of table columns and preview student IDs to them name cores, intmax, click OK. The first step is done in dusted. Next step is to use deep pivot transformation. So I'm going to use the pivot transformation. Okay? Then I'm going to connect this. Let me just double-click this transformation. So pivot key sub pivot transformation has pivot key in the pivot value. So first of all, I'm going to sort this. Data on the basis of t. So in this case I'm going to select student id as the key. So on the basis of student ID, I'm going to start my data, student ID. I'm going to sort this data which is coming from this table. Now, student ID no pivot tea in the pivot value. So what is pivot key in this case? Pivot is course courses, the pivot key, okay? This course is the pivot and the pivot value is mocks. These are the marks. Okay, now let me just select courses. Private key and pivot value will be mocks. Ignore unmatched pivot values and report them after data flow execution. Okay, click now here I'm going to select the columns output. No, now I am going to select Networking. Okay, seep, I'm going to select C plus, plus Java networking. These three columns I'm going to select here. So select him. So first I will select networking. You have to provide the values here. Then C plus plus java generate column. Now, click OK, no, C plus plus Java networking. Click okay. So this step is done and dusted. So next I'm going to right-click this pivot step and then click Show Advanced Editor. So what I want, I want to include all columns, including student name, okay, student ID, course mock student name. I want to include all columns, input and output parameters. Input columns. These are input columns. What are output columns? Student ID, desert output columns. So I want student name as well. So what I'm going to do after student ID, I need to displace student name so I will click Add Column. So I'm going to mention student underscore name. Okay? I will click student underscore name. Source column is minus1. Here's tutoring name. I need to click student name. And D, signage ID is 20. This ID is 20. Specify the line h, idea of the output column when this item was first placed in the data flow. So what I want in order to match this column with this column here in d Student a name I'm going to mention the Source column. And the fourth column is 20. Signage ID equals to 20. Here I'm going to mention the Then name 20 in the pivot key value will be 0. Okay, now, StudentID desert output columns student id name C plus plus max. I have to change as c plus press mocks, Java mocks. Okay, networking much. I have to change this. Ok, click OK. So this step is configured. Next I'm going to configure the final step, which is flat file Destination step. I have to combine this Double-click, new Flatfile configuration. Click OK. Now what I am going to provide the directory here, SSAS document files, okay, so let me just create a file layers. I am going to create a file, a new CSV files. I'm going to create a CSV file a, I will double-click care. So I'm going to save this file as this CSV file. Double-click OK, I have to double-click open this file, save this file as the CSC commas separated file. Csv with a CSV, TSV, okay, save this file and the name of file will be pivot. Example. Saved this silence with this name. Okay? Now, delete the decide. Okay, fine. Now browse. I have to mention this file CSV pervert example dot csv file columns. Okay, advanced finds to rename C plus plus java marks, networking marks, preview, D's are the columns in my, in my dataset. Okay? I have to update this column names in the first data flow, I have to check this option in order to get the column names as well. Now. Okay, fine. Lake. Okay. Now mappings. Ok, I have to ignore this connection manager update. Why I am getting this extra column. Preview student ID, okay, fine. Mappings, student name, C plus plus month, Java marks, networking marks. This is going to be student ID. Click OK, save this file. Now I'm going to execute the package a click Start button, and the process is executed successfully. Now I need to open the file. So this file is, look, this is the file I need to open this side. So now you're going to look at the data look student id, student name, C plus plus java maths ended networking marks. So data has been displayed. Now next what I am going to do, so I'm going to use the FTP task. So I need to click this. I need to go to Edit Mode control flow after the data flow task, what I Avant, I evolved to use d FTP Tasks. So basically work, I Avant, whatever the file is created here, I want to copy this file in the FTP directory were to retreat gall it remote directory. Okay, so this is going to be my local directory and I'm going to copy this file will be my remote directory. So I want to configure this FTP task. So how to configure this FTP task? First, I'm going to connect the first task would this task for download files ELA server in my system. So I'm going to download this, download and install this file Zola server for windows. So I already have it in my system. Files villa server interface. I'm going to open this connect. I did use it. I will create a new user, add a user's, I will call it assess IES underscore user is the name. Okay. Then I'm going to enable the password. So password is going to be Edmund 1-2-3 is my password shared, folded. So this is the directory. So this is going to be my dream. More directive there. I'm going to place my files and my SQP directory is this. So I'm going to call this as my remote directory, Ed. And I have to a directory. And FDP, SSI is SSAS, FTP files. Okay, so this is going to be my, this is going to be my remote directory. So right, read, I'm going to give all rights. Delete, okay, click OK. And now what I am going to do next, I'm going to open the SSI. Yes, I'm going to open this dust. Ftp does so the first is the connection, FTP connection. So select Create a new connection. Server name is this server name IS one to seven dark, dark, dark one. Okay. One to seven dark, dark, dark one. Username is assess IES underscore user password is admin 123. Test Connection, Test Connection, Successful. Click OK, then file transfer local part. So this is the part I have to provide my local path. So my local park in this case will be this one where this file will be generated. Okay, this is going to be my local path. I have to provide this part here. So I'm going to provide the local part here, new connection and browse. And this is going to be the file, okay? Pervert example dot csv file, then operation, which operation you want to perform. So you want to perform receive files. I want to sync files. So I want to send files from this directory, from this directory. So which file I want to send this pivot exemple dot csv file. Next time I'm going to where I'm going to send this file. I'm going to send this file in my remote directory, and this is a part of my remote directory. In case I have some folders. I'm going to, I'm going to provide the name of folder after this slash. Right now, I don't have any further directories. My remote directory, this is my remote directory know folder is there. So overwrite file a destination JS. In case 5A destination JS, I want to overwrite my file. If file is already there, this option will overwrite. I can overwrite my file with this option. Okay? Expression. I'm not using the expression. Click OK. So this file task has been configured. So what it, what will happen after this, after execution of this package file will be created here. This one is going to be my local directory file will be created after this. Once this data flow task is executed successfully, this task will be executed. This FTP task will send this file, will send this file to disk directory. I need to show you the directory where it is going to place the file. It is going to place the file here in this FTP files directory. This is going to be my remote directory now in-order. Now I have to execute the package here. This is how you execute the package. So look, Control Flow, FTP task has successfully executed. Look, this is my remote directory. Pivot exemple dot csv file is created. Look student id, student name, C Plus Plus math, Java marks networking mux 200 to 80 to Turkey. Okay, so now in order to look at the logs, I can go to the progress. Look D's are the logs of my package. Look, start, finish. Okay, all steps have been executed successfully. There is no editor in my execution. Nine rows. And then three routes have been, three roles have been generated after this step. Okay, so hope you understood the concept. Thank you so much and have a great learning.
17. UnPivot Tranformation in SSIS: Hello everyone. In this particular lecture, we are going to discuss the unfavored transformation in SSI. Yes. So this was the result of pivot transformation. Let me just open this file. So with the help of unpublished transformation, we are going to convert columns into rows. So look, D's are the columns and we are going to convert it into rows. Okay? So now I'm going to use the unfavored transformation. First of all, I will create a new packet. So this is an example of pivot transformation. Now I'm going to create a new package and I'm going to name it. And the package name is package six. I'm going to name it pivot exemple and pivot exemple. Okay. Now I will click Data Flow. Click OK. Next I'm going to click, I'm going to select the Flat File source. Ok. Double-click flat file manager. I have to create a new fred file manager. Then I'm going to select a file. So which file I'm going to select, deceived or file which I'm going to select. So in this file I have distributed the original form of this data is discipline. This. Let me just read this again. Let me just get the latest results. So this is the original form. So I'm going to convert this to original form, convert this data set to original form. So this will be my one column course, and these will be my mocks. Ok, now, close this dataset. Browse and I have to select this particular file. Perfect example, Dark CSV. Ok, former delimited a half to check disk often column names in the first data row, columns. These are the columns, okay, d that the result, advanced preview. No, advanced student id is integer data type. Ok, student name, string mocks, integer, Java marks, integer. Networking marks will be integer. Preview, click OK. Columns. These are the columns. I have to preview my datasets and this is going to be my data. Click OK. Next, I'm going to use d and pivot transformation. So with the help of unfavored transformation, what I want to achieve, I want to convert my data set like this, student IDs to rename core smocks. Ok, double-click this. Then I'm going to select C plus plus Java networking marks. D's are the fields. They're in my course column, networking Java end x0 plus, plus, ok. All other fields like student IDs to didn't name intmax. These fields, they'll come look. The option is selected desk passed through. These, these fields will be passed through, okay? Now, okay, destination column. What is this destination column in the pivot key value column name. In this case, the pivot key value column will be different. Private key with key value column will be coarse. So I'm going to give course as pivot key value. Destination column will be marks. Marks. This column will be my destination column. This column course will be my pivot key value. These two fields, these two columns, student id, student name will come as it is because I have selected pass-through. Click OK. This one is done and dusted. Let me just rename this and pivot transformation. And pivot student data. Rename this. Reading student data. I have to save this. Now, the second step is done in domestic none. Next, I want to see the result of my AN pivot transformation. So what I can do, I can use any destination step. Like I can save my result in a, in a CSV file, or I can save my resulting in a database table. Or I can use the multicast step to just view the result. In this case, I don't want to store this data anywhere. Okay? So what I'm going to do, I'm going to use the multicast up just to view the data. In order to enable the view option, you have to click this arrow sign addict. And then you have to go to the Data Viewer option, enabled data viewer. Okay? Then you have to select the columns which you want to view. So I want to view student id, student name alone with student id, student name, I want to view Goethe and mocks. So these are the fields that I want to view. Okay, next, I have to save my saved, my package, the packages saved. Now, click Execute. Now this is the result which I am going to get from an pivot transformation Student ID, Alex, okay, C Plus Plus One AT this result is similar to this result. Alexnet working 200 mux. Alex networking marks 200 and let's Java amounts to 30, Alex C plus, plus max 180. Look, Alex C Plus, Plus marks what AT? This result is very much similar to this result. Now, click Cancel, go to D, a package execution competed with success. Click here to switch to the design mode or select a deep or stopped debugging from debug mode. So this is the debug mode, okay, click this. So this is the concept of an pivot transformation in SSI. Yes, hope you understood the concept. Thank you so much and have a great learning.
18. Slowly Changing Dimension Implementation(SCD-Type01,SCD-Type02): Hello everyone. In this particular lecture, we are going to discuss the slowly changing dimension. Also, we are going to implement the SCD Type one and Type two in SSI. Yes, so first I'm going to explain the concept of slowly changing dimension. So in order to better understand the concept, let's consider this table. This table, employee ID. In this table I have Employee ID name in-state. So these are the employees. Employee ID 1-0. 0-1, name of an employee is Alan and he is living in New York. Likewise, Georges in employee and is living in Los Angeles. Let say George's decided to move from Los Angeles to Chicago. So what I need to do, I need to maintain the George data in my historical George historical data in my data warehouse, okay? In this case, the changing dimension is state. State can be changed for Bob anytime Bob can move from Chicago to any other state. So the changing, the changing column is state. Okay? So now let's say George's decided to move from Los Angeles to Chicago. So this value with the changing value, this column is the changing column, okay? So the current value will be Chicago, in the historical value will be Los Angeles. So what I want, I want to maintain the maintain the historical data of this employee George. Okay. So we have three STD types, SCD type 0-1, 0-2, NSZD type 03. In SCD type 01, we don't maintain any historical data. We just update the current value. The current value is Chicago and the previous Valley is Los Angeles. So the current value is, will be updated. Look, so look, my table will become like this for this is SCD type 01 where we do not maintain any historical data in SVG type 0 to NSS, I guess we have a concept of start date and end date. Also, we maintain the historical data using the current status. So look, George. The old state was Los Angeles from this date to dictate who was living in Los Angeles right now, he's living in Chicago. This is the current value. Okay? So we can, we can implement the SCD type 0 to with the help of start date and end date. Also, we can implement SCD type 0 to in SSI is with the help of current status. This was the old status and this one is the current status. In SCD type 03. We maintained the previous state. This is the previous state. In this one is the current state, and this one is the effective date. So one to 2005. So this is the date when George has moved to Chicago. So this will be d. Effective date. Now, in this particular lecture, I'm going to implement SCD type 01 in SCD type 0 to in SSI, Yes. So first of all, I need to create some tables. I need to create my source table. So look, first of all, I'm going to create this table, create table source underscore employee table. I'm going to create this table in SSAS. Database name is dy, so this table will be created in dice database employee id name state. Let me just execute this table command, create table command. Next time I'm going to execute these three commands. Insert into commands. Copy paste these commands. Okay? All data will be inserted. Data has been inserted. Us deselect statement to query the data from source underscore employee table. Let me just query the table. So this is going to be my source table. Okay? Now next what I am going to do, I'm going to create a dimension table, SCD type 0-1 table. So, okay, now I have to create a table, my target table, SCD type 01, employee ID for employee name state. So now first of all, I'm going to implement the SCD type 01 in STD. Let me just repeat the concept of SCD type 01 in SCD type 01 V do not maintain any history. We just update the data, okay? We just update the current debt. And now this table is empty, right now. There is no data in my SCD type 01 table. Now, first of all, what I am going to do, I'm going to use Team source. I am going to use the database source, who LE DB source. Let me just drag and drop here. Okay, the project name is std. Double-click. So this is going to be the connection table. Table name will be the source table. I have to select my source table. So my source tables, source underscore imply table. So this is going to be my source table. These are the columns, Employee ID, name, State, click OK. This one is configured. Next, I'm going to copy the slowly changing dimension transformation. So this is the transformation built ten transformation in SSI. Yes. So with the help of this transformation, I am going to only implement SCD type 01, SCD type 02. Okay, double-click, click next. So this one is the connection manager, database connection manager, table or view. I have to select the table or view. For table, I'm going to select my target table, my dimension table. This table. This table, SCD type 01 table, in this table, okay? Employ id. It is going to be my business key. Ok, it is going to be my business. Click, click Next. Now, fixed attribute, changing attribute into historical attribute, let's say if I select the changing attribute. So changing attribute means Ted, I'm implementing SCD Type one, historical attribute mean stat. I'm implementing SCD type 0 to on that attribute. Employee name. Employee name will be my fixed attribute. I cannot employ name cannot be changed. Okay? Employee name cannot be changed, but employee can go from one state to another state. So state NBA, changing dimension, changing column and now state, so I have to have two changing attributes of state can be changing attribute or a changing column here, okay, d is two attributes. So the first one will be the business key, the second one will be d fixed attribute that cannot be changed. The third one will be the changed attribute. Ok, now I'm implementing D STG type 0-1 first failed up transformation if changes are detected in fifth style attribute, let say if I do some changes in employee name, so it will not work, it will give me an error. Fail the transformation. Next. Okay, next, finish. So look, the logic has been implemented automatically, slowly changing SCD Type one logic has been implemented. Now I'm going to show you my source into my dimension table, SET one table. So this is my source table, and this one is my dimension table. Right now it is empty. So when, when the first time I'm going to execute this, let's execute this. So it is going to populate the SCD type 01 table. It has been successfully executed now, all rows will be inserted. Okay? Data will be same. No, data will be same. Okay, 1001. Okay. Now what I want, I want to do some changes in my source table. So I will use this update command. I want to update my source table. And the reason I'm updating this, I'm updating the state. So from Los Angeles, George has moved to Chicago. So I'm going to change this Chicago, update. This data has been updated now execute. Look in my source. Data is Chicago in, in my, in my target or in a, C, D type CD 01 table. Statists Los Angeles. So what I want, this state should be updated in my, in my target table as well. So let me just execute this package again. Now, since I'm implementing a CD type 01, it will automatically update the target table, the state column. So now the package has been executed successfully. Now execute. Look Chicago, this has been updated, value has been updated. So this is an example of SCD type 0-1. Next I'm going to implement the SCD Type two with date in SCD Type two width, width status. So first I will create this target table. I have to create this target table first. So here I'm going to create this target table. So this is going to be my SVG type 0 to date, I am implementing this date. Table has been created. Next, Control C and paste chair. Okay, right now my target table is empty. State, start date, and end date. Now I'm going to implement the SCD type 0 to width date fields. Now. Next, have to select the table. First. I have to select the SCD type, this table, STD to date table, okay? Business key will be imply ID. So this is my dimension table. This table I am selecting this dimension table, okay? Now find next. Here. Employing name will be fixed attribute, this one will be the historical attribute, okay? This one will be D historical attribute. Let me just show you my target. Employing name will be the fixed attribute. This one will be the business key. State will be the historical attribute. D2 will be. Come in. In. Next, I will show you where it will come. Okay, historical attribute, fixed attribute imply name next, next. Okay, start time, I have to start date and end date. So in this particular, in these two fields, in the second field, I'm going to get null values. The first field, I'm going to get the current date-time. Next, next finish. I have to save this first. Look. I have to save this first. Now I'm going to execute this package. So right now, this table is empty. It is going to populate. This table, okay, now, look, all that has been inserted. End date will be null, different will be D current_date. Okay, next, what I want, I want to update my source table from Chicago. Let say I'm going to set this as Ellie. Let say George has moved to a lake. Now. Look, this one is a link, dishonest Chicago. Now execute this again. Execute this again. Now execute these two tables. Look, look, start date, end date. Look. Dist is the current value, start date, and an end date is null value. So currently, George's moving in. George's living in Los Angeles. Okay. This one was the old value. Ok, started and ended. So this is an example of SCD type 0 to width start, date and end it. Now I'm going to show you an example of SCD type 0 to width status. So since I want to maintain the status, so I will use this table. Okay, now let me just copy this. I will create a new table. Current Status table has been created. Successfully. Just copy this table. Okay, now, double-click next. This time around, my target table will be disarmed. Status table, Current Status. This one will be the business key, employee id. Click next. Okay, imply name, fixed attribute, state. This one is going to be the historical attribute. Fine. Next, next, okay, this time around, I'm going to select this value one vent current. Current. So current value we will be represented as current. Its exploration value, inspiration value can be represented as current or false. I'm going to give expired. Now you must select a value from column to indicate the, OK. I have to select this current status. Click Next, Next, finish. So this has been configured SCD type 0 to width status. Now look, I have to first execute. So when the first time this package is executed, when the first time get eyes inserted, I'm going to get the current status. All values will have the current status. Okay, now I'm going to update this Los Angeles to Chicago. Let's change it to Chicago. This has been updated. Okay, now look. Execute Luke George. This is my source table. Look current test Chicago. Again, execute the package. Stark. Now. Again, look expired and the current status is Chicago right now. Georges living a Georges living in Chicago. So this is how you implement the SCD type 0 to width status. So with the help of SCD type 0 to V maintain the historical information. We maintain the historical data. Okay. This is how you stored the historical data in your database. Thank you so much and have a great learning.
19. Import column & Export Column Transformation Example: Hello everyone. In this particular lecture, we are going to discuss the import and export column transformation in SSI. Guess. So first of all, I'm going to give you an example of import column transformation. So let me just create a new package first. So let me just select the import column transformations. I will select the Dataflow section. Click this and Toolbox. I'm going to select the import column transformation. Let me just click the Import column transformation. So with the help of import column transformation, what can we do to import column transformation reads data from files in Ed's a two columns in a data flow. So let me just give you an example here. So what files this import column transformation can read? It can read any file, it can read a text file. It can read an MP4 file, JPEG JPEG file, or a CSV file. It can read any file, okay? So these three files be read by import column transformation. It will store the data in a data flow output column of the data flow. So first of all, let me just use the read, read Flat File source. Okay, Flat File source. I have to configure this. So what I want, I want, I want to provide the names. But first of all, let me just create a text file, a file names. In this text file, I'm going to save the bottles D files. The path of this particular file is you have to click shift button, then you have to right-click copy pot based upon. So this is going to be my first employee dot CSC, fine. Then I'm going to save the disk file shift, pressed Shift, then right-click Copy S part. So this is going to be my second part. Now I am going to save my third part. So this time around I'm going to get the part of this MP4 file called ps pot. Okay? Now, so I have to save this file. So all paths have been defined in this part, in this particular file. Now, what I'm going to do, I'm going to configure this first step, knew flat file manager. Then I'm going to provide the path. I'm going to select this particular phi d SSID is. Let me just select the part here. Import files, this flat file, okay, columns, column name. In the first row. No, there is no column in my file. Preview. So if there is no column, this SSI is tuned, will automatically give this column, column 0. So these are the paths in my file. Click OK. Columns. Ok. This one will be the output column. You can name it data. Click ok, this is configured. Next I'm going to use the import column transformation. So I have to select the import column transformation here. This one. Then I have to combine this. Ok, double-click this input and output. So this one is the input. Select this input, input and output. Let me just select the input. So this one is going to be the input. And I have to define the output egg column. This is going to be the output data. Now, I have to map this input to output. In order to map the input to output, I have to provide the line age ID. In this case, line age ID is for this input column line age ID is this. I have to provide the line age idea of this. So line age ID is 24. What about this? I have to manually provide the line age ID. Now this column and this column, these two columns are mapped. Ok, so what import column will do? Import column is going to store the information in the output column. In this column. Okay, it is going to store the information. What information? This file disk data imply dot CSV data on the valley dot JPEG dis image data in this MP4 data, it is going to store here. Okay? Data type is image, okay? Click Okay. Next what I want after storing the information in d import column, I want to load the data in my, in my database. So in order to load the data, I have to use the Wally DB destination. Step. Find a table name. I have to create our table. First of all, I will create a table. In order to create our table, I will use this command. Create Table. Database says dice. I have to select the database file limb var char 500 data file is VAR binary max. So click OK. It has been created. Select star from this table. Select star from this table. Right now this table is empty. Okay, fine. Let me just configured this table name is. Data table. Okay, mappings, filename, okay, source data. And this one is d phi Limb. The second one is going to be d delta. This one is going to be the data which I am going to store here. Okay, I'm going to store it now. Awesome. So this has been configured null. I'm going to execute my package. So look, it is sub package has been successfully executed. Now execute this look. This one is t data file, which is stored in the form of var binary. This one is the exact file path and define LIM. Ok. Now I want to retrieve these three files. So what I'm going to do first of all, let me just delete DC files from my folder because I want to retrieve these three files from my database. Now in order to retrieve this data, I'm going to use t import not export column transformation. Now I'm going to use the import export column transformation. So create a package. This time around, I'm going to use the export column. So first of all, level, use the dataflow. Click this SSIs tool. I'm going to use the OLED DB source. Then I have to select a table first. This one is going to be my table file limb in data file, these two other attributes. Next time we're going to use the export column. Export column, various export column transformation. I have to select export column. Double-click, extract column, which column you want to extract I1 to extract the data file, file path column. This one defined is defined path. So this is going to be defiled path in this is going to be d delta. This one is the data and dishonest defiled part. Okay? So maximize this, allow pink force, truncate, write byte order. Okay? These two, I'm going to select these two. The reason I'm not selecting this because data is already truncated. I'm not checking this option because data is already truncated. Now, I have to save this package 08 is my, is the name of my package. Simply execute the package. You will see data in the folder. Data has been exported using the export column transformation. So from this table, basically, from this table I have exported data. Look, these three files have been exported. Let me just open this file. All ok. So this is going to be the file image JPEG file, which is a, which has been exported, and this is d MP4 file. So I hope you understood the concept of import column transformation and export column transformation in, in SSI, Yes. Thank you so much and have a great learning.
20. Copy Column,Derived Column and Character Mapping Transformation: Hello everyone. In this particular lecture, we are going to learn how to create a copy of columns. So there are different transformations that can be used in order to create a copy of columns. So first of all, what I am going to do, I'm going to read for my datasets. So this is going to be my data set. So what I will do, I will create a copy of sales, profit and product name. So now let me just close this. Now. I have to select my source. So first of all, I have to use the flat file source in order to get the data set. I have to create a new connection manager, browse CSV. I will select the source underscored product columns. Advanced preview. I have to select Product ID should be an integer data type. Let me just change the data type of this series should be a floating point. Floating point proffered should be a floating point. Okay, review. So this is going to be my data set. Columns. These are the columns id name, countries, city, sales, and profit. So what I'm going to do, I'm going to use to derive column transformation. So with the help of derived column transformation, not only I can create a copy of columns, but I can also perform some transformations. So with the help of this expression in this section, I'm going, I can perform transformation or on my dataset, on my columns. So first of all, what I am going to do, I'm going to select, add a new columns. I want to create a copy, add a new column expression. So x will create a new, it will create a new column. So I'm not, I'm not taking a copy. I'm creating a new column. In the New column, what I'm adding, I'm adding, this fails. So this is going to be my expression. So Derived column name, what will be the name will, what will be the derived column name? Copy off. Sales will be My name. Okay. The second one I'm going to create, add a new column, profit, then copy off profit. So basically what I am doing, I am creating a new column and indeed new column section I'm editing. I'm putting this profit values, profit column values. Okay? Data type is float, okay, as it is. Click OK. Now next I'm going to use t multicast transformation. Now I'm not, I'm not inserting my data anywhere, so I will use the multicasts transformation just for testing purpose. Enabled data view, double-click, Geta viewer, product ID name, countries, city sales. Let me just put sales here. Profit. Then I have to put sales. Okay, wait, sales then I have to put copy of sales profit, copy of profit, okay, now, click OK. I have to save this. Start. No, look. Sales, copy of sales, profit, copy of profit. So this is how you can create a new column. Input values, often existing column. Now double-click. So what I'm going to do next, I am going to transform this sales column. Okay, so let's shake. I want to multiply this sales value by 12. I want to get the monthly sales, expected, expected yearly sales, okay? So yearly sales and ELE profit. I will multiply this. So now I have to use the round, I want to round my, my Ds to Coulomb fields. So here I'm going to use this round expression. Okay? I have to use the round expression here. Columns Sales. I will put sales. Then I will multiply sale by 12. Then I am going to use land two decimal points. Okay, fine. Then I'm going to use the round with this round. Here, I'm going to use the prophet. Multiply by profit by 12, then land should be two decimal points. Ok, click OK. Now execute the package. Now look to 1.386. Profit desist a profit and yearly profit as this two decimal points. Ok, maximum is two decimal points. So this is how you can create a copy of you can, you can create a new column and you can perform, this is how you can perform transformation on a, on a new column. Now, Not what I can do next. Instead of derived column transformation, I will use the copy column transformation. So copy column transformation is going to create a copy of columns. Ok, so what does this doing? This is basically creating a new column. This is creating a new column, which is this one in. I'm adding values, often existing column, then I'm multiplying values by 12. So with the help of copy column transformation, I can only create a copy of columns. I can create a copy of single column or I can create a copy of multiple columns. Now instead of derived column transformation, I will use the copy column. No, delete this transformation. Nowhere is copy column here. I'm going to use the copy column transformation. No. Double-click. Now I'm going to use the copy column, let say product name. I want to use this product name. I want to create a copy of this product name. Okay, also sales and profit DC-3, copy of then multicast Data Viewer, ok. Enabled her to view product name, product name, copy of sales, copy of profit. Okay, save this and start. So in copy column transformation, I cannot do a transformation. I can just create a copy look copy of product name, copy of sales, copy of profit. Okay, now next what I am going to do, I'm going to use the character map transformation. Let me just use the character, my approximation character map transformation. So let me just remove it, double-click. So character map transformation, corrected map transformation is used on string data type. And let me just close this and I have to search here character map, transformation, look character map, apply string functions on character data. For example, convert from data that is in uppercase or lowercase. So if I have to convert my string data to upper or lowercase, i can use the character map transformation. For example, product name. So in put columnist product name destinations, I'm creating a new columns. If I select this, this is going to update the existing column, but I want to create a new column operation. What operation I want to perform? I can, I can convert this product name to uppercase, lowercase. These are different operations. Ok? Let's say I want to convert this to uppercase Click OK. And the alias copy of product name. Okay? So this is how you use the character map transformation with the help of character my approximation, I can only perform, I can only perform transformations on a string data type. What transformations? These are the list of transformations I can, I can perform on String data type. So I can create a copy, but at the same time I can update the existing column, okay, if I select this, so this is going to update the existing column. This will not create a copy, so I wanted to create a copy, so I will select New. Click OK. Now select this Double-click. Edit, Data View, product name, country, city, sales, profit, and copy off product name. Click OK. I have to save this. Now, execute the package. Now you will see look copy of product name. So no, what I am going to do next, I am going to use this in place change. So this is going to update the existing column. Now execute this, we'll convert the existing column values to uppercase look Product Name, computer, mobile phone. So if you have to specifically apply, if you have to perform the lowercase or uppercase transformations and you have to create a copy of columns, then you can use the character map transformation, okay? Copy column transformation is just going to create a copy. Whereas the derived column transformation with LPA, Derived column transformation, I can do multiple transformations. I can do, I can use multiple functions. I can, I can use math functions, I can use multiple different functions, string functions, datetime functions, or this transformation gives me multiple options, okay? Operators type grasp functions, malfunctions. Ok, here I can perform the transformation on my columns. So hope you understood the concept. Thank you so much and have a great learning.
21. Percentage Sampling & Row Sampling Tranformation: Hello everyone. In this particular lecture, we are going to learn how to take sample data from, from the original dataset. Let me first show you my original data set. So this is my original data set, sales records dot csv. So in this particular file, I have total. How many records? Total? I have a 100 thousand records. Okay, so what I want, I want to take, I want to take few sample data from this original data set. A few simple rules. Okay, so what I can do in SSI is I can either use the rules sampling transformation, this one grew sampling confirmation, or I can use the percentage sampling transformation. Ok, so in this way I can get D sample datasets from the original data set. Okay, now, in order to first of all, I'm going to read this simple Records dot csv file. So this file is located in hair in my, in this directory. Okay, now, first of all, I'm going to use D flat file source. Okay, where is Flat File source? I have to use this first, okay, Flat File source. Ok, double-click configured this step, new, browse, and select the CSV files. So this one is my file. Okay, columns, column names in the first data row, fine. These are the columns of my files. This one is my original data region, country item type, order, order date. I have to change the data type to date. Let me just change the data type. This is going to be date, order ID, ship date. This is going to be due date. It is date. Okay, each unit sold, this is going to be the integer value for various unit price. This is going to be floating-point unit cost. Okay? Unit cost, float, total revenue is in floating point. So that is the reason why I'm changing the data type. In floating point. Total profit is also in floating point or kept preview the datasets and this is the dataset looks, these values are in floating point. That is the reason why I have change the data type k columns. Okay, all mapped configured, OK. Click OK. The first step is configured. Next, I'm going to use t rho sampling transformation in order to get dissemble rules. Okay? Ok, double-click number of rules of how many rules you want to get from source. Ten rows, let say if I specify 200, so I'm going to get 200 random rules, okay? Lets us specify ten. So from this dataset, I want, I want only ten rules out of, out of a 100 thousand, I'm going to, I'm going to get only ten random Ruth. So sample, this one will be the sample rows. This one will be the other output. Ok, columns. So these are the columns. Fine. Next I'm going to use the multicast step. This step I'm going to use to get D data to view the data. Okay, here I'm going to provide this sample rho. This is going to be output, okay? This output will come in D multicast. So I also want to preview my datasets. I will enable the data viewer option. Okay, so this is going to display, this is going to give me ten random rows. So right now this option is unchecked. So now every time I execute this package, I'm going to get different data set. So first time let say if I execute the package, so I'm going to get the ten rows, copy data, let me just paste my data here. So I'm going to get the ten random row, okay? Total revenue, total cost. Now, when the next time I'm going to execute this packet, I will get different rules, okay? So I will get different dataset. Okay? This is the random behavior, so I am going to get data randomly. This reliable, get another ten rules. This time around. The data set will be different. Okay? Asia dataset is different. Now, what if let me just enable this option? So what if I enabled this random seed options? So if I give value one, so this random seed value is defined as one. So this value will be used in a, in an algorithm. So now when I'm going to, let's say ten rows, let me just click. Okay. So what is the purpose of this? Using the same random seed on the same input, always generate the same sample. I'm going to explain the sentence. Specifying random seed is recommended only during the development and the testing of package. Okay? So if you are performing, if you're, if you're doing some testing, you can use this option. Now, since I've defined the value one, let's execute the package and see the result. Okay, I'm going to get this result. Let me just copy the result here. So when the next time I'm going to execute the same package, I will get the same result. The reason I'm going to get the same result because the seed value is defined as one, I will get the same result. So this is the meaning of this sentence. Okay, now, save this package. I will get the same results. Can look, copy data. Look, I will get the same result. Okay, look. Now, next what I am going to do, I'm going to, I'm going to change the seed value. So let's change it to two. Now I am going to get a different result, okay? So I will get ten random Ruth, but it will be different from the previous, previous rows. Look at this data set is different. Copy. Let me just paste here. Okay, this data set is totally different. Now, next what I am going to do, I'm going to instead of multicast, I'm going to use the hearing, I'm going to use the aggregate function. Let me just remove this here. I'm going to use the aggregate function. Aggregate simple rules, okay? Next I'm going to use the multicast, okay? So I'm going to aggregate on the basis of country. Okay? I want a unit cost, total revenue, total cost, total profit, but I want to group by on the basis of country. I want to perform the sum operation. Okay, look fine. Sample rows, I will increase the number of fruits, let say 500 rows. I want 500 sample rows. I want to perform aggregation on this 500 sample rules. So if I'm going to perform the aggregation on a 100 thousand rows, it is going to take so much time. So I'm going to perform the test operation on 500 rows, execute the package. So the package has been executed, but I have to configure the Data Viewer option Enable. Here I'm going to select, okay, fine. Here I'm going to select, okay, let say one. Fine. Now execute the package. Look. Country. Total cost, total revenue, unit costs, total revenue, total cost, and total profit. Next time we're going to use the percentage sampling. Another transformation to get the sample dataset. This transformation is known as percentage sampling. Let me just create a new package here. So I'm going to use, I will use the same dataset. And this time around I'm going to use deep percentage sampling transformation. Flat file falls very scared for the source. Okay? Now next, I'm going to use the sampling. Okay. Let me just let me just close this. Yes, I have to close this now I'm going to use the percentage sampling. Okay, percentage of ruth, 1%. So let me just open my dataset. So in this data set I have, I have a 100 thousand rows. So I want to get 1% of my dataset. 1% becomes 1000 rules. So I'm going to get around 1000 rules. So I'm not going to select this option, okay, next time we're going to use the multicast step sample. Okay? So unable the data we're option. So I will get approximately 1000 rules. It could be more than 1000 or it could be less than 1000. Okay? Okay, how many rows have got? Have got, it is still executing. It is, it is 970 rows. I've got 970 rows, okay? When the next time I'm going to execute this package, it is going to give me different rows. Okay, let's look at the row count here. 991 this time around, it has given me 991 rules. Ok. Let's change it to 2, 2% percent. It's around 2 thousand. So click OK. So this is going to give me around, this is not going to give me exact 1002 thousand rows. Okay. This time around it has given me how many rows? 2033 rules. It could be more than 2 thousand or it could be less than 2 thousand. Okay? So this is percentage sampling, 2% of total data set, which is, which is around 2 thousand, okay? Now this one is my second input which haven't configured. So if I select this, so use the random seed to it is going to use the seed value to generate the random rows. So right now the value is set as one. Click OK. Okay, now this is going to give me the same result each time the package is executed, this is going to give me the same result. Look to 0-6 three, again, I'm going to execute the package. This is going to give me the same result. If I want a different result, I have to change the seed value. Look 2063, I need to change the seed value in order to get a different result. Ok, now too, so this is going to give me now, this is going to give me the different result because the seed value is changed. Look 19, 5-4, okay, it could be more than 2 thousand, R0 could be less than 2 thousand. This is percentage sampling. Basically we need sample data set to perform testing. So click OK. So this was the concept of percentage sampling and the row sampling. In row sampling, we provide the exact row value, okay, in percentage sampling we get the percentage. For example, if it is 2%. So I'm going to get to 2% of total dataset not exert. It could be more than 2%, slightly more than, or it could be less than. Okay, I hope you understood the concept. Thank you so much and have a great learning.
22. BulkInsertFinal: Hello everyone. In this particular lecture, we are going to discuss the bulk insert task in SSIS. So when you are going to create a package, you have to click the Control Flow section. You don't have to click the Data Flow section. Okay? After this, you're going to go in D SSIS toolbox, then you have to click the bulk insert task. So using the bulk insert task, we are going to load data into a SQL Server, okay? So bulk insert task loads data from a file directly into a SQL Server. So which file we are going to load? I'm going to load this employee dot csv file. So we generally use bulk insert task when, when we have to load huge volume of data, okay? So benefit of using this task is TAT. We don't perform any transformation. So when we have to load data without performing any transformation, we use this bulk insert task. Also, this bulk insert task is very fast, okay? Now, we can also insert, we can also load data using the data flow task. But with data flow task, there is an option of performing conformation. So then we can also perform transformation in a Dataflow tasks, but with bulk insert task, we cannot perform. Okay, now what we are going to do, we are going to create a table first. So for this, I'm going to use this script. If the table doesn't exist, it is going to create a table first. Okay? If table exist, then it will not create a table. It will then insert data in a table. So first of all, before using the bulk insert task, first, I'm going to use D Execute SQL task, okay? Double-click this. So you have to configure this. Execute SQL task, okay, name Execute SQL. Okay? Now connection type O, L E D B connection. You have to select a connection. So this is the name of my database connection. Databases, dice. Okay, then I have to, I have to provide the SQL statement here. Here I'm going to provide this SQL statement. This is the SQL statement. Copy this and paste chair. Then you have to click OK, this is done and dusted. Click Okay, this task has configured. Let me show you my schools, my script. Look, this is the script select star from this data schema name and this is t table schema. Let me just keep this. Execute this. If, if this table exist, if this table exists, name of table is, let me just provide the name of table. Let me just open the execute SQL and let me just change the table limb table name. Let me change it to let me just change it to okay, bike, EMP, underscore, table Control C and table name Is this. Okay? So this is going to be the name of my table. Okay, fine Click, Okay. Fine Click. Okay. Next I'm going to use the bulk insert task. If this table exists, I have to use this table. Find. Execute. If this table exists, then it will not create, it will not execute this statement, okay? If this table doesn't exist, then it will execute this create table statement. Now then I'm going to combine the first task. For the second task, I will configure the bulk insert task here named disconnection. Now I have to provide the connection here. This one, I'm going to have already created disconnection, no need to create another connection. Okay, destination table. So this is the table where I'm going to load my data. So these are tables, these are the tables that are already present here in my dice database. I cannot write anything here, okay? So I will not write anything. Format is specify OK, row delimited, OK. control, LF. Column delimited is semi-colon comma comma separated file. This is the comma separated file which I am reading. Let me just edit this file. Okay, this is the file which I'm going to load. Fine. File, which file? And I want to load new, select a file which I want to load in my database table. Okay, fine. Options. So my first option is ROC, code page rot. So since we are, we are reading a raw file, employee dot CSV. This is a comma delimited file. So I'm going to have this option data file type for TXT, for text files, we are going to have this option char, okay? So if we are, if we are importing data from a database, from any database, these are different options that we can use. So we can select the native option or we can select the other options, okay, batch size 0. So what is the meaning of this batch size? Batch size means that let's suppose if you have a 100 records which you want to load. So if you're going to give value 0, it means that after loading a 100 records, it will perform the commit operation. Let's say. If you are going to give value 10, so it means debt. After, after loading ten records, it is going to perform the, IT is going to perform the commit operation. Okay, So this is the meaning of batch size, since I want to perform the commit operation once all the data is loaded in my database, I'm going to give options 0, last row in the first row. First row is going to be the header row. So I'm going to start the loading from the second row. So I will give value second, last row. So if you want to load all the data, you are going to give value 0. Let say if you want to load ten rows, so I'm going to give value 10. So in this case, I'm going to give value 0. So these are different options. Here we have options like check constraints. These are different options that you can give. Keep nulls as it is, ok. Now if you want to sort your data, but this is an option here, you're going to specify the column value. Okay? These are different option extra expressions. Now click Okay. So there is an error. You must specify a table names. So in order to specify a table name, you have to right-click. Then you have to select the properties. So from here you're going to select this option destination table name. So here I'm going to provide this table limb. This is the name of table where I want to load my data. So I'm going to provide this table. Now. You have to save this package. You have to double-click this package connection. Look, you can see the destination table, okay? Now click okay, save this and execute the package. It seems there is an error. Okay? Let me check what is the error parameter not okay. So the error is hearing this query, okay, I need to again verify what is the problem here. Table name, okay? I have to specify this table name here. Execute, select star from this table. So data has been inserted in my table destination tables a difference how you insert data in a table from, from a text file. So you can, you will use the bulk insert task when you have two. Then you have to load huge volume of data and you want speed, and you want to load in minimum time. So this is the concept. Hope you understood the concept. Thank you so much and have a great learning.
23. ForLoop Container: Hello everyone. In this particular lecture, we are going to discuss t For Loop Container in SSIS. So first of all, I'm going to select the control flow section. Then I will select the SSIS toolbox. In the SSIS toolbox, I will go into Containers section. I will select the for-loop container from here. Just drag and drop this for loop container. So for Loop Container is just like FOR loop in programming language. It is going to behave the same way as default loop in programming language. So in this particular lecture, I'm going to give you an example of for loop container. So this is my employee table. And let me just query this table. So what I'm going to do with follow-up container is I'm going to extract the event IDs. So time t is the event ID 16 is 128 even id. So I will only extract the even ID, IDs of this employee table. Then I'm going to insert data in my DRG underscore, employee underscore, underscore final table. Let me just select this. Okay? So I will use the for loop container here, okay, inside the for loop container, I will use the data flow task. Select this. Then I'm going to use the data flow task inside the control flow for loop container. So you just at the data flow task care, let me just configure this for loop, for loop container. To configure the loop container, I'm going to define a variable here. So variable name will be, let's say counter. So counter is going to be the name of variable. Scope is this four variable is only limited to this container. Data type is integer, default value is 0. So double-click this for loop container. Now the time to initialize this counter variable, initial expression. I will write like this at, so this is how you define, this is how you write a variable inside default loop editor, counter. So this is going to be the name of a variable. This is a variable equals to 0. Evaluate expression. This is going to be the condition. So I want this for loop to be executed 50 times at counter less than equal to 50. Then assign expression at Counter equals counter plus 2. So since I want only the event rules, even IDs, so I'm going to use this counter value, counter plus 2. So this is the variable. Here. I have to write like this ad, okay, at counter plus 2. Fine. Click. Okay. This is configured. Now next I'm going to configure this data flow task. So inside the data flow task, I will use the only DB source discipline or LED be source. Afterwards I'm going to select the destination. This one is source here I'm going to select this test generation. Next, I will configure this. It is already configured. Table or view or Q weight. This one is the source editor. So what I'm going to do, I'm going to select the SQL command. Instead of using the table or view, I will use the SQL command here. What I'm going to do, I'm going to use the parameter value here. So select star from this table. Their ID equals 2 question mark. So disco stem Mach is the variable value. So here I'm going to paste this discourse. Denmark is going to be the variable value. I will use the parameter value here, which will be the counter. So let me just use the counter. Variable. Discounter. Click Okay. Columns, these are the columns. Next I'm going to configure the destination part. Let me just configured this table on view here I'm going to select the DRG underscore finally table. Now, let me just select UTR G underscore final table, employer underscore final table is this one. Okay? Now mappings, find these are the mappings. Click Okay, save this control flow. You have to save this. Now. Right now, this table is empty. There is no data inside this table. So what it will do, it will only extract, this part is going to extract the even rows. And this OLED be destination is going to insert their time the employee underscore finally table. Okay, now next I'm going to execute this mapping, this okay, package. Execute this package. The package has been successfully executed. It is executing. Ok, now just successfully executed. Now I need to check my final target table. Data is inserted. Look, all roads are even. All ideas are even. Two. Okay, 48, 50. Now let me just truncate this table. Fine. Now what I'm going to do, click here. Now I'm going to change the values here. Counter equals to 0, counter less than equals to 10. Now it will only insert rows from 0 to 10, only even rose from 0 to 10. It is executing. Now check select star from TIG underscore employee final. Look, only five rows, 246810. Okay, so this is how you use default look container. So thank you so much and have a great landing.
24. Foreach Loop Container: Hello everyone. In this particular lecture, we are going to discuss the for each loop container in SSIS. So when we have to process multiple files, we use default each loop container. Let me show you the files which I'm going to process. So I will go in my directory here, I will search my files. Look, my files are present here inside this directory, these two files, so I'm going to process these two files. So I will use the for each loop container. First of all, I will go here. Here, SSIS toolbox. I'm going to use the for each loop container, just drag and drop this for each loop container. Now I'm going to define a variable here. From here I'm going to define a variable. Variable name will be, let say path will be the variable limb. Then the data type will be string. Value of this variable will be the path. Just select any file, shift, press Shift, right-click. Then you have to click Copy as part. After that you have to paste the path here. Okay? So this is going to be D variable value. Okay, fine. Next, what you are going to do, you're going to double-click this for each loop container. Then you have to select collection expression. You have to know, you don't have to select anything. Just select this. You have to select for each file enumerator. There are multiple enumerators. So since we are reading multiple files and we want to process multiple files, so I'm going to select this for each file and numerator. So now in numerator configuration here I'm going to provide the path. So the path is this one. This is going to be the part. Just browse it. And d directory. Okay, let me just paste the path here. After debt, you have to specify which file you want to process. So this is going to process all files which are present care in my, in my directory. So if I'm going to specify any specific file, like I can do like this CFE, it is going to specify it is going to process only CSV files. I want to process all the files which are present in my directory right now I have two files. This option, when I'm going to check this option, what they're pretty do a quick search the folders, okay, if we have a folder here, let say if I have multiple folders. So we have multiple files inside the folders. Then we have to check this option right now, I don't have any folder, any directory inside this directory, any folder inside this directory. So I'm not going to check this option, name and extension fully qualified name only am going to specify this fully qualified variable mapping. I have already created a variable. A variable is this, it is going to map with zinc 0 index. So first one is going to be D. Part is going to be the value of a variable. Okay, now, click Okay, it is configured. Next, what I'm going to do, I'm going to use the data flow task. So what I'm doing, I'm going to process these two files and I'm going to load this. I'm going to load the data of these two files in my table, in my SQL Server table. So next, I will use the data flow task inside this for each loop container. Next I will use the flat file source. Where is flat file source? I have to select the flat file source after death, either choose to. I can also choose the flat file destination. If I want to load data in a CSV file, can use some other destinations. Excel file destination. There are some other destinations there as well. Raphael destination. So since I want to load data in my database tables, so I will select, I will choose this option, oily OLED be destination. Configured, this double-click this option, new flat file manager, I have to create a new thread file manager. Browse, okay, SS, various folded, okay, document files, input CSV, just select any one file. Columns preview, okay, fine. This is the data columns. Fine. Click. Okay, next I will configure this OLED be destination. Right now, I don't have any configuration manager. Create a new configuration Manager click Okay, it is configured. So if you want to create a table from scratch, you have to click new. So since I already have my table in my database, I'm going to use debt table. So my table is I need to check if the table is there or not. I think I need to create a new table. So from here I'm going to create a new table. In the table name will be let say Demo. Demo is going to be the name of table. I'm creating a dummy table. Mappings, fine. It is configured, done and dusted. Save this. So next. What will happen? I have to execute this, this for each loop container. So in order to execute this, you have to click this start. So the process is executed. I need to refresh this. Refresh. It varies the refresh. So now table, table name is demo, OLED be demoed this table. So right now, how many files? There are a 100 rules in my in my table. So let me just check employee 001. How many rows of this table of this file? Around 50 files. So what is happening right now is that let me check the employee 0, 2. So what is happening right now? It should have loaded 63 roles in, in my target table. What is happening right now is tack. This package is loading this phi 2 times. Okay? So what I want, I want to load the debt of these two files. So it is repeatedly. It is loading the data of only this file, and it is doing it two times. So I'm going to truncate this cable. So why it is happening? It is happening because truncate this table. Let me just, okay, fine. It is happening because I haven't configured the flat file connection manager step. I need to go to the properties. And from here, I need to configure this expression section. So what is happening right now? First time it is using this file. Then the second time it is using this, it is taking, it is executing this package and it is loading the same file. So what I want, I want this to load the second file, not the first file. Okay? Now for debt, I need to go here in the expression section properties. I will select the connection string. From here I will select the connection string, various connection string. Know I'm unable to find the connection string. So what I'm going to do with file connection manager properties expression, a connection string expression. Now this time I'm going to specify the path evaluate expression. Now, this time around, it is going to load only. It is going to load the data of these two files for total how many rows? 50 rows in my first CSV file. What about the second row? Look, total 13 rows, it should load 63, 63 rows. Now execute this. Let me just check. Total 63 rows have been loaded. So this is how you use default. Each loop container width. For each loop container, you do not specify the initial condition value, initial and the endpoint value. So here what you're going to do, you're going to instead you're going to specify the, this steric, fine static. So this means that it is going to load all the files which are present aid in the directory. So right now let's say if you want, instead of the database table, you want to load the data in, in a flat file, in CSV file, okay? So I'm going to use the flat file destination. There is flat file destination, this one. So this is going to, let me just delete this. This is going to load my data in my CSV. So update this input. No, I want to create a new file connection manager. Click Okay, s3, select this file. Now, I'm going to change the output. So I'm going to change the name of this file. So this is going to be output dot CSV file. No, I don't want to overwrite the data. Click Okay. Save this and execute the mapping. Execute the package.