Beginners Data Analysis Bootcamp with SQL | Adnan Shaikh | Skillshare

Beginners Data Analysis Bootcamp with SQL

Adnan Shaikh, Technologist

Play Speed
  • 0.5x
  • 1x (Normal)
  • 1.25x
  • 1.5x
  • 2x
70 Lessons (3h 39m)
    • 1. Course Introduction

      2:41
    • 2. Instructor Bio

      0:29
    • 3. Learning Road Map

      1:10
    • 4. Why Take This Course

      0:43
    • 5. What are Databases

      4:07
    • 6. WHAT IS DBMS

      4:26
    • 7. MYSQL Database

      1:15
    • 8. MYSQL Windows Install

      3:59
    • 9. MYSQL Macos Install

      3:54
    • 10. MYSQL Cloud Install

      5:58
    • 11. MYSQL GUI INTRO

      3:19
    • 12. Create Database and Load Data

      4:21
    • 13. Data Modeling Introduction

      2:22
    • 14. Data Modeling Building Blocks

      2:37
    • 15. Data Modeling Tables

      2:22
    • 16. Data Modeling DataTypes

      2:24
    • 17. Data Modeling NOTATIONS

      1:29
    • 18. Data Modeling One to One

      3:13
    • 19. Data Modeling One to Many

      2:08
    • 20. Data Modeling Many to Many

      1:30
    • 21. Data Modeling Recursive

      1:35
    • 22. Data Modeling Normalization

      1:25
    • 23. Data Modeling 1NF

      2:11
    • 24. Data Modeling 2NF

      2:39
    • 25. Data Modeling 3NF

      2:00
    • 26. SQL Intro

      1:08
    • 27. SQL Categories

      1:13
    • 28. DDL

      5:05
    • 29. DML

      4:23
    • 30. DQL1

      4:57
    • 31. DQL2

      5:12
    • 32. Inner Join

      7:36
    • 33. Left Join

      4:03
    • 34. Right Join

      1:00
    • 35. Cross Join

      2:27
    • 36. Self Join

      3:17
    • 37. Intersect

      3:04
    • 38. Minus

      0:56
    • 39. Union

      3:24
    • 40. Aggregate Intro

      1:00
    • 41. Aggregate Functions

      2:01
    • 42. Groupby Part1

      3:20
    • 43. Groupby Part2

      1:37
    • 44. Subquery Intro

      1:11
    • 45. Correlated SubQuery

      2:22
    • 46. Non Correlated SubQuery

      2:09
    • 47. Converting String Functions

      3:30
    • 48. Date Functions Introduction

      2:15
    • 49. Date Functions Part1

      6:42
    • 50. Date Functions Part2

      5:54
    • 51. Exp Ext Man String Functions

      4:52
    • 52. Formating String Functions

      3:14
    • 53. Numeric Functions

      3:02
    • 54. String Functions Introduction

      0:40
    • 55. Conditional Statements

      3:16
    • 56. Window Aggregate Function Part1

      5:57
    • 57. Window Aggregate Function Part2

      7:41
    • 58. Window Function Components

      2:15
    • 59. Window Function Introduction

      1:36
    • 60. Window Ranking Function

      6:15
    • 61. Window Positional Function

      2:26
    • 62. Data Visualization Introduction

      1:47
    • 63. Data Visualization Data Types

      2:17
    • 64. Data Visualization Categories

      1:14
    • 65. Tableau Software and Datasets

      4:02
    • 66. BAR Chart Visualization

      7:47
    • 67. Line Charts

      3:20
    • 68. Histogram

      5:58
    • 69. Scatter Plot

      4:53
    • 70. PIE Chart

      2:32
14 students are watching this class

About This Class

Learn the four key pillars of Data Analysis:  Databases , Data Modeling , SQL and Data Visualization.

This course will teach you the four building blocks of data analysis in a very structured way.

* We will first learn about databases and database management systems with the focus of data accessibility. Our preferred DBMS will be MYSQL.

* We will then learn the foundation of Data modeling for relational DBMS and gain knowledge about how data is related and how the design of data models help in capturing business use cases.

* We will take a deep dive in learning SQL; from writing simple scripts to advance analytical functions to gain insight.

* The last section will focus on data visualization and we will use Tableau software as our data visualization tool with many examples to showcase our story telling skills.

By taking this course you will gain experience in designing data solutions which will make you more marketable and competitive in the job market.

I will try my best to keep this course up to date by adding more relevant lectures and lab exercises.

Transcripts

1. Course Introduction: Hello and welcome to the beginner's data analysis Boot camp with sequel course. I'm a non shake. I have been consulting for the Fortune 500 companies for the past 18 years and have successfully implemented multiple data projects. I was fortunate enough to go to one of the best schools in my field. I have taught multiple I T courses to professionals. I'm excited to launch this new comprehensive course on data analysis. You don't need to have a programming experience to enroll in this class. You can be from any walks off life and will highly benefit from this course. I will explain each topic in detail in my BDO lectures and have extensive live exercises with multiple quizzes to make sure you get a complete learning experience. If you're not satisfied with this class, you have a 30 day money back guarantee. We will learn the four key pillars of data analysis that will become your foundation in understanding how to work with data effectively. Having a basic understanding of how databases work is very important. We will be using my sequel as a preferred database system for this class. My sequel is one of the leading databases for the Fortune 500 companies. Data modelling how data is related to each other and understanding the different types of relationships between data becomes critical. As we dig deeper in driving value for our business data Insight, we will learn sequel language in detail way. Have the lab exercises to access the data in our database way will start from the basic operations and then get to the Advance analytics to address multiple use cases. Data visualization. We will learn the skills to tell a compelling story off our data by creating multiple visualizations using tableau, which will be our preferred data visualization tool. My goal is to make sure you learn these data skills in a way that can help you apply them in your professional life. Get the salary raise that you were looking for. Make your resume stand out and if need be, make a shift to the deep analysis field. I'm really excited about teaching this course to you, and I hope you join me in this learning journey 2. Instructor Bio: Hi. I wanted to share my background in little detail. So you have a better understanding of who your instructor is. Hopefully, this late will give you a quick snapshot off that I have been in the technology space were quite some time and held various roles that led to designing critical data applications. Hopefully what I've learned over these years working closely What data will help you immensely in your data analysis journey? 3. Learning Road Map: Hi. I have put together this road map that will help you understand the blueprint off our learning path. First, we will start to learn about databases to gain basic understanding how to use them in our class. We're gonna be using my sequel as the preferred database tool. You will then start exploring the concept off data modelling, which will help you understand the different relationships between our data and their structures. Then we will introduce SQL structured query language. We will divide the sequel capabilities into different categories when it comes time to defining data, retrieving data and manipulating it and then joining merging and summary ization off it. To further enhance our sequel skills, we will learn the core data functions based on data types, string numbers and dates. Then we will look at some advanced data functions that secret provides. Last but not least, we will be looking at data visualization for our course. We're gonna be using tap low as the preferred data visualization toe. Hopefully, this gives you an entering picture of how the courses curriculum is structured 4. Why Take This Course: hi. Nearly all the job pulling websites. I have seen the old predict that having data skills will be a must in the coming years. We already finding a shortage of resource is with these data skills. Now I keep seeing sequel skills that is required for all the data jobs we have fits an analyst engineer development sector, and having it on your resume will make you stand out. Telling a story through data visualization has become an important part for all data jobs. In summary. What I will say is that anyone taking this course will benefit a lot in making their data scale stand out and be more competitive in the job market. 5. What are Databases: what our data basis. So rather than getting into the technical details, let's let's try to understand how we come across databases in our day to day life. So if you're searching the Web, anything you write if you're searching for a particular keyword, any results said that you get back. That results said, is coming back from a database if you go to an A T M machine and key in your pass code. When you swipe your A T M debit card, the information gets checked against the database. Any time you log in to a webpage with your user name password, that user name password gets checked against a database any time you pull up a bank statement, that bank statement is created from the data that is stored in the database, so that's kind of the way we interact with data basis. So what you see on the slide areas that I have an Excel worksheet, So a lot of folks are still saving data in Excel worksheets. You keep adding, you know, to your data set, you keep adding rose just like a ruin columns, and that's great. And for a lot of smaller companies, maybe Excel might be a solution. And I see people just saving data in text files and companies are producing data 24 7 And with the age of Internet, you know, data has grown astronomically. So what is the problem that we see here like what happens in your excel when the data grows from 100 lines 22 million records? How would excel behave what it crash? Or do you have to split the files and how do you link the files back together? So definitely there's gonna be a problem there. Ensuring the security and reliability of your data in this set up will also become a problem. Database software systems were designed to solve a problem, and I've just listed few off the key issues that the database systems solved. Let's go for the 1st 1 volume. I mean, data is increasing its compounding astronomically increasing and terabytes petabytes, terabytes. And we need sophisticated Softwares to manage that much volume of data securities. Another key issue that companies are facing when their data gets hacked. So we need database software systems to have those security encryption layer added, so we can protect our data. We have Web applications that have millions of customers accessing their data. So we need sophisticated database systems. I can handle these transactions simultaneously. Reliability becomes so important for companies where data is their business. I mean, for of stock exchange auto flight scheduling system, we cannot have data that is not correct. Data has two beautiful data. Is increasing securities needed millions of transactions happening on the same time? Data has to be reliable, so database software's or database management systems are needed to solve these problems. We just need to understand that and get that in your head before we get into kind of looking at the features off the database management soft fears. 6. WHAT IS DBMS: in the previous lecture, we looked at databases what they are and also go through some off the rationales as to why database management systems were created. So in this slide, I wanted to kind of give you architecture. View off how database management systems are used and kind of ah, different layers or components as to how database management systems can work. This diagram is going to help you understand that different components that are used when you access data to a database management system the user application layer. Think of this as where you have a client installed on your laptop. In our case, we installed my sequel Workbench. There are other client tools that can be used to access the database management system as well. So and I have shown toad as one of those Softwares and other reporting software's can be used as well. Your client application needs to have an or DBC connection initiated. 40 B C stands for open data base connectivity, so this driver needs to be in place for your client application to connect to the database management system or your database server. Once that connection is established, you can easily write a sequel script and access your data. I listed view of the database management systems, IBM, DB two Microsoft Sequel Server. My sequel in Oracle Database systems. And there are a lot of other vendors providing this software as well. They're different functionalities that are embedded in the management system. I mean, we have functionality that handles transaction management. We have areas for handling security data manipulation, defining data loading data. And the database server also has an engine that can SQL language to be processed. And when we writer sequel script as simple a select star from a table, that sequel statement gets processed. We other d bms the D. B M s needs to interact with the operating system to get to the data storage area or the data stories layer. The last layer is the data storage area. Your physical data files would reside on the desk. I'm showing customer order and product data is stored on the physical disk. But you would need a database management system to access this data, and you would need a client like my secret work match to be used via a sequel interface. To get that data and be presented to you. These individual components will help you understand how the database architectures set up . And it's very important to understand what the user application does and what the D. B. M s does and where the physical data files are stored. And I think once you can make a distinction, it will really help you understand the database management system. 7. MYSQL Database: Hi, everyone. One of the key points I want you to take from this slide is that my sequel is being used by a lot of industry leaders. If you look at the top left of the slide, these are just some of the companies that are using my sequel in productions of From Facebook to Walmart to uber I'm These are all the companies. Either they're using my sequel for their bare house or their Web applications. And if you look at the bottom left of the slide, I've given, like, timeline kind of the inception off the software, and we was built in 1995 by a Swedish company, and over the course of years it has been acquired by different companies and an oracle, uh, acquired it be a son. So it has hundreds and millions off installation throughout the world, and it's built on some of the key features for a database which are performance reliability east of use and cost. And if you look at the bar chart I've included, I mean it tells you compared to some of the other database solutions, it is so cost effective. So the total cost of ownership TCO is really, really less compared to other software 8. MYSQL Windows Install: Hi there. Great. So let's kick start this installation. So we're gonna be installing my sequel for Windows. It's a very easy install. Just follow the U. R L that I've given are great. So we're on my secret war page and follow the Ural that I've given. We will be downloading the mice equal installer 8.0 point 12. And the first link here, the one that I'm highlighting co operating system is going to be Microsoft Windows for the S. And then click down little do log in or sign up Just saying no, thanks. And move on and you're gonna get, ah, save as, um options and just save as Creative folder and save the file. The installer is going to start, which is 1.4 version. He's gonna ask you to install an upgrade, so just say yes. We will just get the new version as well. All right, so the first thing you're gonna get is the license agreement you're gonna accept, read through and accept that, and they're different options to install a set up type. So we're gonna go to the custom. We just need a server and workbench. So we're going to select the server option moved to the right as a selection when they were going to go to the work bench. Um, get that selected as well. Also, we're going to do the or DBC connection in a documentation. So we're just doing limited software install rather than everything. I've fast forwarded the video, so it should take you no more than 10 to 15 minutes. It next. What? We're going to keep that replication to default. This do next networking type. Just keep it default next authentication method as well, just to a default next. And then just put up a password. Please make sure you remember this password because very important, I just put it in a piece of paper or note. Pat somewhere, it's just gonna be the this service name. If you want to manually start the service this to execute, it's gonna, uh, run a bunch of configuration files again. I have fast forwarded the video just to finish on launching my secret workbench. It gives your default connection, but we're going to create our own. My secret connection just hit the plus sign and we'll give it a name. We'll just say sequel Underscore Class. Everything is already filled up just to Ah, OK, double click on sequel Underscore Class. We're gonna be launching workbench now. So this is the graphical user interface for my equal, and we will, right. All our scripts there will create tables, load data and run different quarries using this GUI interface. 9. MYSQL Macos Install: Hello, students. So we gonna install my sequel on Mac Os. So I've given the euro. Um, you can use to download and go to the my secret website. We gonna be downloading my sequel Community server and also my sequel. Work Bends. So we need these to sell fares for the installation. All right, so we're gonna first install the my sequel server. Just follow the link. I'm click on the community server. We're gonna be downloading the DMG file for the mice Equal Community server 8.0 point 12. So these are different types or files. We'll just do the DMG again. Just click download, ignore the log in and sign up at this time, So just say no, thanks. And click on the link. I fast forwarded the video so you have that amount of metal disc image on your desktops, and now we're gonna do the my sequel workbench Install, scroll down and you'll find one file, which is also the DMG file or workbench. Eight points or 80.12 and click the download. All right, so we have both of files on our desktop, so we'll start with the server file. Click on that so it's gonna start. Ah, the install. Or you would need a password for your database. So please make sure you remember the password right on a piece of paper or note pad or save it toe. It'll need this every time we connected the database. So just please, please, please remember now we should see the installation status as being successful. You know, we have actually installed server. We can go to Preferences and just see my sequel. Um, in the list off software is being available, so we're good to go there. Now we'll click the workbench DMG file and that it install. All right, so let's open the my secret workbench application. Great. So we have my secret workbench installed. Let's click on the icon for the connection. That's already there. We'll have a default connection, can create it at the install. Will just type in the password. That be Ah, we keyed in earlier. So hopefully you remember that he's going to try to connect to the database. All right, so we're successful in connecting to, but my sequel server through Ah, my secret workbench does the graphical user interface we're gonna be using to write our sequel scripts. And, um so hopefully, during the course of the class, we'll go step by step. And you learned this interface for your programming. Thank you. 10. MYSQL Cloud Install: everyone. So we will also try to access my sequel. We have the cloud instance, I'm gonna be using Amazon Web services and, you know, looking into future, I think cloud implementation is gonna be prevailing across a lot of companies going forward . So I think it's a really good to have and experience using some sort of glad service. So we're gonna have the my sequel instance in the cloud, and we're gonna use our my sequel workbench to connect to the cloud Instance. So you need to have an AWS account set up, and I have given a You're left at the bottom for some steps as to how to set up that account to please follow that. And hopefully the set up would be easy for you to create that account. Once you have your count set up, sign into the council and you'll get into the main portal or feet up to us. You you'll see a list of services. So Amazon has a bunch of services from spinning off servers, different storage options date, obvious option, and so we're not going to go through all these details. What we're interested in is setting up my sequel server. So in the database category, we have already. SRT s is a subcategory where we have all the relational database systems that Amazon provides. So now please go ahead. Uh, click on the button. Get started now. So these are the list of relational databases that Amazon is providing. You have sequel server, Oracle posts, Crests, Maria DB and my sequel and Amazon Aurora. So we're gonna go to my sequel. So we have two versions. One is the community addition and the other one is Amazon's Aurora. It's, ah, different flavor of my sequel, and we're gonna be using the Community server edition. They're gonna go click here, Please go ahead. Used the Devon test instance because it uses the free tier so we don't get charged. But we want to see only showed those options that are eligible for free tears that we'll just click that we'll leave everything else here the same. So we need to specify a db instance. So in my case, I'll just use, uh, AMG Inc. Use the name student. I have to give a password like a step so we can leave all this as a default database name will just say my sequel, Cloud. So we'll just leave all this as if back up. We don't need any backups or this one just 10 to 0 relaunch. So this will take a little bit of time to spin off the server. Um, So what I'll do is I'll just pause the video. All right, so now our service up and running and you can see the status. So now we need a connection string or an endpoint. Ah, that we're gonna use in our my sequel work bench to connect to this instance. So we go to instance actions, you'll get some more details on the serving. You also get endpoint. Okay, so this is our my sequel, workbench. And now we're gonna create a connection, a new connection to connect to the my sequel server on the Cloud. So let's go back to our details off the server. So this endpoint is basically our connection string. So just copy this in my secret workbench would create a new connections to click on the plus sign are there will get a small window open so we'll create a connection name, so I'll call it sequel. Underscore Class underscore Cloud as far as the hosting will put the connection string there. We already have the port, so take the port out out of the connection string. So for using him, I had initially put in student Well, just test the connection. Enter the password that you set up. All right. So we were able to make a successful connection to my sequel instance, which is in the cloud. So it'll create a small Aiken for the connection, So just double click that great. So we're able to connect our my sequel workbench to the my sequel server instance on AWS. 11. MYSQL GUI INTRO: Hi there. All right, so now we have our my secret work. Ben started. So? So the first thing you will do is try to connect to a database. So we have my sequel connections. It just gives you Aikens for your connections that preset. So we're gonna use the 1st 1 that we set it up early. If you double click on that, I've already entered my password before, so it's just it's gonna just let me in. But it might ask you for a password. So just key in your password that you set it up. So now we are in the graphical user interface of my sequel, and this is the place where we're gonna be writing different scripts. So we have the quarry pan in the middle where we write a sequel statements. We have an object browser. We're gonna have different schemers and tables listed for metadata information. On the right hand side, we have sequel statements or snippets that we can use for our sequel. So now we have kind of a common understanding about the interface. So if you look at the object browser area, so we have ah s Y s says schema that's created by default, and it has objects underneath it. So we have tables, abused or precision functions. We for our class will be interested in creating tables, loading data enquiring those tables. So if you look at that, we have one table called six. Underscore conf IQ. Let's say if you want to just access that table so we'll come into our quarry pan and just write a simple, select statement. It's Colon. And if you look at this ah, lightning icon at the top, this is execute the selected portion of the script. So just click that and you'll see a result. Great. Which kind of gives you the results that that you need from the stable six underscore conflict. So when you initially load the stable, sis might not be your default database, so just make sure your right click on sets and just set as default schema so you don't need to prefix it with something like this. Leaving this will also work Great. If you look at the right, this is what we initially we're talking about the snippets that my sick will give you. I mean, these are different categories that they have split into the details of the M Els and etcetera. Click on it and just check what the script would look like with all the options. It's just kind of in a handy tool to have when you're trying to right near scripts and, as you can see it has insert joined Select. It's kind of handy in the output. Here is basically kind of a high level summary as to rebrand the script, how much time the script took and the record second returned account of those records. And if any editor comes into scripts able to be listed here as well, great, I will see in the next lecture. 12. Create Database and Load Data: great. So now we have that My sequel software installed. We have my sequel, Workbench up and running. So now we need to create a database. So this is the fun part. We're gonna download a sample database that my sequel provides. You're gonna follow the link. Are so we're on the my sequel website. Your scroll down, You will see the Secura. You can pronounce it different way. We'll just call it Secura. A database. Just download the zip. We'll just do receive. As so we have a schema. That's where we're gonna have the sequel statements to create our table structures. Then we have, ah, data, which is gonna be a lot off insert statements creating this Secura database. And then we have the data model off how these tables are structured and related. All right, so we're gonna do a file we're gonna open Script open, secure a schemer sequel are So we have our schemo that's equal file loaded in my sequel workbench. So it has a bunch of creates statements creating actor table, address, table. So we're going to be creating all our data structures through this script, and it also gives you additional information on the columns that we're gonna be adding. So we'll just run this script all at once, and it's gonna create all the tables we need for our exercise. Make sure the cursor is on the start off the script line and then click on the execute icon to execute the script. So just hit. Hit the, uh, hit that I can. You can ignore the warnings that were using the output. It's still going to create the tables. So if you go to the object browser on the says I can right click and refresh, you'll see the Secura database. And also, if you open the tables like on, you'll see the list of tables that was created to, say Select Star from the actor table Click Execute to run. If you see we did not get anything back, there's no denying it. So now we need to load data into the state of eso are simple to school. Open, open script. We will select the data dot sequel file and load it. So this is sample data that we're gonna be loading were the sequel insert statements to fill up our sample database. We will execute the scripts by clicking on the icon. So these insert statements will take a few minutes to load for data, and you can see the status in the output pan as to how much time is taking and how many records of being inserted. Let's try to see we have data and into one off the table. So we'll just to run the same select star from actor table and see if we get data. So if you seen the result great, Um, the data is populated in the actor table for the first name, last name and a bunch of rose. So we're good to go here. Same way we can try to run the address table. Select star from address. We should be able to see more data there. I will see in the next lecture. 13. Data Modeling Introduction: everyone. So now we're gonna start a new section on data modelling. Understanding how data is related to each other becomes really, really important as we start our journey into the data analysis field. So we have to make sense of how data sets are related to each other so we can derive value off the data. The data becomes more meaningful if you are able to relate the data with each other. The by product of data modelling is a data model that gets created. It's a graphical layout. It becomes a communication tool that can be used for different groups. To understand how the data is related to each other, we can use my sequel work, bench or urban software to do data modelling. You must have seen a picture of her data model. And what I'm showing on the slide right now is just a sample set off tables in some of their relationships. Connecting lines between the tables. Show the relationship and we will go into detail in the later slides. We ran the schema file to create the table structures. Already, we loaded the data already, and now we're gonna be opening up the data model, so there's a dot m w b file last file When you unzip, So just double click on the file, it should open the data model in the My sequel Workbench. If this window shows, just go ahead rename and I'll go ahead, load the model, click on the ER diagram icon, and then the model opens so they have color quarter different sections based on the type of data there is. Give customers and in military and all the related tables in certain subject years and color coded it. So this is the interface where you can you, the data model and how it's related. 14. Data Modeling Building Blocks: tables are one of the foundational structures in data modelling people's. So I've given an example off a country table. Now we're gonna be talking about the building blocks for data modelling. 1st 1 being table or entity. You can use thes names interchangeably. A table can be something for which you want to store data for or collect data for examples can be customer. You want to collect it about a customer, its behavior. It can be automobile, the type of cars that are going on a particular hive. A. It can be products, so anything that you want to collect data for second column or attribute. These are just characteristics off the table. So if you want to collect data for the customer, the customer being the table, the columns for the customer table can be first name, last name, phone number, address, etcetera. Third is data types, so each of those columns can have a certain type of data type data. Dave. All it is is that if it's a phone number, we have a number as a format off the column. If it's a name and it's more off a textual data, so just the different flavors off those columns. Fourth, his carnality of relationship is what defines how the two table or entities are linked with each other and just kinda. His motive for descriptive way off, seeing how they're related with each other. And we're gonna go into details. Different types of carnality, ease and relationships they are present. Fifth is the primary key is kind of what defines the customer. Is it? The Social Security number is kind of the primary out into fire, and then foreign keys is a concept that will become apparent as we go down, explaining the different type of relationships and how tables are linked. So the's six building blocks are very important to understand how data is related to each other so well, go and explain each of these building blocks in detail in the following slights. 15. Data Modeling Tables: tables are one of the foundational structures in data modelling. As the definition states, it's election off related data. Wait, so we were talking about if its customer, we want to have a customer table. So it's a structured format in which databases create these peoples. So I've given an example off a country table. Let's go try to look at eat section off the table so the stuff that's highlighted in orange are basically the attributes or columns, so we have country I d. You have country and then we have last updated column country idea sting individual unique . And then the country is listing a textual information. And then there's a time value for last updated. So these air columns or attributes, if you look at the bottom section of the slide, created that in green. Those are actual values or data elements, so data elements are If you look at that last role country I. D. 99 Austria. Each of these individually or data elements if someone says, how many columns do you see in this lights? I would say three country i d country and last update data is added as rose or two poles So if you look at horizontally country I d to Algeria and the last update value that row becomes one record in the database. Another important concept is off primary key. Each of these rolls that you see are identified by a unique identifier. So country I d one through nine is unique. 123 up to nine. So you don't see that being repeated. Understanding the way tables are structured is very important in defining the relationships that we gonna learn in the following slides. 16. Data Modeling DataTypes: we will be talking about data types or, in other words, you can say, is the type of format that a column has in the database. So whenever you try to insert a data, we need to know if it's a date. If it's a string, what if it's a number? So for her date column data birth In this example, the format is first. Why, by why? Why is the year Portion Dash MM is the month and Dash D D is the days of 2000. Dash 10. Dash 10 would be the format off a data element, and the second date format is a date where we have hours, minutes and seconds as well. So for certain scenarios, we can have the date feel defined in a more granular way, so date would be the first data type. Second data type is strengths, so strings are more or less, you know, in a case of customer table, you know, we can have customer first name, last name, anything textural, and then we have two variations. Lark are variable character and the character. We don't need to go too much into detail, but they both hold string values the third category of data types come under number. So any time we want to store numbers, you know, for instance, your car's mile like we were talking about an automobile table. So if you have mileage for the car, that mileage column can be stored as a number. So for number you can have a data type, often indigent or a small, and depending on how big the number is or if you have decimal places we wanted to present. Like, for instance, the customer paid amount. Like if you paid for a Starbucks coffee and the amount waas some dollars and in some sense , so we can represent that by a data type off decimal. If you know that data types and the category off date strings and numbers, you can do majority of the data analysis knowing thes data types. 17. Data Modeling NOTATIONS: it will be talking about the notations that we will be using in our relationships before we get into the different types of relationships. I just want to make sure we understand the language in which we gonna be depicting our relationship. So we're gonna be using information engineering notation so that the process of designing the tables and it's relationships is called entity relationship modeling. So if you look at the carnality, that's just the maximum number off connections you can have between the two tables. So one would be just one word Icka line. Many would be a good crow's feet. Modality is just the least number of connections. You can have a relationship zero or one when you see it. Relationship in action you will have both the maximum and the minimum number off relationships those two entities or tables can have. So that's why if you look at the right hand side sharing a carnality and modality together . So the first relationship is showing one or many, and that's why you see the symbol for one as a war tickle line. And for many as a crow feet, the one at the bottom is zero or many, and you have a circle and a crow foot, so just getting an understanding off the symbols were really help you in the following slides. 18. Data Modeling One to One : all right now we will talk about different types of relationships that two tables or entities can have. 1st 1 is 1 to 1. As the name suggests, it say's one entity can only have one relationship with entity to an entity to can only have one relationship with Entity one. So if you look at this example, we have a relationship between a manager table and an office. If you just look at the manager table, we have Manager I. D. First name, last name. So therefore, different managers in our table. And if you look in the right hand side, we have an office table. So we have four different offices. Chicago, New York, Seattle in Columbus. So destiny that we're dealing with. If you look at the manager table, we have Manager I D. As the primary key. And in the previous lecture we redefined what a primary key is. It's a unique got into fire. Some Manager I D one only relates to Eddie Manager. I D four only relates to Paul. If you look at the office table, we have primary ideas as office I d one through four and then also we have another column manager I D. In the office table. So what it shows is that Office one is managed by Manager One on location in Chicago. So Manager I d. Being in the office table. This particular column is a foreign key because it is present in the manager table. So now there's a apparent relationship, and if you look at the line that connects these two tables, it's a straight line with two dashes on the side. So what that depicts is that the manager can only manage one office. If you look it from the office to the manager side, it say's office can be only managed by one manager, so it's a 1 to 1 relationship. It's a mandatory relationship as the modalities also showing one with the word ical bar, and the maximum relationship is also one. And another example can be U. S citizens and Social Security table. So U S citizens staple, which has a first name, last name and then another table as a Social Security, a citizen can only have one. Social Security and a Social Security can only be related to one citizen. So hopefully that gonna clears that up. As far as the relationship setting, So it's a 1 to 1, So in this case, a manager manages one office in an office can only be managed by one manager. 19. Data Modeling One to Many : all right, so the second type of relationship is one to many relationship. This is one of the most common relationships that you will find between tables. If you look at the example, it's between customer and order table. So you just logically think in a scenario where a customer can have multiple orders like If you go to Amazon, you can. We can put three orders. So that's legitimate business case, right, so you can have multiple orders. But that individual order, if you put three orders that it order one or the first order only relates to you. None of the other Amazon customers. Rights of the order is linked to one customer, but the customer can put multiple orders. And if you if you go with that mindset, if you look at the relationship in the middle between these two tables, we have a vertical line on the left meanings that the order can have only one customer on the right. We have a crow foot depicting the many portion of the relationship with the word ical bar, depicting at least one so the customer can at least have one order or many and again we have a primary key and a foreign key. So the primary key on the customer table is the customer I D. And the primary key in the order. Tva's order i d. Great. But the order table also has a customer. I d in it for this relationship to work and that customer I d in the order table is a foreign key, and that customer, I D. Is what that relationship is built on. Majority of the tables would fall into this category a game. So if you look at this example, customers can, but multiple orders and an order can only be placed by one customer. 20. Data Modeling Many to Many: third type of relationship is many to many relationship. Let's look at the example that we have the first stable. We have a student, so it just gives you a list of students that we have. So student I d. First name last name. Great. So we have a student table. Then if you look at the far right, we have ah, class table. He just gives you the different classes that are being offered. So English math, chemistry, etcetera. So as long as student and class tables by themselves look fine, student has to new information class has class information as we relate the student and class data together, there's an apparent many to many relationship between them, since the student can take multiple classes and each class can have multiple students enrolled in it. We have a scenario where this relationship can capture other information like student grades, enrollment dates and for this will be better to make a junction table called enrollment that will capture this information. If you look at the way student is linked to enrollment, we have, ah, one to many relationship where a student can be enrolled in one or more classes and If you look at the relationship from right to left from class to enrollment, a class can have one or Morrell enrollments from the students. As you go forward in the data analysis field, you'll see more and more relationships would fall into this category. 21. Data Modeling Recursive: four type of relationship is called Recursive Alien Ship. And if you look at the example of employees Table, so you have employee I D. First name, last name. You also have 1/4 column called Manager I. D. So the managers air also taken from the list off employees. So there's a relationship between Manager I. D. An employee, I d. So if you look at the bottom half of the table, we have a relationship going from manager I d. To employee i d. So a manager can manage multiple employees. If you look at the tailing off the relationship, it's a crow feet saying many and at least one employee can. We managed by one manager or in some cases may not have a manager at all. So that's the other side of the relationship. And that's why you see employees one not having any manager i D. And we look at Mike and Jay employed two and three, they're being managed by Eddie was an employee idea one, and nobody's managing Eddie. He's a top level employees, and that's where you see Manager I D has no information. So this is an example off recursive relationship where we have a relationship within a single table and manager I d is a foreign key to the employee column. So hopefully this was helpful. This type of relationship is rare to have, but it's good to know when doing data analysis. 22. Data Modeling Normalization: So now we have kind of looked at the building blocks of data modelling. Know what what a table is. What are columns, attributes. What is the primary key or a foreign key? What our data types and how tables are related in types of relationships that can exist between tables. Understanding normalization becomes important as more and more data comes into your database. So what normalization is is a process in which we try to break a bigger table into smaller tables, and what it does is that it helps you reduce redundant data or duplicate data. Normalisation goes through a few steps. In each of these steps are called normal forms. We go through first normal form than second and then third. An overall normalization if you look at some of the benefits, is that you save the storage. It's easy on maintenance for the database because you're updating less data. I activity gets improved and also it helps query and reporting. I will show you an example often employed table, and we're gonna take that table from first normal form to second and then third normal form , and you will see the transition as to how we go through this whole process and hopefully with this example, the normalization process off a table will become easier 23. Data Modeling 1NF: we have been given this employee table in our tasked to normalize it up to the third normal form. Let's take a step back and look at the data closely. We have an employee I D, which uniquely identifies employees. A department code that is unique for a department. We have employees name department, three columns that have full numbers in them, employees start date and employees vested indicator. So that's the data that we're dealing with, and we have to take the stable through the normalization process off First, normal form up to the third normal form. When you're dealing with the first normal form, you need to ask two questions. Are they any repeating groups? Meaning Are there two or more columns that are closer related to each other? If they are, we need to create a new table for them. Second, make sure that all the attributes are single. Valued attributes. Let's take a look at the employee stables. If you look at the employee name, we have first name, last name in one column That's our first candidate, as we should not have any multi valued at tribute. So what we do in the first normal form. We split the employee name into the first name and last name to take care of the multi valued. Attribute. If you look at the first phone number, all the phone numbers there are different and has to be the employee full number for number two. It seems like that phone number is dependent on the department. If you see finance, that's Finance Department has the same number. So it is the department for number four. Number three has all the phone numbers being the same. That can be looked at as a company for number employees, start date and employees vested indicator can move as this. We also improved the metadata or the column names for all the 34 number columns and add employees department and company labeling to them. As it makes more sense now, our employees table is in first normal form 24. Data Modeling 2NF: Let's get that ball rolling to the second normal form. The most important question you want to ask is when you're transforming your table from the first normal form to the second. Is that to make sure that all non key columns are functionally dependent on the entire primary key? So what are the non key columns? Employees, last name employ, first name, department employees, phone number, department, phone number, company phone number, employees, start date and employees vested indicator. These are all non key columns. So what the rules say is that they need to be functionally dependent on the entire primary key. So the primary key here is employee I. D and Department coat, right? Those other unique got into fires. But if you look at the employee first name, last name that is dependent on the employee I d. An employee I d should be able to give us the employee first name, Last name. If you look at the department department is dependent on the department code. So you see, we have some columns in the table, depending on one part of the key, and the other columns are dependent on the other part of the key. What we need to do is we need to split the table where the columns are dependent on the whole key. So that's why you see the employee table being split and having the employee first name, last name, four number employees, start date and employees vested indicator. They're all dependent on the employee I d for department. You see, we have Department Department phone number and the company phone number that are dependent on the department code. Great. So we have those two tables transform into the second normal form Right now, we also need to capture the relationship between the employees and the department. Data right. Employees can work in different departments, and the department can have multiple employees. As you recall from the earlier lecture, to resolve a many to many relationship, we had to create a junction table. So in this case will be the employee assignment where we gonna have an employee, i D and a department code as being the two columns in the relationship. So we look at the relationship from employees to employ. Assignment is the one to many, and if you look at from department to employ, assignment is also want to many. So how are you going to capture the relationship and have the employees in department being in the second noble form? Great. 25. Data Modeling 3NF: sorry, let's get the ball rolling for the table to be in third normal form. We should not have a situation where an attributes depends on another attribute that is not the primary key off that table. It's also called transitive dependency, very important rule that the non key columns in the table should not have dependence between them. Secondly, we should not have any derived data, such as like total columns that I drive from other columns in the table. So if you look at the employee table, we have employees vested indicator that is dependent on the employees start date. So for a certain start date, that indicator either turns yes or no. So that is definitely one candidate that we can address in 1/3 normal form. So if you look at the employee table in the third normal form, we only have first name, last name, full number and start date. All those columns are only dependent on the primary key, and there's no dependency between the non key columns. If you look at the department that is already in the third normal form as you go through different projects and and as you go through your data modelling assignments, you'll find that the third normal form is where majority off the transformation stops. There are some higher normal forms, like boycott normal form, fourth and Fifth normal form, which do occur, and people do use them. But very rarely, I will try to add some slights for those normalization at a later time. All I'll say is that with practice data, modelling becomes easier and easier. You need to understand the business as to how the data is captured and what the data means . Practicing the normalization rules and being close to the data will help you in your data modelling projects. 26. SQL Intro: Hi there we went through learning about databases and then data modelling. And now we're gonna be learning about. SQL Sequel is a programming language used to interact with the Relational Databases Sequel is an act, Um, for structured query language. The Syntax off sequel is really easy to learn. It's pretty descriptive. A lot of key words that are used to do the operations are self explanatory. Different relational database systems have some flavor off sequel that they have rolled out , and each flavor has some functions that might be different from the other version. But in all the basic sequel, syntax remains the same. We will be using sequel to in track with our My sequel database. We're gonna be going through different categories in which sequel statements can be placed . Secret was standardized by American National Standards Institute and see for short in 1986 and it has gone through multiple revisions 27. SQL Categories: we can put sequel commands into different classifications. The 1st 1 is D D L data definition language. So whenever you're trying to create a table, alter it, drop it of these sequel keywords can be categorized within DTL, and we have a supper slide where we're gonna go through examples for creating de de l's. Second is D M L D M L Statements pertained to either inserting, updating or deleting data, and we will go through some examples of those. Third is Dickie Well, data query language. And this is thesis Elect keyword. You must have seen a select statement in the past as well. So this is basically a generic sequel statement which can help you retrieve data. 4th 1 is D. C l data control language. The statements in D. C. L are normally used by database administrators like granting privileges or revoking them. We won't spend time doing data control language examples, but we will go through examples that will fall under D d L D M l and equal 28. DDL: All right, So now the fun begins, we're gonna be writing a sequel code, so we're gonna go through D D. L statements. As in the previous slide, I was saying, It's the data definition language. We will be creating a table and then also we can either alter drop it or truncated. So let's let's take the first example in creating a new table. All right, so we'll be creating a new table called students, and we're gonna be using a keyword called Create Table and then the name off the table, which will be students Open bracket and close brackets will list different settle columns. You stink of columns as attributes like, what kind of data do you want to capture for the students? Okay, you have your student i d. His first and last name date of birth phone number at is just a general information about the students. So whenever you want to add a column, you have to have a data type what defines the type of data that will go into the column. So, for instance, student I d. Reuse i nt as indigent first name last name is gonna be very well character war car for short, it's gonna be string values. Anytime you have a string ready, just use a very real character or character you can use either or one is fixed length. The other one is rebelling. And whenever you defining a string, you will basically give the length off the string rallies in this case is 20. Date of birth is gonna be a data type of date. Phone numbers again is gonna be a string address is going to be a string. And if you see address, I have kept it at 100. He might have more information on the address. We need to add a semi colon to indicate the end off the statement. So just highlight this portion and execute. Great. So now we have successfully executed a create statement. So if you go under the tables section off the database Secura, we need to refresh tables and do fresh all. And then you will see the students table. You can open the students table hair and drill down into the columns and you'll see the list of columns that we created. Great. So now you have created a student's table. All right, so now we have created the students table. Let's try to add some columns to the table and drop some, and we can even go ahead mortify a data type. So if you look at the older table syntax so we have the altar table as keyword, we have to use altar table and then the name off the table that we want to alter, right? So it's gonna be altar table students, and we want to add address, line and address to as to new columns. So we're gonna use the ad key word and what's after give data type. In this case, I'm gonna be giving 100 length for each column. It's gonna be comma when we end. Would be then using the drop key word to drop a column. So we'll be dropping the address, Call him. And if you want to modify a call and we just use the word mortify last name and we're gonna be changing Initially, this string data type was off length 20. So we're gonna be using now, 30. So now we can just run this, highlight the whole statement and just run it. Great. So are ultra Statement was executed successfully. We can come down to the object browser and look at our student table. So we have student I d. First name, last name, phone and address. So it's actually showing the older emits to just do it a fresh all. And if you look at the last name, it's a variable length 30. Great. So now we have altered the table to drop her table or to delete David. Suppose we just say drop table and the name off the table. So what this will do is it will drop the student table and then we also have a truncate table statement. If we do truncate table, it basically empties out all the records and the tables. Right now we have our student table is empty, so it won't perform a particular action because of the day with his empty. So we run the drop command and as you see, the student table has been deleted 29. DML: the next category in which we can put the sequel statements are D. M. Els. So data manipulation language is so we can create statements to either insert data update data or delete data from the table. So we'll practice each one of these statements and see how the get a base behaves. All right, so we'll be inserting a record into the country table. So let's see what we have in the country table already. Every great. So we have a bunch of records. We have about 109 records in the country table. So if you want to insert data into the table, do you have to use the keyword insert into, and then the table name open brackets and you after listed columns, Country table has country I. D. Country and last updates. We'll list those three columns and then we close the brackets. So the first set of columns are gonna be the number of columns center in the table, and then we have the values as a keyword. Then we list the data values that we want to insert into the table separated by commas. Go for country I d. We're gonna put 1 10 If you look, we already have the last record in the table as I d being one or nine. So we'll put 1 10 will put as a country name as my land. And for the last update, this is a function. Don't worry about it. Just you can just put it, as is it say's current underscored time open and closed bracket That's just gives you the current time when you insert the record close bracket. So insert into table name list the columns that are into table than the values keyboard open bracket and then insert the data and separated by a comma. All right, so let's execute this. Okay, so we have affected one droll if you look here at the bottom. So let's try to select data from the country table and see if we got our record. And so let's scroll down, says you can see the record with Country 81. 10 was inserted. Great. Now we want to update the same record and said the country named to no man lands. So we used the key word update the table name set country. The column that we want to update equals two. And when you put this in Ah, Coats, No man lands. We use the where keyword country I d equals 1 10 So this is gonna filter the data. This will filter the data to the record, which we inserted That was 1 10 and this will The set keyword is gonna update the value in the country column for the record where the country ideas 1 10 So we run this. See, it has been updated to no man's land. Great. So if you want to delete a record from the table, we used the key word delete from table name and then we use where is when we want to qualify and get to a particular record into the table and filter. So where country idea is equal to 1 10? Let's run this. So now if you do so like star from country, we should not have any record with country I d. One time. So that has been deleted. Great 30. DQL1: we will be looking at the most common sequel statement that is used to get data from the database. It's ah, it's a very simple statement, but it's very powerful. At the same time, I have tried to put this visual for the select statement. I just wanted to make sure that you understand each component off the select statement. A lot of times people probably no select statement partially and they missed the other details. So hopefully this picture will help you understand each component off the select statement that is used to retrieve data from the tables. Select selects the number of columns from basically less the tables. You want to get data from where is basically where you put the filter condition to filter your data order by is where you list the order in which you wanna see the data and limit is where you wanna limit the number of rows you get back. All right, so let's get this kick start. Its the first script we have is select star from actor, so we cannot retrieve data from the actor table. We're using a semi colon in the end to end the statement we're using the select and from keyword, the star or ass trick depicts. You want to bring back all the columns from the table. So if you don't I mean other ways you have to list. Let's say the table has 20 columns. You don't want to list each one of them. I let the script and run, so you'll get the data from the table about 200 records in the table. Now let's see if you just want to bring back first name and last name, we do select. We just list out the first name, and columns will be separated by a comma. First name, comma, last name from keyword and an actor and semi colon. So we just get those two columns. It's pretty straightforward and self explanatory. Let's say if you want to label the first name as employed name, you have to use the keyboard as and then just used the column name as the given label name from the table, and you have employees name and the rest of the data comes in. Great. Unless say, if you want to order their data order by first name, so we'll still do the same. This piece of the select statement. It means the same. We have to select first name last name from customer. We used the key word order by and which call him do you on order by his first name Ascending. So let's run that. Great. So you see the data comes back and is ordered by the first name of sending and let's take an example off a payment table. Let's use the payment table and order by amount descending so amount should go down when the data comes back. So you see, first we have a higher amount that it goes down. Let's see if you want. We can also add a sort on two columns, several first sort by the first team so we can sort the customer data by first name, last name together as descending by just using order by first name Coma, Last name descending. It's run that so great suit is ordered it. Now let's say we want to bring just the 1st 5 records from the customer table. So this is gonna be a traditional select star from table order by customer, I D. And as you can see, what 600 records come back and say We have to use the key word limit and then give the number of records we want and then Colon. So let's run this. We only get five records by. So this is by using the key word limit and then the number of records you want back. You can also pick arrange when using the limit. Keyword. So if you say to comma five, that means that it's gonna bring five records, but it's going to start from the third record off the table. 31. DQL2: This is a list off sequel comparison operators that you can use when you're trying to filter data from the data set. So whenever you use the keyword where you can use a particular column and then if you want the column to have a value equal to something or not equal to something greater than less than so, you can use thes mathematical operators, we also have operators like in open bracket and close that gives you the option to list multiple values. Also bit mean gives you an option to go pick a range. Rieff. It's the range in the date or the amount null. And not now is where you have an absence of data and you just want to pick the value where there's no data in the column like is also very popular operator, which you can use to big data that follows a certain pattern. You will be using the where clause to filter the records coming back from the database. We already know how to select data from my table, right? So let's highlight the portion where it's a select star from customers, so you will get pretty much all the records from the table, right? So let's say if you want only pick Mary, you use the where keyword falling by a column named that you Want to filter on equals Mary has to be in single Coates is a strength string value and then semi colon. So let's run this. So we found one record in the customer table, which satisfied this condition. If you want to pick customer data where the first name is not equal to marry, you just use explanation mark and then equal together and then marry. So if you run this, you will filter out all the records where the first name is Mary or let's take an example. We want to apply to conditions to filter the data. So we using the payment table. We want to pick all the records from the payment table where the payment date is created than 2006 Jan. First, and the paid amount is greater than four. So we have to apply to conditions to filter the records. We'll do select star from payment where payment date will use greater than the set date in single coats and amount greater than four. Let's run this so you'll see all the payment dates I created than 2006 Jan. First, and the amounts are greater than four. Great. Let's try to filter the last name off the customer from a given list of names. We can use them not in open brackets and give the list of those names if you run this So all the last names, a return back and Davis, Miller and Wilson records are filtered out. Let's take an example where we want a filter that payment records based on a date range. We can use the between key Vered and give the starting date the single coat and the ending date. So the records that have returned or a payment date between these two dates, we can also use the order by payment date. So let's run this 2005. July 5th is in the range of 2005. July 1st 2000 Julys 31st. All these records are in the month of July. We can also filter customer data based on a pattern that we find in a string value so we can use the key word like, followed by a single coat and the two characters off the first name in this case, a N, followed by a percent sign and close coat. Let's run this. It's returning all the first names where the 1st 2 characters are a n. Likewise, we can run the same like command, but have the percentage in the start, meaning that the last director would be. Why in this pattern. So let's to run that you look at the first name. The last character is why in all these records, and this is a very powerful operator, great. 32. Inner Join: one of the tasks that data analysts do is bring data together from different data sets, and that is achieved by joining tables together. The first type of joint we're going to look at is called inner join. So if you look at the Venn Diagram Table A and Table B have something in common, and that common data element will be used as a key to join these two tables together, the syntax for the inner join can be in an implicit way on an explicit implicit is basically when you're using the inner join keyword. So let's take an example off the first syntax. Select from table a inner joint table, be on, and then we list the keys from both the table that we want to join on. That's kind of an implicit way off saying it. We can also use instead off on. We can use the using keyword and just list the key that is common between those two tables . Explicit ways where you say from table a calm a table, be where a dot column equals B dot columns. That's just a different flavor in which you can get the same results you can open the data model from the file itself. Click the ER diagram Aiken to open the model when joining tables is really important to know what the data model looks like, Right? So let's say, if you want to join country and city, we want to know which column the relationship between these two tables are built on. So what column In country table In what column? In the city table, the two tables can be joined on, so it's gonna be select Country and City column from city table inner join country table using Country I D. As the column that is common between both those tables where countries equal to United States. So whenever you're trying to join the tables, you need to look at the data model rights of what data model will give us the relationship off, how tables are joined together. So let's switch to the tab where we opened the data model. So we're looking at the country and the city table, and if you can see we have country I d common between those two tables, so that means that we can use the country I D column in our joint condition so we have used the country I D column as the joining column between city and country table and we also using the keyword inner join. And what that means is we want to bring back only those records. We're the country. I d has same data between those two tables, and we want a filter where the country's United States So what the sequel will return is the list of all the cities in the United States. You're given the task to find the list of films. Language is not English, so we're gonna go back to our data model so Phil and Language Table are joined on language I D. So let's go back to our sequel. So select. We're gonna select title and name from Phil Inner join language table using language i d. Where name? Not in English. So we want to bring back all the films where the language in which the film was made is not English, and we don't have any records right now on the database that satisfied this condition, we can modify the script to bring back films that were made in English by taking the not out just having the name in English. Let's say we have a scenario where we want to generate the list of customers which live in Barcelona. So let's go back here. It will be a three table joint customer address and city. Let's look at our sequel Select. So we're selecting a bunch of columns from from these tables from customer inner join address using the address I D. So if you look, this is how the custom an address are joined on address I D and then we're joining address inner join to the city table on city I d here. So they're joining on the city i d column. So this will be our joints for three tables where city equals Barcelona. So you see the column City to get to that, we had to go from customer to address to city to relate all this data together. Now we run this. We should get all the customers living in the city of Barcelona. So we have one customer that lives in Barcelona. So this is a very powerful joint statement, and as you can see, you have joined three tables on these common data elements. So let's say you want to bring back customers that live in Barcelona, and during their rental period they have paid more than $5 in rental cost. So you will be joining the payment table with the customer table. So if you go back here so there's a customer, I d their joints, the customer and the payment table right there. So you're joining so the rest are the same. We already did this joint, so we're just adding another payment table, and then we have an amount greater than five. So let's run this. So if you see we're getting records for the same customer. So where he has two transactions where the amount was greater than five. So it's giving you a transaction history for the customer living in Barcelona, where the payment amounts are greater than five. 33. Left Join: the second type of joinus called left join. So left Shown brings all the data from Table A along with the data that matches between A and B, the syntax four left joinus select from and we used the key word left joint table a left join table B using the common key fields between those two tables. So we want to get the list of staff members and indicate which one are managers. But the list should also give you staff members who are not managers as well. So for that we will be using a left join. We need to insert these two records so we have enough data to make this example work. So if you come back to the model, we have staff and store table linked by the staff. I d. The table store has those staff I ds that are managers. And then, of course, we have the relationship on store idea as well, So we will be joining the staff table and the store table with a left joint. That means that is gonna bring all the records that are in the staff table along with the ones that are in store and we're gonna join on staff. I d equals manager staff I d. So this should give us all the records in the staff table along with those staff members that are managers as well. So if you see first name, last name is coming from the staff table. So we have four staff members right now in our database manager staff ideas coming from the store table. And only Mike and John are managers. So for Johnny and Jimmy, there's no record in the store table. That's why we have no means, no data. So this way you can see that left joints are useful for certain scenarios toe, where we want to bring all the data from one table and link it with the records that are matching between the two tables as well. Art Another variation off left join his left outer join, meaning that I just want to get the records in table A which are not present in table B. So what you can do is that you can do select from table a left, join table B using the key that is common between the table. We're it will be his keys equal no. So if you picked this last option where table bees key, which means this table equals no, you will get all the records in table A. So let's say, if you want to get the list of staff members who are not managers, we'll do a left outer join and the wavy achieved that is we joined staff to store with keyword using left. Join on staff I d. Equal manager Staff I d. We're using the own keyword in our joint condition. So the column names in the two tables are not spelled the same way. So till here it's exactly the same as a left join. But when we add the very clause, the staff i d from the store table is no. Then it only returns those records from staff table and not the ones that are matching between those two tables. So if you compare this with the other left joint, this is only bringing those staff members that are not manager great 34. Right Join: third type of join is a right joint. Basically, it's the divers off the left. Join and syntax would be select from table a right joint table. Be using the common keys. It brings all the data from table B and links the records between Table A and B. We also have right outer join, which is the opposite off left outer join. So select from table a right joint table. Be using the common keys. Where nine. The previous one when we had left outer join We were using table bees G as no. But in this one, we will use table a key as no. So it's just a reverse off left outer. Join for the right joint and the right outer join. I don't have any lab exercises for you. Was that something that you can try to practice on your own? Using the Secura sample database 35. Cross Join: four type of joint is called Cross Join is basically trying to join two tables that don't have an apparent relationship, and this type of joint is also called Cartesian product. One of the things you have to be careful about is that you can do cross shine between two tables if the number of records in both the tables are pretty small. I'm talking about 5 to 10 records and maybe 20 records or so, and they're normally court values with some descriptions. So it's like if you have a Pacific use case where you have to create a some sort of a mapping a code mapping between two tables, then you can use cross join the syntax for this is select the list of columns from Table A cross Joint Table B. So it's going to show one value of table A being repeated for each value and table B. That's why the number of records get multiplied by the number of rows you have in Table B and A So let's look at an example off across showing so would create two tables won t shirt , underscore color. The other T shirt underscore size. So one has number of colors and the other has the number off sizes T shirt can have, so we'll just run. Let's look at what data we have in T shirt color table. So in the T shirt, we have two records. One is black and one is white. Let's look at the T shirt size table, so we have four sizes small, medium, large and excel. So now if you want to find out the maximum number off combinations between the color and the T shirt size, we can join the T shirt color table with their T shirt size table into across join. It's going to give us the maxim possible combinations that apostle between the colors we have and the sizes Great. So with black, you can have four sizes, small medium logic cell, and with white, you can have four sizes as well. 36. Self Join: fifth type of join is a self join. You have to create an alias off the table. So let's say if the table is a you have to create an alias, a one and a two to join the table together. Self joint, normally our rear. But they are some use cases in which you have to use a self joint. For instance, if you have an employee table and there's a column in which you have a manager, I d. That manager ideas pointing back to the employee i D column in the employee table. So the same table has an inherent relationship within the table itself. So let's go to an example and see how this can work in practice. So let's say we want to get a list of managers and their direct reports. So for this example, we have to create a table called employee Manager so Discover had created table. Let's insert ah, number of records here. Great. So we have six employees and their full name and the manager 80. So, for example, employees one Matthew is being managed by grace and Grace is being managed by Alice and is a relationship between the table itself. So let's take a look at this self joined sequel. So let's take Take it piece by piece. So we're creating an alias for the employee in the school manager Table E one. We doing the right join employees in the score manager E, too. So you're doing a right John between E one and E two on employee I d think of the one as the manager list and think of E two as the employee list and we're selecting E two employees full name, which means we want all the employees to be less dead. We just giving a label as employees. So all this statement is doing here is that if no meaning, that if the manager value is now meaning the employee does not have a manager, we label it with no manager. And this is just a label to give the column. So if we run this, you'll see we get all the list off employees in their corresponding managers. If you look at the last row, Alice Cooper does not have a manager. Let's look at just the employee table again. Alice Cooper, the manager. I DS No. So when we run this. It found no and it replaced it with no manager value, which we give here so self joins our rear, but it's good to have the knowledge about them. 37. Intersect: second type of set operator is Intersection. So sad, one intersex said to He's gonna only bring back whatever is common between those two sets. So my sequel doesn't support the Intersect keyword, but we can get the similar results by using exists or in operator. So again, we're gonna have curry. One exists. And then we're going to use the quarry to so inquiry to We're going to be joining the two tables together based on calling one and column, too, and maybe run. This is going to only bring back the records that are common on those two columns. Similarly, we can use the in operator in the in operator. We don't need to physically join the columns. We can just do a look up. Think off in as a look up. So so select Colin. One column to from Table A where so in the brackets, we give what columns we want to look up. Call the one column two in the second quarry, and then we select the columns that we want to look up to. In my experience exists performs better, would data and in his little bit slower from a performance perspective. So we want to get back data that set one and set to have in common. So if you look at the version one off the script So we're seeing Select Star from Sat one where X and Y are the columns of set one in and then we have open brackets and clothes and select star from Set to So this will bring only those records that are common pretenders to sets. So as you remember, set one had four records and set to had to records ever present and sat one. So we should get to records back. Great. So we have Eddie and Charlie that comes back. We can also run the same command using the exists key word with exists, we have to reduce a joint condition. So we'll select star from sat one. I'm giving it an alias. A Where exists? Open brackets select star from set to alias Be where? So this joint conditions on column one and column to needs to be present in the second quarry to link it to to set one. So this physical linkage needs to exist when we using this type of sequel when we're using in. We did not need it. The joint condition. So that's kind of the parents. Syntax difference, but exist is faster when there's a lot of data, so we get the same result great. 38. Minus: the third type of set operation is minus, basically set one minus. That, too, will give you elements that are inset one, but not inside, too. If you look at the example on the slide set, one has won 4 10 data elements and said to has 9 10 11 If you do the minus operation there , you will get one and four as a result, just as a mathematical operation off subtraction, my sequel does not support minus keyword, but we will still achieve the same results by using, not exist or not in keyword. As far as the syntax goes, it is the opposite of what we did earlier for the Intersection lecture. So instead, off exist, we will use, not exist, and instead of in we will use not in for the lab exercise. I've attached the sequel scripts to the lecture notes, which will help you go through the minus like operations within my sequel 39. Union: in sequel. We haven't area where we can actually do set operations. I mean Set is is a branch of discrete mathematics. But when it comes to databases and sequel, you can perform set operations by using the keyword union intersection and minus my sequel supports Union, Onley, intersection and minus can be mimicked by some variation off the sequel itself. So if you look at the union operation set one has won 4 10 data elements and said to has 9 10 and 11. If you do set one union set to, we're gonna get all the data from sent one and set to, and there won't be any duplicates. So if you look at the lab bottom portion of the slide, you have quarry one union quarry to, and one of the things you have to be cognisant about is that column one and column to in both quarries need to have same data type when you're doing the union operation. The columns that you select inquiry one need to match the columns from quarry to when I say match. The two columns need to have similar data types between the two quarries. Also, you have to make sure that the number of columns between the two Coreys are the same, and the order in which the columns are listed is also the same and also have similar data pretending to each other. If you use a slight variation to union and use union all, it will not remove the duplicates, so you will have duplicate values if you just to union. All So it's a simple operation, which brings data from two tables together and a pence data from Cree one to Cory, too. We will create two tables, set one and set to, and we will insert some daytime to those two tables. So let's see what we haven't set one, right? So we have four values Eddie, George, Charlie and David. But see what's inside, too. So we have Eddie and Charlie. So let's say we want to bring data from both the tables Eddie and Charlie isn't set to. So said to is a subset of set one. So when we run the union statement, so the first Cory Select star from Set one and the quarry to is select cell from set to we should get all the records from sub one as their common and said to as well. So let's run this. We should still get the same value for records. Great. But let's say, for use union, all we should get six records. Even two of them are duplicates to see. We get Eddie and Charlie. 40. Aggregate Intro: aggregating and summarising of data is at the core of data analysis. To be good at data analysis, you need to understand how aggregation works. I'll be introducing a new keyword group by that is normally used with a lot of aggregate functions, and we will go through a lot of examples where we're using Group by Klaus Group by can also be used with the role of keyword. All this will make sense when we go through the examples in our lab, so we have a list of aggregate functions. If you look at the first stable on the left, we have average count maximum minimum and some. These are some of the common aggregate functions that you can use on the data set. If you look at the table on the right, these air statistical functions that you can apply on numerical data in your data set to derive standard deviation and variants 41. Aggregate Functions: So I have put together this diagram to just show you how the aggregate functions when they're applied to the data would look like. So this is just an imaginary table with customer I D and payment column. If you see on the right hand side the syntax to actually applied the aggregate functions on the payment column. So it's select some open bracket and the column that you wanna some on and then closed bracket comma. And likewise, you can put average men Max comma, and then count one is gonna give you the total number of records in the table from the given table. So once you run these aggregate functions, you will get a summary record, and that's some record is gonna add up all your payments, average the payment, give you a minimum maximum and account off the record. This is a quick way to get your summary statistics. So in this example, we're not using group by clause yet. We will use that at a later example. So if you look at the payment table in our database, it has an amount field. So these are all the amounts that customers paid when renting the movies, so we want to get the minimum payment maximum average total, and then also the number of transactions we have. So we're gonna use aggregate functions and it should give us one record. Let's run that. So if you see, we get one record back summarizing the data in the payment table. So the amount field has the minimum payment of being zero maximum human being $11.99 average total some off the payment and then the total number of transactions we have, which are 16,000 and 49. So this is a quick way to just get a summary data from a table by using these activated functions. 42. Groupby Part1: If you look at the diagram, what the group by does is that it partitions the table based on the customer i d. So if you look at Partition one therefore transactions. That customer one did with 2030 30 and 40. So that partition is going to get aggregated, and you will see one record for customer i. D. One same thing with Partition 23 and four. So the group by is basically isolating and aggregating each customer's data. You look at the syntax. There's a new addition to our previous sequel, which is a group by customer I. D. And also, we have added that in the select piece of the sequel as well. So we're adding a customer i d comma and then the aggregate functions that are going on to the Payment column. This is a very powerful sequel statement that is used number of time in multiple projects. So if you get to understand this sequel, it will really help in your data analysis activities for this lab will be creating an employee table to just run this create statement and the insert statements, and he should be able to see an employee tables record. So we have i d employee name, department name, salary and age. So this is the data that we loaded right now. So we want to see the total salary per department and also you want to see the number of employees in that department. So to get that data, we need to group by department name, right? So that would be our partitioning field. Whatever group, by calling we use, we have to use it in that select as well. We're going to some on the salary label it as total salary, and then we're gonna do a count, and this count is gonna go per department and that would be the total number employees. So let's run this great. So we see that for each department were able to get the total salary and were able to get the total number of employees. So such a simple group by statement has produced really important information that can be used by the business. So let's take another example, keeping the sequel the same as the previous one will just add. That would roll up key word after the group by department. Me. So what this does is that It adds another record apart from what we already get from the group by and gives us a total summary record along with the group buy records. So if you see the last record, should added up all the salary column and added up the employees column. So the total salary is 44,000 and the total number of employees are 11. So this roll up gives us a summary record that gets added to the group by results so it can come really handy when needed. 43. Groupby Part2: So if you want to filter the group by results, you can use the keyword having and then the group by column, and you can use an operator to filter the records. So if you see the example, we're saying having some off payments greater than 1 20 So we're using the having clause. It's gonna Onley bring back those customers where the some off their payments is greater than 1 20 So if you look at this table to some off the payments for customer for is greater than 1 20 The rest are less than 1 20 so that record is gonna be returned. So having keyword can Hughes to filter group buy records further if I just run that piece what we did before. So we're going to get a job production, service and sales with their total salaries in that apartment and the total number of employees. So now, if you want a filter, this result set to only show those salaries where the sum is greater than 18,000 Onley production records should show, so we'll use the having keyword and then do some salary created than 18,000 so only production record should show Let's run this great so you can see that having key word can be used to filter group buy records further. 44. Subquery Intro: sub queries are like nested queries. So you have a main query and then a sub query. There, two types of submarines, one is correlated and the other one is known correlated sub query correlated sub. Khoury is dependent on the main query in such a way. Therefore, every row in the main quarry, the sub Correa gets executed at least once. Non correlated sub Caries are independent off the main query and they get executed once we have a list of predicates that can be used between the mean and the sub. Khoury exist, not exists and in our some of the common predicates that we can use and we had some earlier lectures on that additional set of predicates are any all or some and we'll see some example of those and some competiton operators that could be used as well. Hopefully, this wizard will help you understand the building blocks off, setting up a sub query routine. We will do some lab exercises in which the concept off correlated and non correlated sub queries will become much more clearer 45. Correlated SubQuery: So let's look at the first example off the sub query. We want to get the list of customers names where the rental duration is more than five days . So they had rented the movies for more than five days. So for this information, we need the customer table and the rental table. So Rental Table will have information about the movie as to when the movie was rented out and when it was returned. And the customer table has the first name last name. So we'll start with select first name, last name from customer table. So that's pretty straightforward where we're going to use the exists keyword open bracket and then we will start our sub query. So this is our submarine. So we're gonna select star from Rental and we're going to give it an alias are where and so this is the Joint Customer. I. D. Is common between rental and customer table, so this makes this sub query correlated Quarry date. If gives you a different between two dates, it's just a function that my sequel has and will go into date functions at a later lecture . But for now, just just think of this function, returning the difference between these two dates and it will be greater than five. So let's run this. So this gives you the list of customers who rented the movie for more than five days. Let's take a second example, but you want to get the list of movies that are not available in any stores. So we need to tables for this oneness film and the other words inventory. So here, we gonna use, not exist. So we're going to say, Select star from film. Give me all the films not exist and then your sub query Select star from involuntary. So Inventory Table has a column film I D. So we can join the film to the inventory table. One film I D. So it stays, not exist. So this will return me all the films that are not in the inventory table so you can see there's a bunch of movies that are not in any stores 46. Non Correlated SubQuery: so let's taken example off. Sub Caries are not correlated. So the 1st 1 is We want to get a list of all the payment transactions that are above the average payment amount. So if you look at the script, select Star from payment, where amount we can just use the operator greater than and then we can use the sub query and then that we just say select average amount from payment. So we using the same table in the sub query that returns the average amount and our main queries saying is that select star from payment where amount is greater than the average amount that gets returned from the sub query. So let's run that. So this list gives you all the amounts that are greater than the average amount. The second example is where we want to find out how many stores are in Woodridge. So our main queries select star from store where address in. So we're using the key word in and reusing the address I D and the sub Caries select address I d. From we're using the address stable and joining it with the city table in our sub query. So the sub query itself has to table that we're joining on City I. D. Where cities would rich. So the sub query returns us all the address I DS that exist in Woodridge. And then the main query is pulling data from the store table where the address I D is in this list to Let's run this. So we have one store that is in Woodridge, and you can check them by profiling the individual tables as well. 47. Converting String Functions: we will be looking at the converting category for string functions, and the list of functions are asking Cast in sound X. So let's go to our lab. So let's say if you're given a task to bring back those customers where the last name is Smith. So we run this simple, select like star from customers where last names equals Smith. So at the first glance, we can say we don't have any records where the last name is Smith. But what if the data quality in your table is not good and somebody is fat fingered the last name? Usually there's some spelling mistakes, so you will, at first glance, say we don't have any customers with. The last name is Smith. But this is where one of the string functions come handy in used to sound X function. What Dysfunction does is is gonna bring back that data that sounds like Smith. It's pretty neat, so we can run the same select select star from customer where so used to select star from customer statement, where Sound X is the name off the function open racket and the name off the column that we want to check equals sound ex again and what value we're trying to check against. So it's going to compare the sounds off boat off these data points. If you run this, you get to records back where the last name sounds like Smith. And when you have a lot of data and let's say you're trying to find a particular name, this type of function will help you return names that sound like a particular given name that you have been tasked to find and will help you in profiling the data in an effective way the second strippers way. We wanna actually use the cast function, which converts data from one data type to the other. So we have select 2010. 10 has value, so we're just using this tax value, and then we a casting this text value as daytime. Basically, the function cast uses the value that we're trying to convert, and then we use the keyboard as and the data type that we want to convert it to. Just giving a label has cast value from duel. Dual is ah, Demi table in my sequel that can be used to test your functions so as you can see that the cast function converted the tax value 2010 10 into a daytime format, so casting can be used when you are ingesting a lot of data that is not standardized where you want to cast the data into a particular former. It's a very powerful function that gets used a lot as well. And the third script, we will be using an asking function and ask you Functions can be used when you're doing data conversions. So some use cases might ask you to drive an asking values if we run this script. So basically, this is your last name column, and it converts it into an ass key value. I've also given you to ask you value finality screens, which is zero. 48. Date Functions Introduction: date functions. You either love it or hate it. Date functions become really important when it comes to data that's time sensitive. As data comes from different part of the world, date format becomes an issue as you can see from this picture that date formats in different regions of the world is not conforming to one. If you look at the blue colored countries, we start with the day d D dash month MM dash year. So that's kind of a common format. But if you look at the yellow countries, we have the year first. If you look at the countries and red, we have the month first in the date format. The day, month and year portion gets inter changed as we look at different parts of the world, so understanding how to format a date field becomes really important, and I can emphasize more that or the course of years. I come across so many occasions where I had to deal with formatting the date data, so hopefully this picture can give you an idea off the different date formats we can get when we're dealing with data that's more international. Sequel has a lot of powerful date and time functions. I put together this diagram for you to understand the anatomy of the date and time format as it is using the databases. So if you look at the left off the bar, we have the date portion. It's a four digit year, two digit month and two digit day. If you look at the right side of the bar, that's the time portion. So we have two digit, our two digit minute, two digits. Second and then we have milli seconds. That could go up to six digits. If you're able to understand this format inside out, you really have an edge on the other. Analysts. Hopefully, in our lab exercise will go through multiple functions that deal with either the date portion or the time portion off this field. 49. Date Functions Part1: below is the set of functions we're going to go through in our lab. The set of functions in the lighter green color, basically our in built date and time functions that give you whatever the current date is. The second set is basically, if you want, extract a certain component off the date back from the data field. The third set, which is the orange color, basically gives you the time portion that you want extract from your date and time field. And the 4th 1 which is highlighted in yellow, is basically a formatting function lets you format the date in two different flavors. So let's go to our lab exercise to try these out. So let's take a look at script once we're using the in bill, date and time function that my sequel gifts. So now let's run this script So current underscore. Timestamp gives you the date and the time. And remember, we went through the format off the date portion and the time portion off the date and time . So this kind of mimics that now also gives you date and time boat together. Current date just gives you the date, so it's the year portion, then the month and then the day. So it's why, Why, why four wise two m's and two D's current time gives you just the time portion. So these functions are good to remember when you're actually doing your application development as well. So let's look at script to so we're gonna be using functions that can extract either the year portion, the month or the week or the day portion off a date and time. So let's run our second script. So if you see we already used the current underscored date function before, so he's just going to give you the date portion, so we have 2018 10 20. So if you say a year open bracket and then used this current date inside the bracket, what this will do is just extract the year same thing with quarter month, week and day. It's pretty self explanatory, and these functions unnamed pretty well. So let's say if you want to say day and in the bracket you used the current underscored date, it's just gonna pull that. Today's the 20th day. Same thing with week, month, quarter and year, and these functions become really important in reporting when business is asking you to split the date and tell you what the quarter is, what the month is, what the week is for their sales data so really, really important is really good to know these functions that can drive and isolate the year , quarter, month, week and day. Very helpful. Let's forget script three. So now we're using functions that can extract the time portion and dissect the time down to our minute and second. So now we're gonna use the current underscore time function to give us the current time and then these additional functions that will drive the hour, the minute and the second same thing what we did with the current date, we're going to use the hour, minute and second and then in the brackets, we're gonna give the current time. Did you look at the current time? It's 11. 50 to 58 so it takes the hour out the minute out and the seconds and you know these functions will become important. If for some compliance reasons, you want to know the minute off your transaction when it was paid or you want to know the hour or you want to know the second thes granular attributes become very, very important. Now we're gonna look at the current time stamp, which is the date and the time portion together. So let's run this. So if you see the current time, we have 2018 10 20 then the time portion, so date underscore format can be used for date and time values. So let's say you have your date and time value right here. Right now, we're using our current time stamp, but in production systems you will get an actual date and time value. So you need to place that hair first and comma. You see these letters percentage K percentage. I ask d M. And why these are pre defined letters that depict like percentage K depicts. It's an hour I depicts is a minute s second D day M month and why year? So this is something that you just have to remember. Just make sure you remember that date underscore format function takes two parameters. One is the actual need in time, and the other is actually what element you want extract when you look at the results said were able to extract the hour, The minute, the second and the day, month and years. This is an important function as well. They say you get the current date and time and you're asked by the reporting analyst to tell you what week it is. What day is it? Not just numerically, but from a textual perspective is a Saturday. If it's a Monday, how would you go about driving that? So let's take a look at this script. So we're using the date underscore format, and we're gonna use now if you look at these pre defined letters there, so percent W will tell you the week day name percent m is gonna be month name percent. He's gonna be day off a month percent Why is gonna be the year, then the same portion ages, our eyes minute and species am or PM. So you see, we got the date and time here then, So we're seeing it's a Saturday. The month is October 20th is the date 2018 and then the actual time. And then if it's a PM or in AM so as you can see this derived date and time is more descriptive and sometimes for your reporting you would need that. You would want to convert a date and time from this format down to here. So just remember that these are pre defined letters that are used in the sequel function. That's something that you have to remember, and hopefully, as you practice more, these functions will become very, very easy for you to understand. 50. Date Functions Part2: the following set of date functions either help you add or subtract date and or time from the date field. These sets of date functions become important when we have more time line oriented data that's needed. These sets of data and time function helps you move along the time line, either by adding days, months, years or subtracting them to get the desired result. Let's go use them in practice. So let's look at Script one there. A lot of times when we are asked to either add or subtract a certain number of days from the date fields, this activity is really important. As we're trying to report a certain metrics, you might come across some use cases where you might either have to subtract or add days or months to a particular date field, and it might be for some reporting need. So let's look at the script, select current date, and then we're adding five days to the current date and then just giving a label. So let's run this script so you see, we added five days, asserts 2018 10 25 to add the days you need to use the keyword interval and then the number off days you want to add, and then the green off. What you want to add. In this case, it's the day. Do you want to add? Let's look at the next example. We can achieve a similar dissolved by using the ad date function as well, so add it takes two parameters. The first is the actual date field that you want to add to comma, and then we can use internal five days, so this should give us the same result as well. So it's added five days. So if you asked to take days out, we can use the negative sign and see Interval 10 days it will take. It will deduct 10 days from the current date. Soc It's 10 10 like heading days weekend. Either add or subtract months. So let's say, if you want to subtract 10 months from today Associates 12 17 12 20. Likewise, the year portion. We can also take 10 years out of the date. It's giving us 2008 pretty nice handy functions. So now let's apply these date and time function to our database, so we have a payment table. She has a payment date column We wanna list out the month and the total number of transactions for the customers in that month. So let's look at the script so select payment date as paid dates. So that's our field in the table Month name is gonna extract the month from the state field and actually extract. If it's January, February, March and then we're doing account, we won number of transactions from payment. We have to use the group by so you want to group it by Beeman Date, but only the month portion off the payment date. Then we just ordering by the count. So we want. Basically the first record should show the most number of transactions, and it should go descending. So let's run this. So you see from the results that July has the highest amount off transactions, then August, June, May and then February. So looking at this data, the store can actually see that July's are mostly the busiest time for movie rentals, and they can actually add more. Resource is in their stores, in which one are the least busiest months, and this can impact the business directly, so knowing the date functions can really help business decipher the type of data you have in your database. Let's say if you're asked to produce an average number of days, a customer holds a movie for viewing. So to calculate the average number of days a customer holds a movie first we need to get the difference between the return date and the rental dates. How many days? Actually, the customer holds the movie, so the first script just gives you that. You can see that these are the number of days a customer's holding the movie now, and we're using the date day function to calculate the difference. Now to calculate the average, we just used the average function. So the average number of days a customer holds a movie is five. If you want to know which day the customer rants the most off the movies. So we want to find out which day off the week customers are renting the boast movie. So let's look at our script for this select day name from rental date. It's just gonna give you if it's a Monday Tuesday Wednesday, a count the number of movies rented from rental again we have to grow by the rental date. So if you look at this data, Tuesday seems to be the day when we have the most amount of movies being rented in a summary daytime functions player Really important role in your data analysis journey. Time after time again, you will come across situations where you have to use the date and time functions. So my advice would be is to go through these exercises a few times, use different data and try to get used to these date and time functions, I think. 51. Exp Ext Man String Functions: string functions can also be categorised under expression, extracting and manipulating categories. And we're gonna be trying out some of the functions in our lab exercises for these in a lot of data projects. As being an analyst, you'll come across different scenarios where you're given a task to either manipulate the data or either extract some value out of the data. So these are the functions that you can use to get your desire results. Let's try some of these functions in practice, so let's take a look at the script one. So it's asking us to find the position off character. Are why, in the first name off the customer table, we're gonna be using a string function called in String Short I N S T R. Which gives you the position off the character you're trying to find in the string value. So let's look at our script. It's a select first name comma. I n str best the name of the function Open bracket first name. That's the name we wanna find, where the character exist. Drama are why that's the character we're looking for. Close brackets from customer. I'm using this in the where clause to bring back only those records Where the first name does have are why, as this particular function gives you the length So if the length is greater than zero, that means that the first name does have are y in it. So let's run it. So if you look at the results, set the first record Mary to the position from which are y starts from is three. So I m a r So are is at the third position. If you look at Cheryl, it starts that the fourth position are y right here? So this is a string function that can give you a position off a particular character in the data pretty quickly. So let's look at script to under extracting. So we want to use left, right and mitt functions to split the phone number that we have in the address table. So let's look at the script. Select phone, comma left, open bracket phone, comma three. So that means that we're bring three characters from the left off this value, and we're gonna give it a label s area code. They will do mid. It takes two numbers. We'll start at the position for and up to the third characters after position for would be the mid phone and the right would be We need the right most four characters. So let's run this. So you can see from the results that we got back is that 998 came into the area code A 34 came in the mid phone column. 1275 came in the right phone. We look at some manipulating string functions and script. Three. So let's say if you want to reverse your first name, it's pretty simple, and we just use the reverse function and give it the name off the column. Let's run that So you can see can easily reverse the string value just by using the diverse function. We can also use the replace function to replace the value in the data. Let's look at the script. Select first name comma. Replace the name off the field, comma Linda to Lin, and we'll just give it to label from customer where first name is Linda. So what this does is that it finds Linda in this column and replace it with Bill in. Let's run this. You see. Linda is replaced by Lin now, so it's a quick way to replace string values. The last script is basically you can repeat data multiple times by using the repeat function. So let's run this. As you can see, the value is Eddie, and it's get repeated twice. These are just some examples in which I'm showing you how a string function can be used. Based on a particular business case, you can use a combination of all these string functions and actually get the desired result . 52. Formating String Functions: formatting string functions. Air used a lot when writing sequel scripts that lets you manipulate string values. Having a knowledge of formatting string functions will definitely give you an edge. Let's go to our lab section. So let's take the first example were tasked to convert the first name and the email address to lower Case. So the script. We have first name common lore. That's the name of the function open bracket and then the name off the column. You wanna change the case to and then lower case email as well? We're just gonna limit to the number of records being five. So if you see the first name was all upper case in the database, so we converted it to a lower case. The email also gets converted to a lower case, so it's a pretty simple function. The second script is where we're gonna be using a trimming function. Trim function removes spaces from the value, so let's look at this group. So the first column recon coordinating the first name. Come on. Yes, if you see one, 23 So you have three spaces after the first name and then we're doing the length. What's the length of this. Once we had three spaces to the first name that we're going to use the trim function. Right tram. First name. So it's gonna trim the space is in the end of the column, and then we're going to see the length off the final value. So let's run this. Let's look at the first records. Marries length is four plus three its length of seven. But when we use the right trim function, it trips. The space is from the right side of the characters. So here you can see we have used to trim function that trims the spaces from the right of the third script is where we can be using ah, padding function. So let's look at the script. Select first name, last name. Our pad means right pad first name comma. So the length of the total output needs to be 15. Third parameter is the actual patting value. And in our case, we're gonna be using a dot and then he also going to write pad the last name. So let's run this. So we had the first name last name. So first name. The length of the name is four. So therefore characters m A R y. It adds 11 dots, so the total length becomes 15 and that's what we have. We're qualifying it with the length that we need for this field to be at, and then we give what we need to use for the padding value. So beat the source column, the total length off the desired value and what the padding value needs to be in coats. So this is a function that can be used to format a lot of string values. 53. Numeric Functions: sequel allows you to from a lot of numeric functions. When we look at our database, we have we might have numeric feels like quantity paid amounts or a particular price. So any time you see in America data, you can apply thes numeric functions. These functions can be characterized in mathematical functions and or rich thematic operators. So if you look at the first staple for mathematical functions, we have functions like floor round sign. It's pretty simple to apply these mathematical functions. I'll show you in the lab as well as the arithmetic operators for division, multiplication and addition. So these operators come really handy. Where we were trying to either add up in America data or whenever we're trying to apply business rules on these numeric fields, these numeric functions become really, really important. Now let's see these functions in practice. Let's take a look at our first script. So we're applying an absolute function short A. B s open bracket, and we're giving the value negative two and we're giving a labour last or how we want to see the column name Mama Seal again Open bracket and we're giving the actual American value floor round and sign, and we're not giving a particular table. So you can also just use thes functions by themselves without giving a table, and my sequel will calculate it. So let's run this now. The absolute negative to brought back to which is correct. Ceiling brought back 31 because it's the upper number. Floor got 30 which is correct. Round got 20 because we're rounding the number from 20.2 to 4 to 34 to 20 and sign is basically giving you what the sign of this value is. So it's pretty simple to apply these mathematical functions, and you can try out your own variations. Put some different America data here and see how the functions behave. The second category was Earth Matic functions. In the second script, we're doing division. We're also doing subtraction and addition. It's pretty simple, and you can run these with your own radiation off numeric data. The result. One is one result to his one. We were using a D. I V instead of the slash for divide as well, so both can be used to give you the same result. Just a negative sign. In a positive. She's pretty simple, so hopefully you can try out some examples on your own and and try out both the mathematical functions and the automatic operators. 54. String Functions Introduction: being able to effectively use string functions become really important in your data analysis. String functions really come handy when you're trying to clean bad data. Having data in a certain format becomes really, really important for companies that get audited on a yearly basis, and string functions can be used to get your data in a particular format. We will go through each of these categories and I'll show you examples of how each function can be used. These functions might look overwhelming at first, but I will try my best to go step by step in explaining how they can be used. 55. Conditional Statements: conditional logic and sequel can be applied using a case statement. It's similar to an if then else statement. If you've used any other programming language, Gates statement can help you transform data from one settle values to the other, and also it can help you prevent enters. If you're doing some sort of a calculation. Now, let's look at the syntax. We'll use the keyword case to start the case statement. Then give the particular column whose value we want. Evaluate when the value, then whatever value we want to use to transform that we can have multiple conditions by using the Wen and then keywords else. If none of those conditions are met, we can give a particular data value a point and then end. Case else is optional. Let's look at the case statement in practice. Let's use the case statement on the film stable. Let's see what we have in the table first, so we have a title in a rating. So what we want to do is we wanna basically expand on the rating acronym and give a description. So what? We can uses a case statement to actually enhance thes acronyms and let our users know what they mean. So if you look at the first script we're using the case keyword, then we're giving the column. We want to transform, and in this case is gonna be rating when and then the value of this column. So if the value of the column is our, then we're going to change it to restrict it. So these are multiple conditions when and then that could be applied to change these documents to more descriptive information. And then you use else if none of these conditions are met than else, condition is gonna be triggered. And as and we're gonna give it a label for this column as movie rating description and also ordered by Title 31 order it a sending. So let's run this. So looking at the results that if you CPG I had a condition where of reading was PG, then transform it to parental guidance suggested. So you see, this new column is actually giving us to transformed value, So this case statement can become really, really important when you're doing migration from one platform to the other, and there some requirements where you want to change values based on a particular business requirement. So you see, all the actives have been successfully transformed. So there's another flavor in which we can use the case statement. The script that we ran. You were using the column after the case. What we can also do if you look at our second script, is case when the column equals R, then Transformers to Restricted and Harry listing the column in each of these conditions and then assigning a particular transformation ruled to it. Both of these will give us the same results that hopefully you'll find these examples helpful. 56. Window Aggregate Function Part1: the following categories give you the list of functions that can be used in the rendering logic so you can use aggregate its ranks functions for statistics, distribution and also positional activities. So these functions become really handy. Once used via the rendering logic, we're going to start with using the AC create functions in our lap. Now we'll start recreating the employee table and inserts some data, and some of you might have already done that in the previous lab. Or so we have employee I D. Employee names the department that they work in their salary in the age. So that's the data we're working with. So let's start. But finding the total salary per department, right? So, you know, from the rarest lecture, we can use a group by department name to get the total salary by summing it up here, you can see the group bike loss, has grouped each department and summed up the total salary. Let's say you want to calculate the average and the total sum off all the salaries in the table, presented in a way that the actual number of records in the table still show when you run the query for that, you're going to use the over clause with initiates the rendering process. So we're gonna do the aggregate function off some passing the column that we want to aggregate on than the over claws. And this give it a label. Same thing with the average. We're gonna give it the column name over clause and the label and just run the script. So, as you can see, we got all our records in the employee table. Employee, I d. Employing name, department name and salary. So we have 11 records showing up. But if you look at the last two columns, the total salary and the average Sally's toward this over Klaus dead was summed up all the salaries of all employees enlisted it at every dro, along with the average salary at every row. Our third strip is asking us to sum up the amounts for Marion Barbara, where each amount they paid is greater than $5. So for that we have to join the customer table and the corresponding payment table. So we're gonna join those two tables on customer I d. We still using the where clause where amount is greater than five and The name is Mary and Barbara. If you look at our select statement, we want to know which month they paid these amounts in your summing up the amount and then using the over claws and then giving a total payment. So let's run this here. You can see the Mary's transactions in June, she had two transactions that were greater than $5. In July, she had three transaction. But if you look at the total payment, the or function is adding up all these amounts and showing it at each row, some off all these amounts together. So this kind of shows you the aggregation being shown with the detailed records are four. Script is a little variation from the third script in scripturally we were not grouping or partitioning the win doing function. But here we're gonna introduce ah, partitioning claws and we're gonna partition by each customer. So we gonna partition by Mary and then we're gonna partition by Barbara. So the script is pretty much the same. The only difference we have is that when we some the amount and use the over clause in the over claws open bracket, we're going to use the partition by Klaus, and we're gonna partition by customer i d. Let's see the results if you see now, so the total amounts are aggregated by the customer. All of Mary's amounts have been totaled up, and all of Barbara's amount have been total up. This partition by Klaus comes really handy when we wanna partition on a particular column using the rendering functions, you can see the total amounts here $41.94 only relate to Mary Smith's transaction When you some these up. Great. I've added a slight to explain these results, further calculating the total payment amount for each customer, ensuring their detailed records alongside it. So if you see we some the amount and use the over claws and then used partitioned by customer i D and labeled as total payments. So if you look at the last column using the partition by Klaus created two partitions in our results that one for Mary and one for Barbara. And by using the partition keyword, you're able to isolate your aggregate functions to be only applied on that partition. Hopefully, this Richard was helpful. In 1/5 script. We want to calculate the total on average. Sally per department. Enlist the total employees per department as well. You had seen the employee table. We had employee named Department and salary and their age going to use the or clause we can also use the partition by. We're gonna partition by department. So? So here. You can see the 1st 3 records relate to the HR department. Salaries are here, So the total salary per department. So there's 12,900. Is this some off these three salaries in the HR department? Because we partitioned by department name. Likewise, average salary per department and the total employees, so we have three employees. 57. Window Aggregate Function Part2: Now let's take a scenario we wanted, classically it. They're running some off the paid amounts per month by customer, where the each transactions are greater than $5 for Marion Barbara, so the script remains the same. The only portion that will change is that we're going to add the order by paid month clause after the partition. By custom righty. So what this does is that it further segments out the data by month after it has already grouped by the customer i D. So let's run this. If you look at Mary's transactions, we have months 67 and eight. So we have three months in which the transactions air split. So for month of June, this winnowing function added up the two transactions for the month of June. Then it came and added the transactions for the month of June, plus the month of July, So 35.95 is adding transactions for all the month of July and June. The last transaction for Mary for the month off August is $41.94 so it adds $5.99 to the $35.95 we calculate in the previous role to $45.94 is the total for all these amounts for all these months as well. It's a running total as it keeps adding the transaction amounts as the current role moves down. If you don't specify a row and range and give a particular boundary, that default is range between unbuttoned proceeding and control. Let's try to understand this through a visual aid. So when we use the renewing function and do the order by payment month, if we don't use any range or don't set any boundaries by default, my sequel will process this as a range claws and then do between unbounded, preceding and current row. You look at the results set if the order by value is the same in this case for the month of July via three transactions range would calculate the aggregation for that block of data together. And that's why you see $35.95 being repeated for each row for the month of July. As the calculation happens on that logical group, hopefully, this ritual will help your differentiate between a row and arrange calculation Now let's update the previous script and add another rendering function where we're actually explicitly seeing rows between unbowed and proceeding, and the current after the order by clause, and leave the other running total payment range as is. Let's run this so the running total payment rose. What it does is that it adds the amounts row by row, like a physical row gets added to. The total amount does not matter if the month is the same. But if you compare the range running total if it sees the month being the same, it as the total amounts and give the same amount on those two rows, then it comes to the month of July. It add all the July and the previous difference between rose and range that the order by clause does not impact the row calculation, and it basically works. It's aggregation on each row as it goes through. On the other hand range, basically logically groups based on the order by and calculates the total sum based on the order by column. Let's look at the roads calculation through a visual aid. So the moment we used the Rose keyword, the framing happens at each row. So as Rose are being added, the frame gets bigger as you can see from the visual on the right side that the frame one through frame six is an increment from the first row to the last row. So as the frame goes bigger, the total sum or the running, some gets added as the control goes down to the last row off Mary's transaction and likewise with Barbara. Now let's say if you want to calculate a moving sound, meaning that a sliding window that goes through our set of records will keep the rest of the script the same will still use the some off the amount over Klaus partitioned by customer I D. And the order of the month remains the same. So this information we're not changing, we gonna add Rose between and give a boundary a one preceding and current crow. Likewise, we can also add the range between one proceeding and country and see how our calculations are different between rose and range. Let's run the script if you look at our rows Columns, um, moving total payment rose bait amount one day, so we have ah, one day prior sliding window. So now we have $5.99. So we don't have any prior RO just gonna represent fighters nine cents as you go to the next column. It's gonna add one previous amount and this amount and put $15.98 as it moved forward is just gonna only add one previous amount because you look at the last roll. The reason why we have $13.98 is because it it added $5.99 to $7.99. So it's a sliding window with one month prior by using the row between one preceding and current row. If you look at the range calculation, it again summed up the two amounts based on the payment month. So for the month of June, we have $15.98 as it went through the next set off month. It added again for the month of July, so it added July payments to the June payments, but it repeats it for the same month. If you look at the last Ruoff Mary and look at daughter moving payment range came in one day, it's $25 in 98 96 cents. The reason it is $25.96 is because it added $5.99 to these three transactions in July. So knowing the difference between rose and range calculation becomes really helpful. As you start doing your analysis, let's look at the sliding effect using the rose clause through a visual aid. So this late is showing you are moving total payment for each customer by month and were able to achieve that by setting up a boundary and by saying Rose between one preceding and current. And if you look at the framing, it's basically has two rows as it moves through, their data sat in school a sliding window. If he would have used between two preceding and current, the window would have had at the most three records as it's moving through the table 58. Window Function Components: Let's review the components off the window function. You can apply your regular aggregate functions through the window in logic like use your average men, max some and rank functions and many more. The expression list is where we have the column indicated on which the function runs. Window functions are initiated with the over Klaus, partitioned by list specifies dividing the rose into groups. You can also control the order in which the rows are processed by the window functions, using the order by clause. Hey, sending or descending rose and range frame clause allows us to operate on a subset off the partitions by breaking the partition into even smaller sequences of rose. The frame clause lets you limit the number of rows within a partition by specifying a start . An endpoint. Let's try to expand the row and range capability off the winning function. The additional framing clause uses either a role or a range and then sets up a boundary using the between clause. These expressions can fall into different categories of being either unbounded proceeding, unbounded following and few others. If you look at the diagram, the unbounded proceeding and the unbounded following represents the first row off the partition and then the last roll off the partition so that kind of sets the boundary. You can also set a custom boundary by giving an offset for number, proceeding and number following from the cut intro. The default frame is range between unbounded proceeding and control. Range. Clause treats the duplicate values as single entity. Worse, ISRO clause uses them as separate roll frames are based on physical offsets and range frames, arm or logical in nature, we will look at more examples, which will make the difference between row and range frames much more clearer. 59. Window Function Introduction: video functions are one of the coolest features I think sequel has. Once you know the mechanics are working with window functions, you can use it for different use cases and calculate the desired metrics. The window ing process less. You divide the results that into groups off rose called partitions and then frames let you operate on even a subset off partitioned data by breaking partitions into smaller sequences of rose. There a lot of functions that can be applied. We have the winnowing concept, some of the use cases that can use the window function. If you want to calculate the top 10 products this year by total sales. If you wanna calculate moving averages or running totals, we will go through some scenarios where you can see how window functions can be used. You can apply the window function on a particular column while still maintaining the detail off the whole table, like if you remember in our group by functions. We used to get data that user condensed the number of records based on the group by column , but with window functions, you can still maintain the detailed set of records off the table and still apply grouping on the data based on different aggregate functions. So the window in concept becomes really important when you're trying to see the data in its totality. Next, we'll see how the window function is constructed and what are the components that make of indo function work? 60. Window Ranking Function: ranking is another category of rendering functions that gets used a lot. Rank functions can be used to calculate the top number of records in a data set rank functions assigned the same ranks to equal rose versus danced rank that has no gaps. Hence the term dense and tile slices. The data into the given number of subdivisions you give andro number just gives a physical number two each row. In your petition, let's see these functions in practice. I really like rank function. So we're gonna be using the employee table for running our rank examples. I want to update one off the Sally Records for employees five. So we can see the difference between the rank and dance rank. Please run that update statement. Great for script one. We want to calculate the rank and danced rank on the salary off the employees. So we're gonna use the keyword drank open and close bracket over partitioned by department name. So for every department, I want to rank the salary. Same thing I want to do with dense rank. Dense underscore rank open and close bracket over partition, my department and order by salary. So I want the ranking by salary within a department. So let's run this. So if you C H R gets 123 rank and dance rank 123 So far, rank and dance rank is giving us the same pranks. Let's go to production department if you look the first row or production gets both ranks as one second and third gets, too, because the salary is the same. But the fourth record would rank function gets four, and dense strength gets three. So here you can see there's a gap in the rank function. It's skipping three. But the dance rank does not skip, and there no gaps when using the dense rank. And so it systematically say's 12 to 3 and four. On the other hand, ranks say's 1224 and five. So I prefer using dense ranks. For this reason, let's see if you want to calculate the top five employees by salary, what we can do is we can apply the dense rank function, used the over claws in just order by salary descending, so we want the highest salary first and lowest at the bottom, and we're not partitioning by department right now. because we just want the top five salaries regardless of the department. So we're using an in line sequel right here. So if you look at the brackets, this particular select, it's giving your dance. Drank for the high salaries one and then kind of goes down all the way down to eight. But we're looking for the top five employees, so we need to filter this data further. So to filter the sequel inside, we will use another sequel outside this and keep this equal in close brackets given an alias. So the whole script inside will get an alias off acts So we'll say, Select star from the inner sequel. This acts where salary underscored dense rank ISS less than unequal to five, which is filter these records where salary dense rank is less unequal to fight. So if we run that we should get the top five employees by salary. There you go. So it's a handy view of getting an end number off people within a group based on a particular metric. Let's look it script to we want to basically use the entire function to split the payment data off the customers into four groups How are we going to do that? So we're gonna use an end tile function Open bracket four. Basically, this gives you how many slices you need off the data over which initiates the window function order by the amount descending as quartile. And we're gonna join the payment and the customer table on the customer I d. So, basically, we want to see the amounts paid by the customer and then group them into four distinctive groups. Let's run this. If you look at the quartile College assigning these roads Group one, if you scroll down to records, we have some records assigned as to sometimes three and then four. So this is an easy way to categorise the number of records in your table toe a particular given group using the n tal function and giving a particular number of groups you need in this parentheses right here in script three, we're gonna be calculating the top earning employees per department. So for every department I want the highest paid employee, we're gonna use the role underscore number function, partitioned the data by department name and then order the salary descending. We roll numbers basically assigned just a number to each physical row. If we run the script highlighted on the screen, we gonna get a sequential number assigned to each row in our employees table. So for HR department, we're getting 123 and then the ordering is based on salary descending the way we can filter this record and only bring records where the employee ranks equal to one. We need to have an outer sequel, Select Star from Open Brackets and Close Bracket, where we have the in Line sequel given an alias, acts where employees rank equals want. So let's run this. So this gives us the highest paid employees in each department based on the salary. Hopefully, these examples help you understand the ranking functions. 61. Window Positional Function: positional window function helps an advocate data location in the results. That, and helpful in business reporting, lag and lead functions let you either access the data before the control or after the control, depending on the offset you give. First value and last value gives you either the first or the last value in an order to sat . Let's look at these functions in practice. So what script one is doing It is calculating the first and the last value off the employee I d. Partitioned by department name. So it's a pretty simple script. Give you the first employee and the last employees. So if you see we're getting frame first row, we're getting the first employee here and cream last road. We're getting the last employee here, which is Grace Lee. Pretty simple, straightforward function. Let's try to run script to, which is calculating the lag and the lead function or the employee i d. Partitioned by department name and ordered by ideas were ordering it by I D. So let's run that If you look at the HR department, we have three employees. So for the first employee, there's no previous row. If you don't give an offset. What a default value for lag and lead. So it's gonna basically set the offset by one record. So lag would be one record before, and lead would be one record after. So the next row, which is using the lead function, is going to be two and three from the current trolls. It just gives you that for the third employees in this particular partition. There's no next record and it's gonna be no. But if you use the offset for the lead function, like in this case I've used to that means that the function would fetch record from two rows down. So when you're on the first row on employees, one to record Stone would be Grace Lee. So we're getting three here for the lead with setting it off set off to, but they're no records after two records from two. That's why we're getting no, no, we're using the lead and lag offset. You can really navigate the data that you have in the table within the partition. So for every new partition, the window function would start again 62. Data Visualization Introduction: one of the reasons why I added the data visualization section to the scores is that as a data analyst, you would be asked to present your data findings in a very concise and impactful way so the business can understand the value of your analysis along with the key metrics that impact their business. Think of data visualization as a tool for communication. One of the best ways to explore and try to understand large data sets is through data visualization. The concept of storytelling is how we set the stage to showcase our data so it can be understood without going into too much technical details and with data volumes going up, businesses leaning more more towards leveraging the civilization to better educate them on either discovering patterns, spotting trends and highlighting information where there are opportunities to improve. The goal off all data centric organizations is to make better data driven decisions which would lead to better insight and that would ultimately drive business. Aesthetics of data. Visualisation becomes very important as we start using visualization to communicate the message. S statics described every aspect off a given graphical element, So like position, which describes where the element is located, so location becomes important as you showcase, your visualization shape can be used to show different data. Sets size can show magnitude. Color can sure intensity lines with varying with and type control patterns. So all these small intricacies can help your data visualization. B'more impactful, and we'll see a lot of examples in the other slides. 63. Data Visualization Data Types: once we get our hands on a data set, the first step is to understand the type of data values we want to visualize. The two categories in which data values can be split is categorical, and the miracle categorical is more qualitative, and his further split into either nominal or orginal. Nominal categories don't have a particular order and is used for more labeling purposes. Examples are either eye color or gender. So, for instance, gender might have values off em as meal and then f for female order. No values arm or ordered orginal data, as the name suggest, has a certain order to it. So customer survey might have data values as one being satisfied to being neutral and three being unsatisfied. Numerical category is MAWR quantitative. It could be further split into either discreet or continues. Quantitative values can be measured. Discrete data said, for instance, can take only a certain value. For example, the number of people in a room can either hold six or seven people, but not 5.5. So the examples are the number of sides you have for dice and number of pages in a book continues data set is where the data can be split into further slices and it's more like Infant Night continues. Data values are for which you can divide them into finer and finer increments. So, for instance, a temperature or a weighing scale, you can further subdivide the weight into smaller units, so it's more or less in finite. It's continues. The distinction of data into these groups will become really, really important as we start to use different types of charts and graphs, which will see in the later slights. Hopefully, this wizard will help you understand how we need to think of data before we start putting the data into a more visual form. 64. Data Visualization Categories: you might be faced with a particular situation where you need to present your data based on a specific requirement that can either be categorized as comparing the data values or showing composition off your data set or presenting how data is distributed or it is related. For each of these categories, we have a set of charts and graphs that can be used to deliver the desired results. I'm sure in a condensed list of chart types within these categories on the slide, we will go through each of these chart types in detail. In the later slides, I have attached a chart selection diagram to the lecture notes created by a doctor, Andrew Abella, that it's pretty elaborate and will help you pick the right chart for your data type. An important question you can ask before selecting a visualization is how many variables do you want to show in a single chart is the 123 or more and how many data points you want to display for each variable and will you display values or Peter of time? But among items or groups, answer to these questions will help you pick the right chart for your visualization. 65. Tableau Software and Datasets: We will use the tableau software for our data visualization course there many other data visualization tools in the market, but tableau by far is one of the leaders. If you follow the link given in the slide, it will take you to a public version off tableau, which is free of cost. You can download it either for Windows or Mac version. The only downside is that you cannot save your visualization on the desk. You can only save it online on tap those public portal. So this is only good for testing or university base assignments. You don't want to share sensitive data on the public website. We're gonna first install tableau on the Mac OS. So just given an email address, we have successfully installed tableau on Michaela's Let's install tableau public version on Windows. Great. Our software has been installed. We have two data sets we're gonna use during the course. One would be the world indicator, Data said. The other one is going to be the sample superstore data set. You have no dimensions or measures, and pretty much you have a blank canvas at this point. So what you want to do is you want to click on this, connect to data to get those two data sets loaded. So we'll go to excel because we have those data, sets and excel. We need to link to these two data sets. So we'll go first to the World Bank CEO to Excel sheet and load it once you're connected to the Excel Sheet is going to show you the tabs that you have in your Excel worksheet going to click on the CEO to data clean and double click that great. So we have actually successfully connected to the world CEO to Excel Sheet. You have your country court country named Region Year, your CO two emission and your per capita information. So now if you could click on the sheet one, we should have the dimensions in majors loaded. Great. So we have those, too. Now we need to connect to the second data, sets a click on the data source here, click on this Aiken, we're gonna connect to Excel Sample superstore Excel. So be able to look at these taps that are in that excel sheet so we don't double click on the orders. But this is the data set that we gonna be connecting to is the order data. Let's go to sheet one. Now you see, we have the C 02 data connected, and also the superstore data connected as well. So you'll see dimensions and mayors for the superstore data set. And if you click on the world C 02 days that we're seeing the dimensions and measures related to that data set as an exercise, you can connect to the world indicators data set on your own. Great. So we have the data now. 66. BAR Chart Visualization: bar charts Air, one of the most commonly used visualisation that shows numeric values as bars split across clear categories. They're effective in comparing magnitudes and discovering high and low in the data, and that is shown by the height of the bars. Bar chart is sometimes called column charts. You can either show data on a war tickle or a horizontal bar, and let's see if you have more categories or labels at times becomes harder to show that vertically, and we have the option to show them horizontally. X axis showed the dimensions and why axes show the measures. Bar charts. Discrete data is categorical in nature and therefore answers the question. How many? Within each of these categories? Let's take a look at the anatomy of the bar chart. If you look at the diagram, the horizontal axis has the categories or dimensions, and the word ical axis has the measures. The height of the bar is showing the magnitude, and the width of the bar is the same across, and you read from left to right. Each bar shows a particular category. This visualization is very easy to understand. This visualization becomes very impactful when you can see the radiations. In the height of the bar, we have another variation off the traditional bar chart that's called the Stack Bar Chart. Here you see the same bar chart, and the only difference is that each bar is further segmented out into sub categories. This becomes effective in you have, ah hierarchy in the category. So let's say you have a region, and then the region is split further into 2 to 3 different subregions. So this is an effective way to show those sub regions within a bar. A stack bar chart show finer details on the data set as the data can be seen. Split in further categories. Let's go to tableau and create these charts. For this exercise, we will connect to a world indicator data set click on Microsoft Excel, and you need to open the world indicator data set. So it's connecting our power. So what Taepo does is that it lets you see the data once you connect to it in a browser format so you can see what data you have. You can also improve the quality of data if you want to change a particular value once you see it. But for us, we're going to use the data as is Let's go to sheet one. So now you can see that the dimensions and mayors have been filled up piece of the data set . So our first task is to show how the inbound dollar amount is increasing per year in our data set, so double click on the year. So we have data from 2000 to 2012. Double click tourism inbound, which is a measure go to show me and will select the horizontal bars so you can see that the inbound tourism has been increasing worldwide. You can also switch the chart from horizontal to vertical by clicking on this Aiken. Great. So we have our first chart. If you look at this plus sign, we can create a new sheet, so we'll just to hand so we'll do. The same thing will double click on the year we'll double click on the tourism inbound that would click on the region. Now in the show me we need to click on the stock bar chart. Great. So now you can see we're able to create a stack bar chart where each bar shows total inbound dollar amount spent for each segment. And in this case, our segment is the region you can see Europe is leading in each of these years in Tourism in Africa is increasing over the years, but compared to Europe is still for less great. So this is a stack bar chart will create 1/3 chart. You can create a duplicate version, often existing sheet by clicking on the duplicate option in the menu, and this will be a site by side bar chart. I really like this visualization. What it's showing is that each region is shown in a column, and you have the number of years for the data for inbound dollar amounts showing. What we can do is instead of the year being on the color palette, we can replace that by region, so each region has its own color. Let's try to add a dashboard. The dashboard would be this Aiken right hair. You can combine multiple work sheet in a single dashboard. We contract the sheet on the pan and you'll see how the dashboard actually puts the sheet side by side. This really becomes a very powerful communication tool in your presentations you will make a few modifications to the dashboard. First, we will rename the worksheets to have a more descriptive name that goes along with the chart it's representing Second on our sheet one. We will change the color off the bar chart to have a more neutral tone. Third, we will adjust the size off the dashboard appropriately. Let's go make those updates or click on the color. I will just pick a gray. Great. I have renamed the three sheets with the desired names. You can do that on your side as well, by right clicking on the sheet. You want to change the name for and select rename from the menu selection so you have your layout and your dashboard. Then you have the size. It's an automatic size, but you can increase so you can increase the wit or decrease the width. You can decrease the height if you want, so this kind of gives you an easy way to minimize or expand the size. Now you can see these three visualizations in a dashboard, which really gives you a compelling story. My favorite is the total in monitors and per region details, so it's basically gives you each region and the way inbound tourism dollar amount is increasing. If you can see Asia has have the sharpest increase or the years and also Europe is leading by far. The total inbound tourism just gives us a total picture off how every year the dollar amount on the tourism is increasing and the stack march are kind of gives you how eat segment within the year for the region is compared. So this kind of gives you an overall dashboard that can used to show different perspectives off how the inbound tourism in the world can be compared among different regions. Hopefully, this was helpful. 67. Line Charts: line charts are like time. Siri's graph there used to visualize data trends over an interval of time. The data points on the chart are connected through a straight line, sharing numeric values plotted as lines or date related fields. The difference between the bar chart on the line chart is that the line chart needs to include Time dimension in it. We will also be using stacked area chart, which can also show time series data. Its main purpose is to show individual parts and how they make up the whole or the total. Let's see these charts in action. Using Top low, we will be connecting to the world indicator data sets a click on Microsoft Excel and open the data set. Great, we got connected. Let's go to sheet one. All right, so our dimension and measures have been loaded. So what we need to calculate is the total CEO to a mission or the Peter of Time and show which region is showing a spike in the year to mission and where we have a dip. Let's have the Dimension Year the book Click on the Dimension at the Measure CO two emission double click on that as well, and then double click on the region as well. So if you see we have a tabulated form of data here is really hard to see a dip or a peak on the sea or two based on the region. So we really need to get a visualization on this data to make some sense quickly. So you click on the Show me drop down and pick the line chart. So this line chart really becomes an impactful communication tool where you can see Isha by far from 2000 to has a peak, meaning the C 02 emissions have been going up quite significantly from 2000 to and own words for the region of Asia. Compared to the other regions, let's create a new worksheet for are stacked area chart. Click on the plus sign will pick the same dimension and majors double click on year double click on See or to Mission and then double click on the region. Click on the show me drop down and pick the area chart. Great. So the area chart is showing the total summed up see or to mission in the world, and it's doing it by stacking each region on top of each other. This chart becomes helpful to see the bigger picture. Now let's try to put these two charts on our dashboard by clicking the plus Aiken. We can just drag and drop these two sheets on the pen. Great. You can change the size off the dashboard based on your own need. Well, just expand it so it's fills up the screen. Let's try to rename these two worksheets. Do the same for the next sheet. Had a title. Great. Now we have our dashboard ready to be presented. 68. Histogram: history. Graham shows the frequency of distribution off your continues data. It divides the data set into clear groups called bins. The frequency off the Ben data becomes the measure off the history Graham and gets calculated automatically history. Graham uses one dimension that is in America, which gets split into Ben's history. Grams can also be useful to visualize the outliers in your data. Set in our past visualization with bar charts, the columns were showing categorical variable in history. Graham, the columns show quantitative variable. If you look at the anatomy off the history Graham on the X axes, you have fixed size bins where your numerical data gets grouped. There no gaps between the bars. As Baylies continues on the Y axis, we have frequency off the been data, and the height of the bar is showing how maney number of records you have per bid. So it shows the magnitude we will be creating to history Grams one for sales data and the other for profit from our superstore data set. Let's see these in practice. Let's connect to the superstore data set. Bring the orders to the pan. Great. So we have the data. Let's go to sheet one. We have to look at thesis Ailes measure and find the frequency off the distribution for this data set. So what we need to do, we need to create bins off the sales data. We need to figure out what size of the bins do we need for this exercise? I'm gonna pick the bin size as 10. Click on the sales measure, click on the drop down. There's a small icon on the right side, click on the create, and then click on Vince. There's a default. Been size. You can go ahead or write it the moment you created bin for the sales data. The Bend data goes into the dimension. We need to change the type of this data to be continues. You can click on the drop down and click and work to continues. Double click on the sales. Been great. So you got that on the column On the real we need to actually count the number of records we have for the sales data. So we'll just write count bracket Number of records closed bracket just hit. Enter. Great. So we got our history. Graham, what we need to do is we need to filter out some of the sales data. You can do this by dragging the sales to the filter pan, and you'll get this window. Click all values. You can reduce the maximum value to about 600. You can either type in the window or just drag this lighter hit. Apply. Great. So now we can see our history, Graham. And as you can see, that majority of the concentration off the values is between 0 to 50. If you look at the tallest Ben, that has the maximum number of records in it, so what we can do is we can label it. Just do a right click mark label all results so it shows you the number of records we have in that been great. Let's create a new worksheet. Now we'll create the history. Graham, with the prophet measure well again, pick the bin size for profit. I will pick 100 as a been size. Now you need to change their Ben into a continuous very well double click on it. So it shows on the column and on the road you need to do account number of record. Great. So we got our prophet history. Graham. What we need to do is we need to filter the profit between negative 1002 positive 1000. You can do this by moving the prophet to the filter pan and you'll get a window open. Click all values now. You can give a range between negative 1000 to positive 1000 you can type it in the small window it apply. Now we can see the majority of the transactions are between 0 $200 and you can label the bar with the highest number of records. As you move to the right and the left off the highest bar, you'll see that the number of records in those bins go down quite a bit. Let's try to add these to history grams into our dashboard by clicking the plus sign. You can just double, click or just drop those two sheets into the pan. You can rename both the worksheets to a more appropriate label, and whenever you are showing a history, Graham always try to mention the bin size on your diagram. We can rename sheet you as well by right clicking and added title. You can adjust the size, cough your dashboard as well. Great. There you have your dashboard, which are selected history. Grams 69. Scatter Plot: scatter plots show relationship between two measures one plotted on the X axis and the other on the Y Axis Scatter Plot scan showed a large number of data points and is suited to drive actionable insight. Unlike bar graphs, which uses at length as a visual clue, scatter plaice Youth position X and Y coordinate. Scatter plots can help show correlation between variables positive and negative and can depict clusters within your data points. If you look at the anatomy of the scatter, plot is basically made up often X and Y coordinate, and both axes have measures listed on them. This type of realization helps you capture a lot of data points in a very efficient manner . In our lab will be creating a scatter plot between profit and sales data off the superstore data set. Also identifying clusters within this data set by color coding it It will be fun. This light shows two plus one, showing a positive correlation between birthdate and infant mortality, meaning if one waited, what goes up the other goes as well, hence positive. The other shows negative correlation between life expectancy of females to birth rate. As one goes up, the other comes down, hence negative. The correlation Scatter plots will be added at a later time, and you can do this on your own and see if you get the desired results. For now, we'll go to the lab and create a scatter plot for our superstore data set. Let's connect to the data set superstore. Click on the orders Time click on the new worksheet. All right, so we have to get to measures, profit and sales on the pan. Double click on those two. So now you see that we have an aggregated point where all the profit and sales have been aggregated. We need to uncheck this aggregation by going to analysis and un checking the aggregate majors selection. Great. So now we have a spread off all our profit and sales data in a scatter plot. As you can see, we're able to show ah, lot of data points. Our end goal is to find the clusters off customers we have in our data set. We contract the customer name through the details marks card. Now what we need to do is go to the analysis time and you can see in the model you have the cluster option. Distract that and drop it on the pan. Great, you can see tableau based on its came ing algorithm, has clustered our data set into three distinctive clusters. Cluster 12 and three, and they are color coded for distinction. Our cluster one. As you can see, we have a lot of customers there, but the profit is minimal and cluster to you can see we have customers where we're making a lot of profit much better than cluster one. So that's definitely one cluster you want to focus on for your sales cluster threes where we have very few customers. But on each customer we're making a much larger profit margin. We can also move the cluster from the mark's car to the dimension, and what we'll do is it will create a customer group that could be used later on another regionalisation. Let's create a new worksheet. Let's bring in latitude and longitude and the postal code for the customers. This will show us where all our customers in in North America now what we need to do is we need to track the customer group that we made from the clustering into the filter and select which cluster we want to see on our map. Now you can see we can select clusters and the map is gonna get updated based on the selection off the cluster we pick. So you want a target marketing based on a certain cluster, we can see where they are. You contract the customer group to the colors. Marks card. Looking at these clusters on a map really gives a powerful visualization for us to do our targeted marketing and seeing which cluster has a better impact and which location needs to be focused. More. Let's add our scatter plot to the dashboard. Let's rename it. 70. PIE Chart: people just either love pie charts to just hate them. I'm not to phone of them. Pie charts are good to use when you have few categories in your data set. They help ensuring how one category is related to the other in proportion. I would say 3 to 6 categories would be appropriate. Pie charts can either show nominal or orginal data. If you look at the anatomy of the pie chart, each category has a certain percentage that adds up 200%. You can either call each portion or category of the pie as slices or veg is. If you showing more than six categories than the pie chart will become more crowded. It'll be harder to convey the message, so please keep the categories under six. We will be creating a pie chart off the profits from our superstore data set per region. This visualization is showing US sales data per category off the products within each state of USA on a map. You can try to create this visualization on your own and see if you get their desired results. Let's try to create a pie chart in top low. Let's connect to our superstore data set. Select orders data. Click on sheet one. Double click profit from measures and Double Click Region from Dimension. Go to Show Me and click on the pie chart. Let's try to fit this on the screen. Great. So now what we need to do is we need to label each portion or veg bring the region and profit to the label marks pan. Great. We have our labelling done. I was trying to label the sheet as well. Let's increase the size of the phone and make it bold. We'll do the same for the region and the profit. Format it and increase the want. Emboldened as well. Let's add it to the dashboard. Let's adjust the size. We will remove the total profit. Great. We have our pie chart. Regionalization done.