
Business and Data Analysis with SQL

Geoff Devitt

38 Lessons (5h 4m)
    • 1. Business and Data Analysis with SQL

      6:06
    • 2. Introduction to Databases: A History

      3:38
    • 3. SQL Statement Categories

      2:25
    • 4. Capture Data Requirements

      9:08
    • 5. Physical vs logical data representations

      3:08
    • 6. Normalise Data

      7:08
    • 7. Fact Tables

      3:48
    • 8. Aggregate data

      6:45
    • 9. Entity Relationship Diagrams (ERD)

      7:29
    • 10. Extract Transform and Load Data (ETL)

      12:43
    • 11. MySQL Server Setup

      4:36
    • 12. MySQL Server Start and Stop Service

      2:06
    • 13. Connect to database

      6:45
    • 14. Comments in SQL

      5:20
    • 15. Selecting data from a table

      16:19
    • 16. Table indexes

      8:01
    • 17. Table and column aliases

      5:32
    • 18. Inner Join Tables

      1:58
    • 19. Left and Right Outer Join Tables

      4:10
    • 20. Union (all) Results

      3:45
    • 21. Project Management (Agile Methodology)

      11:13
    • 22. Kanban boards

      9:02
    • 23. Prioritise Requirements using MoSCoW

      4:04
    • 24. Sales Project - Introduction

      5:10
    • 25. Create database

      5:07
    • 26. Query Workflow

      8:33
    • 27. Sales Project - Part 1 - Data Capture

      15:18
    • 28. Sales Project - Part 2 - Data Analysis

      8:20
    • 29. Sales Project - Part 3 - Data Summary

      12:16
    • 30. Sales Project - Part 4 - Send to production

      9:07
    • 31. MySQL Query Explain Plan

      7:08
    • 32. Common Table Expressions (WITH)

      14:06
    • 33. User Administration

      13:03
    • 34. MySQL Set Environment Variable

      5:24
    • 35. MySQL Command Line

      17:05
    • 36. Export results to Excel

      6:51
    • 37. Setup MySQL ODBC Connector

      10:47
    • 38. Create a report in Microsoft Excel

      20:48

About This Class

Have you thought about working with big data but don't know where to start? In this course, you will learn the skills it takes to become a data and business analyst.

What we will cover:
1) Database and data warehouse design
2) How to write queries in SQL (Structured Query Language)
3) How to build interactive dashboards and reports
4) Tips and techniques on how to run your own big data project from start to finish

My name is Geoffrey Devitt and I have been working with big data and databases for the past 20+ years. In this course, I will share with you an overview of the skills I have learnt over the years of working on numerous data-driven projects. As a data and business analyst, I have had the privilege of working in many different industry sectors, including financial banking, online gaming, eCommerce retail, insurance and telecoms.

With the knowledge you will gain from this tailored course, you will be able to fit into any industry and be in a position to start adding value to your team from day one.

Transcripts

1. Business and Data Analysis with SQL: Hey, everyone, and welcome to Business and Data Analysis with SQL. My name is Geoff Devitt and, for the last 20 years, I've been analyzing data of all shapes and sizes. I felt it was a good time to share some of my knowledge and experience of being a data and business analyst with you here today. It's no secret that data has become a key component for driving many businesses, as it touches just about every aspect of our working and personal lives on a daily basis, be it your smartphone, which contains numerous instruments for capturing what you are doing, such as your GPS-enabled camera, to posting on social media, to even the adverts we see when shopping online. When I first began developing reports and dashboards some 20 years ago, a standard report might have taken me at least a month or two to develop. This would entail many review meetings with the business, and development was cumbersome due to the lack of tools and the processing power of computers at the time. Fast forward to today and the same projects can be achieved in days, if not hours, and in some cases I've even been able to produce reports during an initial kickoff meeting with the business. Basically, as computers have increased in power to the point where we can now process data in real time, we see new fields such as data science emerging that present new challenges for businesses, from natural language processing, used in devices like Alexa and Cortana which can understand the questions we ask them, to image processing and neural networks used in the likes of driverless cars, and shopping recommender systems that you find on platforms such as Amazon and Facebook. Reporting tools have pretty much become commonplace in most organizations and are not just limited to data analysis. In fact, for the last number of years I've worked as a business intelligence manager, and my role has been to help teams set up self-service data platforms that allow business users to pick and choose what data they want to view in a report. As a data and business analyst, your responsibility is not just to build reports, but also to tell a story that provides insight for a user. In other words, you are really acting as a consultant to the data warehouse, identifying for the business just what data is readily available for them to analyze and making connections between new data sources that are coming into a data warehouse and existing data sources. Data really is key to driving any business, and in today's markets, having an insightful report can be the difference between make or break for a business. So let's get down to business. What will you learn from this course? I have structured the course into three parts. First, we will get to grips with some of the basic concepts of database and data warehouse design. Knowing how your data is sourced and stored plays a large part in any data project, and typically with large data projects you may find yourself working with multiple source systems, such as customer demographic data, financial transactions, fraud systems, customer support, marketing systems, Internet of Things and so on. Having a good understanding of how all this data is captured, processed and then stored will make your projects run a lot smoother. In time, as you build up an intimate knowledge of the systems you and your business are utilizing, you are likely to become the go-to analyst and subject matter expert for many teams.
Next, we will learn how to write SQL code, which is the standard programming language used to query data from a database or data warehouse. Don't worry if you don't have a programming background, as I will start from the basics and work through many examples to get you up to speed with writing your own SQL queries. What I can say about SQL is that, in all the years I've been working with databases, SQL has changed very little and typically only requires that you know a handful of statements. I will walk you through the process I use on a daily basis for coding and debugging queries. Once you've got to grips with the fundamentals of SQL, this will set you up to work with just about any relational database or data warehouse on the market today. Once we have built our queries and captured our data, I will then show you how you can create a report and automate your report to be delivered to your team on a recurring schedule. And finally, I will also provide you with some of the soft skills I've developed to help you with project management and requirements elicitation, to ensure that you manage and deliver your projects on time. I hope you enjoy this course as much as I've enjoyed writing it and, please, if you have any questions, be sure to post your comments and remarks in the feedback area, and I will get back to you as quickly as I can. So let's get started by first getting you set up with your own database development environment, which we will be referring to when we start writing our SQL code. That's all for this section, and I'll see you in the next one.

2. Introduction to Databases: A History: Before we start any database programming, we first need to understand some terminology associated with databases. In this section, I will give a brief overview of the history of databases and why they have become so popular for businesses over the last number of decades. In 1970, a young Edgar Codd, working at IBM in San Jose, wrote a research paper entitled "A Relational Model of Data for Large Shared Data Banks". This became the foundation for what we know today as the relational model. Over the next few years, IBM worked on implementing the relational model and came up with the relational database. Dr Codd also outlined in his paper that such a relational model should have a standard programming language to allow for insertion, deletion and updating of data, as well as user administration. Before relational databases came along, data was typically stored using a number of different methods, such as hierarchical data structures and flat file systems, which were cumbersome to update and often led to many discrepancies in data due to the nature of having duplicate data across systems. Dr Codd noted that using a centralized relational model would address a large number of these issues. By normalizing data and having the tables relate to each other, anomalies could be greatly reduced, where only a single row of data needed to be updated in any given table, and within each table a unique key would be used to identify a given row of data. In addition, using a relational model also cut down on the amount of data that needs to be recorded, often referred to as data redundancy. Dr Codd, along with Raymond Boyce, went on to develop the Boyce-Codd Normal Form, also known as BCNF, which addresses some additional anomalies not covered in his original paper. Donald Chamberlin and Raymond Boyce,
also working at IBM at the same time as Dr Codd, read Codd's paper and started working on the development of a programming language called SEQUEL (S-E-Q-U-E-L), an acronym for Structured English Query Language. However, they later shortened the name to SQL, Structured Query Language, as SEQUEL was already a trademark of the UK-based Hawker Siddeley aircraft company. In 1979, Relational Software, Inc. (now Oracle Corporation) released the first commercially available RDBMS, relational database management system, and since then the term RDBMS has become commonplace when discussing databases. Lastly, in 1986 the SQL language was officially recognized by the American National Standards Institute (ANSI), and a year later it was also recognized by the International Standards Organisation (ISO). Although SQL is recognized as a standard, there are slight variations in the language between database vendors, notably that SQL originally had no form of procedural programming, which has since been added in by many database vendors, such as Oracle, Teradata, PostgreSQL, MySQL, DB2 and Microsoft SQL Server. That's all for this section on a brief history of databases.

3. SQL Statement Categories: You may recall from our introduction to databases that in Dr Codd's paper entitled "A Relational Model of Data for Large Shared Data Banks", he suggested that in addition to creating a relational model for data, there should also be a standard programming language which has the means to query data, define data structures to store our data in, and also grant permissions to users to access those data structures; which, as you know by now, is SQL. SQL statements can be broken down into three categories. The first of these categories is referred to as data manipulation language, or DML, statements. Such DML statements include SELECT, INSERT, UPDATE, DELETE and MERGE of data within a table. When you hear an analyst referring to writing queries in SQL, they're typically referring to such DML statements; in other words, statements that allow us to manipulate data from a table to return a required result. The next category of SQL statements is known as data definition language, or DDL, statements, which are SQL statements that help us to define our data structures within a database or data warehouse, such as CREATE, ALTER or DROP of a table, view or index. The last set of SQL statements we will be learning in this course are referred to as data control language, or DCL, statements. Such statements are used to control user permissions for accessing parts of the database or data warehouse and the underlying tables and views. To summarize, SQL statements can be categorized into three groups: data manipulation language statements, which allow us to query data from a database; data definition language statements, which provide us with the means to create data structures to store data in, such as tables and views; and, finally, data control language statements, which allow us to define user rights and privileges for accessing a database and its tables. That's all on SQL statement categories.

4. Capture Data Requirements: In this section, we will look at the process of capturing data from a source system and how we can process that data to bring it into a database. On the left is a sample purchase order which captures information about the company, What Data, the recipient vendor, as well as the purchase order items.
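Before continuing with data capture, here is a minimal sketch of one statement from each of the three categories covered in the previous section; the table, column and user names below are hypothetical placeholders, not part of the course material:

```sql
-- DDL: define the data structure itself
CREATE TABLE customer (
    customer_id INT PRIMARY KEY,
    first_name  VARCHAR(50),
    active      TINYINT
);

-- DML: query and change the data held in that structure
SELECT customer_id, first_name FROM customer WHERE active = 1;
UPDATE customer SET active = 0 WHERE customer_id = 42;

-- DCL: control who may access the structure
GRANT SELECT ON sales_db.customer TO 'report_user'@'localhost';
```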
In this scenario, the purchase order has been made for 10 product licences and 10 product maintenance items. Assuming we have access to the report or its corresponding back-end raw data, this might be stored in a number of formats, such as a flat file in the form of a CSV or .txt file, a spreadsheet, or recorded directly to another database. For example, some reports may be read-only to protect their contents, and specific access would be required for development purposes. If this is the case, then I would always advise seeking access from the owner of the report or the network administrator, providing clear project details and sign-off from your project sponsor to carry out this work. Today, most networks will have safeguards in place that can identify if such read-only data is being written off a network without the consent of an owner, and such actions of reading a file without permission could be considered a data breach in your company. So it is always best to err on the side of caution and make sure that you have proper clearance. First, we need to identify which fields we want to bring into our database. When reviewing any report, it is always good to check if there is any documentation that was created by the original developer for the report you are reviewing. Most business reports will have a summary page within the report, which provides details of the report, the corresponding fields and any formulas used for calculated fields. Likewise, if your end goal is to produce a new report or dashboard, it is good practice to embed such a report summary in your own reports, along with how to use the report or dashboard that you have created, or you may provide a link within your report that brings you to an external wiki page where details of the report can be found. If the original developer has provided a report summary and data dictionary identifying how the fields are captured, the data types and the relevant groupings, having access to such a report will go a long way to speeding up the process of your own data capture into our new or existing database. However, if no such report is available and we only have the contents of the data report to go on, then we need to work through a process of identifying which fields are relevant to our project. The first step is to identify all relevant fields within the report and to start building our own data dictionary. In the table on the right are listed the source fields, identified with a yellow column header, and the potential destination fields and table names, noted with a green header. These field and table names correspond to the groupings of information we have identified in our purchase order report. We don't need to record calculated fields at this stage, as these can be recalculated later. However, it is important to note these fields for your overall project requirements, as it is very likely that they will be required if a new report or dashboard is due for development. Such calculated fields could be complex in terms of their formulas, and if no formula is available for the calculated fields, then you may need to review with a current user or subject matter expert (SME) for the report to identify the formulas, or attempt to contact the original developer for support. In addition, we want to capture a sample of data and the corresponding data type and length of the fields.
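As a rough illustration of turning the captured fields into a destination definition, a first pass at a staging table for the purchase order header might look like the sketch below. The names, types and lengths are hypothetical placeholders to be refined once the data dictionary is confirmed:

```sql
-- Hypothetical first pass at a staging table for the purchase order header.
-- Types and lengths are placeholders until confirmed against the data dictionary.
CREATE TABLE stg_product_order (
    po_number          VARCHAR(20),   -- unique purchase order number
    product_order_date DATE,          -- arrives as MM/DD/YYYY text, stored as DATE
    company_name       VARCHAR(100),
    vendor_name        VARCHAR(100),
    ship_via           VARCHAR(50),   -- no sample data yet; confirm with the SME
    fob                VARCHAR(50)    -- no sample data yet; confirm with the SME
);
```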
If you do not have access to the data dictionary, then at best record a general data type, and at a later stage you can choose a more appropriate data type for the field. For example, in the first row is the field product order date. As this is a date field, I've recorded the date format as month, day, year in the length column. However, our new database records dates in the default system format, which is year, month, day. Knowing the date format will prevent any insertion errors into our new database, as we can specify a format when loading such date fields. If we need to change the date format at a later stage, we can then format the date in any newly generated reports. Most business intelligence tools will allow a user to specify their own preferred date format and, when reading dates from the database, can reformat the date fields according to the user's preference. For example, European dates are in the format day, month, year, while the US date format is month, day, year. Moving down the sample of data, there are two fields that have no data present: ship via and the acronym FOB. If there is no sample data, then I would recommend that you refer to the original developer's data dictionary or try to review with a subject matter expert for a sample of data. Lastly, we can also note whether all or just some of the fields are required for our project. In the case of our example, I will assume that all fields are relevant. In the majority of cases, even if a handful of fields appear to be irrelevant for your current project, you may want to check if they have potential future use, and if so, it is better to capture the data now than to have to revisit the project at a later stage, which could prove to be a very costly exercise. Therefore, when in doubt, a simple rule of thumb is to say we will capture all information from the report. Once you have captured your data fields, types and formats from the original report, you should also review the following considerations with regard to the overall data set you will be capturing. For example, is this data to be purchased from a third-party data vendor, and if so, how much are we being charged for each field? If you are working with a third-party data vendor that is to provide you with additional customer and merchant demographic details, and they can provide you with one thousand fields of related data, you may not actually require all one thousand fields and can specifically choose which items you want to bring into your database relevant to your project. For example, you may have been asked to identify if a merchant website is set up for e-commerce transactions, and while the third-party data vendor can provide you with a multitude of fields related to that merchant, you're only specifically interested in a dozen fields relating to the merchant's e-commerce website setup. Or maybe the source system you're capturing data from is so large that you can't actually afford to bring in all the fields and replicate such a system. For example, the project owner has asked you to capture data from another source system within your business, and on a preliminary review you've identified that the secondary source system has very complex and extensive amounts of data being captured; this might be a sales or customer support phone system, for instance. It may not be possible then to capture all the data from the secondary system, and you will need to decide once again what data is relevant to your project.
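Returning to the date-format point above, MySQL can parse a date that arrives as month/day/year text into its native year-month-day DATE type at load time, and reformat it again for presentation. A minimal sketch, with a hypothetical literal value standing in for the source column:

```sql
-- Convert a US-formatted date string (MM/DD/YYYY) into MySQL's native DATE type.
SELECT STR_TO_DATE('01/31/2020', '%m/%d/%Y') AS product_order_date;

-- Reformat the same value back to a European style (DD/MM/YYYY) for presentation.
SELECT DATE_FORMAT(STR_TO_DATE('01/31/2020', '%m/%d/%Y'), '%d/%m/%Y') AS eu_date;
```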
Lastly, you may also want to identify the quantity and frequency of data that you will be bringing into your table. From this, you can calculate approximately how large your table will be in your new database, and if you have constraining factors within your database, you then need to decide whether you can afford to bring in all the data or not. If it is the case that you do require all the data at a given frequency and your current database does not meet these demands, then you need to consider increasing your database size to cater for the new data model that you're presenting. To summarize, when importing or migrating data to a database, there are a large number of factors to take into consideration. As a business data analyst, you should try to consult with as many users of the source system as possible to identify what data is relevant for the business, and also consult with the database administration team to try to approximate the size of the data that you will be bringing into your database. That's all for this section on data migration considerations, and I'll see you in the next section.

5. Physical vs logical data representations: Now that we have successfully connected to the Sakila database, I will briefly describe the difference between the terms database and schema. The database can be considered as the physical structure where we store our source system data, while a schema is often referred to as the logical business representation of the data. In addition to databases and schemas, we also have a similar concept for tables and views, where a table will be a raw representation of data from a source system and the view will be the logical business representation of that data. For example, in the Sakila database we have a number of logical views that can be queried the same as if we were accessing a physical table. One such view is the actor_info view, which is a query that combines four tables together, the actor table, the film_actor table, the film_category table and the category table, into a single query view to provide us with the movies that an actor has starred in. Having such a logical view saves a user from having to write their own queries to identify such views of information. This also ensures that correct business logic is applied, as there are often cases where a single view of data, or business representation, may be interpreted differently across the business. Having such logical views means that a business's end users will be accessing the same consistent, logical data representation across the whole business. Typically, data analysts or business analysts will work directly alongside business subject matter experts (SMEs) to build out such business models. So while the source system data could be stored in a number of physical tables in a database or data warehouse, this is not always the best way to represent the data to the business, and hence we have logical views that are placed on top of the physical tables to ensure accurate business representations. As a business analyst, it will be your job to interpret the business requirements and produce such a final output. Final outputs such as logical views can then be used by other analysts to ensure that their reports and dashboards meet the gold standard of the business.
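As a simplified sketch of such a logical view, the statement below creates a business-facing view over physical Sakila tables. It is only in the spirit of the actor_info view mentioned above; the real definition includes more columns and logic, and the view name here is my own:

```sql
-- A simplified business-facing view joining physical tables, in the spirit of
-- Sakila's actor_info view (not its actual definition).
CREATE VIEW v_actor_film AS
SELECT a.actor_id,
       a.first_name,
       a.last_name,
       f.title AS film_title
FROM   actor a
JOIN   film_actor fa ON fa.actor_id = a.actor_id
JOIN   film f        ON f.film_id   = fa.film_id;
```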
A gold standard report can be considered as a logical representation of a business data model that is accepted across an entire business, which other analysts can then refer to when testing their own analysis. To summarize, a database and table can be considered as a physical implementation of a source system's data, while schemas and views act as logical business representations. That's all for now in this section, and I'll see you in the next section.

6. Normalise Data: Now that we've identified which fields we want to bring into our database, we can go through the steps to normalize our data. The first process is to map our required fields into un-normalized form (UNF). We list the attributes of the data set and then identify the primary key. A primary key is a value that uniquely identifies any given record; in this case, our unique record is identified by the product order number, or PO number. Next, we identify any repeating groups in the data set, and where we have a repeating group we need to identify if the repeating group has a given key. This could be a single value or a composite value, which is a combination of two or more values that identify the given group. Moving on to first normal form, for the groups that we have identified, we now separate these groups from the original entity and add the original key to these new repeating entities. You can see for the entity PO Company, where we had a primary key of the company name, to relate this back to our original product order we add the field PO number to the company entity, and similarly we do the same for PO Vendor, PO Ship and PO Item by adding the PO number to each of the new entities. This ensures that we now have a relationship between the original product order and our new entity groups. Moving on to second normal form, first we examine the tables for composite keys, i.e. a key made up of two or more parts. For each non-key attribute, we then determine if its key is the first part, the second part or neither; if neither, then we use both parts. Lastly, we remove the partial key and its dependents to form a new table. If we now review our diagram on the left, we can see that where we had a relationship for PO Company and product order, we can now separate out company into its own table, dim_company, which will act as a lookup table for PO Company by reference to the company name. Creating dim_company ensures that we only have a single source of truth on record for any given company. We do similar with dim_vendor, dim_ship_to and dim_item. Such lookup reference tables are often referred to as dimensional tables, and hence a common naming convention is to prefix such tables with the abbreviation dim, for dimension. For third normal form, we need to identify any dependencies between non-key attributes within each table. We then remove these dependencies and create a new table, promote one of the attributes in the new table to be a key for the table and, lastly, take the new key value and link it back to our original table, where it will be known as a foreign key in our original table. Reviewing our diagram, we can now see that a number of new relational tables have been created which identify relationships between the original product order and a dimensional table. For example, in the product order table, the field authorizer has been replaced with the company employee ID
from our dim_company_employee table. We can do similar for dim_vendor_employee, where the requester of the product order has now been replaced with the vendor employee ID. Once we have reached third normal form, we can now review what indexes have been created. If I find an index is made up of a character or string value, then I will typically replace this value with a numeric field. For example, in dim_company the primary key was the company name; I've now added a new numeric value called company ID. The company ID can be automatically generated in our database, or by our source program, and would typically take the form of a value which is incremented by one. Therefore, any time a new company is added to our table, the company ID will be set automatically to a value corresponding to the maximum company ID in the dim_company table, incremented by one. Using a numeric value that is automatically generated for us also ensures that we can't have an index violation where we may actually have two companies with the same name. This case is quite common, where a global company may have multiple offices under the one company name. For example, suppose our company, What Data, has a number of global offices. If we use the company name as an index, this will cause an index violation, as an index is meant to provide a unique identifier for all records. Instead, by using a primary key that is automatically generated for us, the company name can remain the same and we don't have an index violation; each row is now uniquely identified by the company ID. In addition, we may want to record which account is the master account for What Data by adding a master company ID field. If the master company ID equals the company ID, this informs us that this is the master account, and all subsequent accounts can be placed under the master account by making reference to the company ID. In our example, we have a company ID equal to one, which we will set as the master company ID, and company IDs two, three and four are child accounts of the master company ID equal to one. This structure will also allow for creating hierarchies of many levels, provided a company ID is assigned to a master company ID. To summarize, in third normal form there should be no scenarios where we have two entities sharing the same table, and any time you see such a group of information, you can typically separate this out into a separate entity table and provide a foreign key lookup back to the original table. This reduces the amount of data that needs to be stored in a table, hence removing data redundancy. While we have normalized our data to third normal form, it is always good to check that our data, indexes, foreign keys and new dimensional tables all make sense for our project. That concludes this section on data normalization.

7. Fact Tables: In this section, we will review what fact tables are and how they help speed up data analysis. In our diagram, we can see the product order data represented in three forms: first the un-normalized form (UNF), followed by the third normal form (3NF), and finally, on the right, a fact table with corresponding dimensional tables. A fact table, with regard to our initial product order transactions, typically comprises only the primary key and relevant foreign key information.
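A minimal sketch of what the dimension and fact tables described above might look like in MySQL; the exact columns are illustrative, and AUTO_INCREMENT provides the automatically generated numeric key discussed in the normalization section:

```sql
-- Dimension table with an automatically generated numeric primary key.
CREATE TABLE dim_company (
    company_id        INT AUTO_INCREMENT PRIMARY KEY,
    master_company_id INT,            -- equals company_id on the master account
    company_name      VARCHAR(100)
);

-- Fact table holding only the primary key, foreign keys and the order date.
CREATE TABLE fact_product_order (
    po_number  VARCHAR(20) PRIMARY KEY,
    po_date    DATE,
    company_id INT,
    vendor_id  INT,
    item_id    INT,
    FOREIGN KEY (company_id) REFERENCES dim_company (company_id)
);
```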
In other words, it sits somewhere between an un-normalized table and its third normal form dimensional tables. While un-normalized data can be cumbersome to review, and even slow to query if the number of fields is very large, a fact table looks to address these problems by only storing the bare essentials, namely the index and foreign key relations we identified by normalizing the data. Let's now take a look at two example queries, using an un-normalized table first, followed by a fact table, to estimate the amount of space we would potentially require to run our query. Later on in the course, when we start using MySQL, we will review actual query plans and how to optimize them. Take, for example, if our original product order table had more than one thousand fields of information and was capturing more than a million transactions a day, and we wanted one year, or 365 days, worth of data. Calculating the size, we can see that to query such a table, with a conservative estimate of 10 bits of data per column, we would require approximately 456 megabytes to hold our query results. In addition, we also need to take into account server load, the number of users on the database server running queries, and shared memory resources. At best, if we're running on a fast server, this may take anywhere between a few minutes and a few hours to return our results, depending on the server load. Ideally, we want to trim down our query as much as possible to just the bare essentials, and by utilizing a fact table, which is already optimized for us, this should help reduce the amount of storage that is required for a result set. Now let's review the equivalent fact table. This time, our table contains only the relevant primary key and foreign key relationships along with a date field. We estimate that our query will only require 4.5 megabytes of server space, and even less if we are not concerned with using all the data from the fact table. So hopefully this illustrates for you the importance of using fact tables and good database design. To summarize, fact tables help identify very quickly for an analyst what fields of information are relevant to a transaction, and should also indicate to us what dimensional tables are potentially available, as foreign keys are traditionally named relative to their corresponding dimensional tables. For example, a foreign key of company ID we would expect to relate to a dimensional table of dim_company, and if the dim prefix is not added to the table, then we would expect at least a table called company. To denote a fact table, the table will also typically have the prefix fact, and in our example we have called our fact table for product orders fact_product_order. That concludes this section on fact tables.

8. Aggregate Data: In this section, we will review aggregate tables. To aggregate data is to provide a high-level summary of the key information we are reviewing. If you're used to working with spreadsheets, this will be similar to creating a pivot table. When aggregating data, we will group our information by a number of attributes. The first attribute I typically consider for aggregation would be related to time. For example, if I'm reviewing customer transaction data, I may want to aggregate this data at the hourly level if I am looking at a specific narrow band of time and want to identify key hours during the day when customers are visiting our website.
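As a sketch of that hourly view, a query like the following groups transactions into hourly buckets so that peak visiting hours stand out. The table and column names are hypothetical, not from the course material:

```sql
-- Group payments into hourly buckets for a single day (table and columns hypothetical).
SELECT DATE_FORMAT(payment_ts, '%Y-%m-%d %H:00') AS payment_hour,
       COUNT(*)                                  AS transactions,
       SUM(amount)                               AS payment_total
FROM   fact_customer_payment
WHERE  payment_ts >= '2020-01-15' AND payment_ts < '2020-01-16'
GROUP BY payment_hour
ORDER BY payment_hour;
```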
Next will be daily level information to identify given days of the week when customers are most active, followed by broader time ranges such as week, month, quarter and year. When reviewing any data, it is good to build both a narrow and a broad view of your data, similar to if we were trading foreign exchange or stock options. For foreign exchange markets, as these are very volatile, we would expect to be looking at five-minute to 30-minute intervals, as currency fluctuations can change dramatically throughout the day; versus more long-term stock trading options, where we would probably only consider weekly, monthly and yearly trends and are not concerned too much with daily activity, trading on a more long-term basis. So choosing the right time interval is very much dependent on the business you are conducting and how often you need to react to your data. As a data analyst, you may only be required to look at events after they have happened, so daily and weekly trends are probably as granular as you need to go for your review. However, if you were working in a live environment, such as in a NOC, a network operations centre, as a systems analyst, then you may very well be displaying trends at five-minute and 30-minute intervals, checking network loads. Or you may be working for an online gaming company reviewing live tournament action and thus having to ensure that servers are stable and players are experiencing the best performance from your servers. After we've identified our time-based aggregates, we then need to consider what other key attributes we want to group our data by. A likely candidate to start with will be country. If you're conducting business on a global or regional scale, then having your data displayed by country will also mean that you can rank or order your key metrics accordingly. Next will be to group your information by product, which will allow you to identify which products are performing the best and, similarly, which ones are not performing as expected. Or you may be running a campaign against a specific set of customers and want to check the performance of this campaign, so you have created a query to identify the performance of your campaign over the time period of the campaign. Not only do you need to check the performance of your campaign for each country, but also by associated products within each country. In this scenario, we have had to create an aggregate with a combination of attributes for review. And while we can query each attribute separately, such as by time, country, product or overall campaign, it is also useful to look at our information using a combination of attributes. For example, we may want to identify which country had the best performance for a campaign and, within that country, what was the best overall performing product. Reviewing the top product for your top country, we can then try to identify how this product performs in corresponding countries. Based on the outcome of this campaign, we may want to try and establish a further campaign that better targets our top product from our top country in other countries, and to ask the question: why is our top product not performing as well in other countries as we would expect it to? Once we have identified the key attributes, we now need to look at the key metrics, or KPIs, key performance indicators. Below, I've given an example of three metrics, sum, count and maximum, from our fact table example.
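Alongside the on-screen example, here is a hedged sketch of how such a monthly aggregate could be built from a fact table using those three metrics; the table and column names are hypothetical:

```sql
-- Build a monthly aggregate of payments: total amount, distinct customers, max temperature.
CREATE TABLE agg_customer_payment_month AS
SELECT DATE_FORMAT(payment_ts, '%Y-%m') AS payment_month,
       country,
       SUM(amount)                      AS payment_total,
       COUNT(DISTINCT customer_id)      AS customer_count,
       MAX(temperature_c)               AS max_temperature_c
FROM   fact_customer_payment
GROUP BY payment_month, country;
```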
You can see that I've provided a sample of customer transactions for the month of January on a daily basis for the UK, along with the payment total and the corresponding temperature in degrees Celsius at the time. If you are an online retailer, then having temperature data may be very important for your business; if your business is partial to seasonality and changes in weather, identifying such seasonality can help you to better understand your overall business model when inferring a hypothesis. The second table is agg_customer_payment_month. This table describes customer payment transaction data at a monthly level; hence, we have a prefix for the table of agg, short for aggregate, and a suffix at the end of the table name, month, to denote that the level of aggregation in the table is monthly. Similarly, we could have additional aggregate tables for daily, weekly, quarterly and yearly, with each aggregate table providing a summary of the corresponding key performance metrics for that period of aggregation. In our aggregate table example, we can see there is only one row of information at the monthly level, providing us with the payment total, a distinct customer count and the maximum temperature for the given month. Later on, when we visit aggregate tables in our SQL coding, I will also identify some pitfalls to watch out for when aggregating data. To summarize, aggregate tables are useful for summarizing general business information over given periods of time. For an analyst, these tables are invaluable, as they can cut down dramatically the time required to answer business questions without having to query low-level fact table transaction information. In time, we will be looking at building our own aggregate tables to answer queries we have and, going forward, if we're building out standard dashboards and reports, we're likely to reference our own created aggregate tables or the standard aggregate tables created by the database administration team. This concludes this section on aggregate tables.

9. Entity Relationship Diagrams (ERD): For the last few sections, we have looked at a number of different entity types, such as dimensional tables, fact tables and aggregate tables. In this section, we will discuss how entities are related to each other by means of what is known as an entity relationship diagram, or ER diagram, also abbreviated to ERD. As discussed with normalization of tables, we look to separate repeating groups of data into their own separate dimensional tables. We demonstrated the use of primary keys and foreign keys between such tables, and now we will address how to depict a relationship between two such entities. While there are a number of notations available for describing relationships between tables, I will discuss here the crow's feet notation. In the diagram below, you will note that the entity is denoted as a rectangle, while the relationship is described as a connecting line between two entities. There are four main types of relationships that can be described between two entities. For now, though, we will only show one entity and how to describe the various types of relationships we can have with an entity. For example, to denote a zero-to-one relationship, we first draw a line to the entity and, where the relationship line meets the entity, we draw a circle followed by a vertical line. Sometimes you will also see the entity relationship written out in shorthand by means of a zero, followed by three dots, followed by a one.
Next, we have a one-to-one relationship, which is represented by two parallel lines where the relationship line joins the entity. Moving on, we have the zero-to-many relationship, which is denoted by a zero followed by a three-pronged crow's foot, which is where we get the name crow's feet notation, as described by the many relationship. And finally, we have a one-to-many relationship, where we have a single vertical line followed by a crow's foot. Now let's look at a few actual examples taken from the entity relationship diagrams for the Sakila database, which we will be using as part of our SQL coding later on in this course. The Sakila database is made up of three main entity groups, namely the customer details, the payment details and the video details. For illustration purposes, I will only focus on the customer details section, which contains the entities customer, address, city and country. These entities have been normalized to third normal form. During design of a database we typically talk about entities, while when we move into the actual physical database implementation we start referring to entities as actual tables; hence, you will often hear the terms entity and table used interchangeably in conversation. To start with, we will look at the customer and address entity relationship. Note that the crow's feet notation is drawn on each end of the relationship between the two given entities, which means we can describe the relationship between two entities in the direction one entity joins to another. For the relationship from customer to address, we can observe that there is a one-to-one relationship, as denoted by the two single bars alongside the address entity. We read this as: for any customer, there can only be one address. If we then turn around and look at the entity relationship from address to customer, we can see that the entity relationship next to customer is denoted as a one-to-many relationship and is read as: one address can have many customers. Moving on to the next example, between address and city, the entity relationship from address to city is a one-to-one relationship, read as: for any address, there can only be one city. Reviewing the relationship in the reverse direction, from city to address, we can say that a city can have many addresses. For our last example, we review the entity relationship between city and country. We can see that the relationship from a city to a country is a one-to-one relationship and, in the opposite direction, from country to city, we read this as: a country can have many cities. Remember that when describing an entity relationship, we place the crow's foot notation at the destination entity. Hence, for our first example of describing the entity relationship between customer and address, where a customer can only have one address, we place the crow's foot notation of two parallel bars at the destination, which is the address. This can be a little confusing to start with, as we are having to describe relationships in two given directions, so take your time to review each relationship as you build out your entity relationship diagrams, and this should become easier with more practice. Being able to describe such entity relationships enforces the rules of how data is stored in our database tables. This will become even more apparent when we move into SQL coding, because if we try to force an entity relationship that does not meet the rules of our table design, this will throw an exception error.
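As a minimal sketch of how such a rule might be enforced in a physical design, the tables below allow at most one address row per customer, so a second insert for the same customer is rejected with an error. This layout is purely illustrative and is not Sakila's actual schema:

```sql
-- Illustrative layout (not Sakila's actual schema): one address row per customer.
CREATE TABLE demo_customer (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(50)
);

CREATE TABLE demo_customer_address (
    customer_id INT NOT NULL UNIQUE,   -- enforces the one-address-per-customer rule
    address     VARCHAR(100),
    FOREIGN KEY (customer_id) REFERENCES demo_customer (customer_id)
);

INSERT INTO demo_customer VALUES (1, 'Alice');
INSERT INTO demo_customer_address VALUES (1, '1 High Street');
INSERT INTO demo_customer_address VALUES (1, '2 Main Road');  -- rejected: violates UNIQUE
```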
For example, if we try to add a second address for a customer into the address table, this will throw up a violation error, saying that we can only have one address for any given customer in the table. However, if it were the case that a customer should have, or could have, many addresses, we need to identify this in the design stage before moving on with any physical implementation in a database. You will find entity relationship diagrams are commonplace across a number of different teams, such as database administrators and data analysts as well as software developers, as they show the general rules of how data interacts between any two given entities in our application. That's it for entity relationships in this section and, as discussed earlier, we will see a lot more of entity relationship diagrams when we move into our SQL coding sections.

10. Extract Transform and Load Data (ETL): In this section, we will review the ETL process, extracting, transforming and loading data into a data warehouse. For the most part, working as a business analyst, you're likely to be utilizing tables that have already been created by a team of database administrators from various disparate source systems. A disparate source system implies that the systems are not related to each other. You will also hear the term ETL commonly referenced as "the batch". In context, you may have read emails relating to the daily, weekly and monthly ETL batch status, which would provide a business status update on how batch jobs ran over the weekend or at the end of month. Batch in this case implies that there are a number of concurrent, parallel ETL scheduled jobs which are processing the source system data. Let's walk through a typical ETL process and see how each part relates and the potential issues we can face along the way. In the first swim lane, entitled Source Systems, our business has data that it captures from three disparate, non-related sources. The first step in our ETL process will be to connect to these source systems to capture any new data, and typically such data captures will be happening on a daily basis, and in some cases even on an hourly basis, depending on your data needs. In our process, we are connecting first to a customer support phone and email system, followed by a connection to a third-party data vendor, from whom we are capturing additional customer and merchant information, and finally our web application, which hosts, for example, our e-commerce website. The first step will be to connect to and extract the data from each system, and it may be the case that to extract data from each of these systems we will utilize different software tools. For example, when connecting to our customer support system, which is hosted internally on our servers and is also running on a MySQL database similar to our data warehouse, we're able to connect directly to the database and extract the required information to be brought into our data warehouse. You might ask, then, why do we want to extract data from a database to be brought into another database, such as our data warehouse? If they're both running on MySQL, why not just connect directly? Typically, the answer is due to security, and database administrators will limit user access to source systems by having users go through a data warehouse. In all the years I've been working as a business analyst, there have only been a few cases where I would have required direct access to a source system.
In the majority of cases, I will access company data from a single data warehouse that is managed and maintained by a team of database administrators. The cases where I've had to have direct access to a source system are where I would have been performing a data migration into our data warehouse, and as a subject matter expert I would have also been acting as a database administrator with higher privileges than a standard business analyst. For administration purposes, it makes sense for a team of database administrators to only have to manage database user accounts from a single source. Capturing data from source systems will then only require having to set up one or two users with administration privileges to access such source systems. Database administrators who are looking to access these systems can do so securely, knowing that there are only one or two users that can access such systems, which limits the number of vulnerabilities on systems that are vital for the business. For example, we wouldn't want business analysts to be able to directly access the payment system, which could leave our payment system open to attacks from hackers if they managed to infiltrate a business analyst's workstation. Such systems would also have a number of authentication methods put in place, such as first having to dial into a secure VPN, a virtual private network, and provide a one-time token password for critical systems, which can be easily monitored by our database administrators and security team to ensure that only a few limited accounts are accessing such delicate systems. Next, our ETL is connecting to a third-party data vendor, who has provided access to their system by means of an API, an application programming interface. For this, we have written a number of scripts in Python to negotiate with the third-party data vendor. For example, we have functionality that allows us to log in with secure details and generate a valid token for access. We also need to take into account how many records we can request from the system, so as not to overpower the system, which could be considered a DDoS attack, a denial of service, which implies we have bombarded the system with so many requests that the system starts failing to process our requests and those of anybody else connecting to the system. Therefore, we have a throttle-back function which requests 1,000 records at a time, and when we have processed these one thousand, we send a new request asking for the next 1,000 records. We might also specify that each request can only be made within a certain amount of time, for example two or three seconds apart. Our third-party data vendor has also made it possible to retrieve our data in a number of formats, such as JSON, XML and flat files in the form of CSV or TXT. We decided to process the files in JSON format, which also provides us with the data types for each field of information. Lastly, we connect to our web server, which is running on Apache, to capture the web logs and transaction logs of our customers. These are stored in flat files on our web application that are easily read from our web server using secure FTP, SFTP. All three systems have now had their data loaded into a staging server, and the next stage is to start transforming the data into meaningful information. For example, as noted, since each of these systems is disparate, we need to generate global customer keys that allow the data to be related to each other.
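As an aside on the staging load described above, flat files pulled down over SFTP can be loaded into a MySQL staging table with LOAD DATA. A minimal sketch, with a hypothetical file path and table name:

```sql
-- Load a comma-separated web transaction log into a staging table
-- (file path and table are hypothetical; LOCAL requires local_infile to be enabled).
LOAD DATA LOCAL INFILE '/staging/web_transactions.csv'
INTO TABLE stg_web_transactions
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
```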
Next, we need to process timestamps and corresponding time zone information to have all our data relate to one time zone, or to the corresponding time zones of business hours. For example, if our servers are being hosted in California with timestamps in Pacific time, but we want to query transactions that were processed in the United Kingdom on Greenwich Mean Time, then we need to cast our timestamps accordingly. If we didn't cast our timestamps to the correct time zone, then we would have to manually calculate each transaction's time zone and timestamp, which could be very costly. As an example, if we ran a campaign in the UK over the weekend and wanted to check hourly performance to see when the campaign was at its peak, and the timestamp is in the PST time zone, then we need to offset our results from Greenwich Mean Time: a transaction that took place at 1 p.m. in the UK would be recorded on our server as 7 a.m. PST. Other transformations we're likely to consider are currency and foreign exchange. Our system is recording data in all currencies from around the world, but for reporting purposes we need to produce an overall report in US dollars, as well as corresponding reports in local currency. While it is possible to make the conversions at the time of our analysis, our DBAs have taken the time to set up corresponding fact tables with the relevant conversion currencies, such as euro, sterling, Japanese yen and Chinese yuan. Once we have completed all transformations of our data, the next step will be to start loading this data into our data warehouse tables. The needs of your business will determine how often you run your batch ETL jobs, and at any given time you may have a number of concurrent, parallel jobs running on your system, especially when it comes to transforming data. As noted, we may have a number of different aggregate tables which need processing, such as on a daily, weekly and monthly basis. Oftentimes you will find that where end-of-month processing starts to run at the beginning of the next new month, this may impact the first few days of daily and weekly batch processing, where our monthly transformations are having to look back on an average of 30 days' worth of data, and these jobs may take more than 24 to 48 hours to fully process, especially if you are running a large global company with millions of transactions on a daily basis. This is where we need to find a balance between our database processing and our business needs. You will also find that, where the ETL is processing jobs with source data being brought into our company's data warehouse, oftentimes each department will also have their own set of batch jobs processing data for their own local reports, and these, in turn, will get held up if you have a dependency on a source system job running; you may also have dependencies on other teams' batch job processing. Finally, we discussed aggregate tables in the previous section. Such tables typically come about where database administrators monitor frequently asked queries by data analysts within the data warehouse and, based on these queries, they can then formulate new aggregate tables to facilitate such data analysis. This will reduce the processing load on the data warehouse, where such expensive queries only need be processed once by the ETL batch, and the results can then be utilized by the business and analyst teams alike.
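For the time-zone transformation mentioned above, MySQL's CONVERT_TZ function can shift a stored timestamp between zones. A small sketch, reusing the hypothetical payment table from earlier examples; named zones require the MySQL time-zone tables to be loaded, otherwise numeric offsets can be used:

```sql
-- Shift a server-side Pacific timestamp to UK time using numeric offsets.
SELECT CONVERT_TZ('2020-06-13 07:00:00', '-08:00', '+00:00') AS uk_time;

-- With the time-zone tables loaded, named zones handle daylight saving automatically.
SELECT CONVERT_TZ(payment_ts, 'America/Los_Angeles', 'Europe/London') AS uk_time
FROM   fact_customer_payment;
```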
As a business analyst, you will find yourself most of the time working with aggregate tables if you're building out dashboards and reports for the business to illustrate key performance metrics, and will typically only revert back to using fact tables if the business has identified an issue with a core KPI on a given day or for a given campaign. For example, if the business noted a large spike in transactions midweek and cannot identify what has caused the spike, then we need to go back to a more granular level, such as the fact table level, to try and identify the cause of the spike, which might be anything from a one-off large transaction from a single customer to an anomaly in payments, such as a large refund or chargeback being processed. To summarize, batch ETL can be set up to run core SQL jobs for extraction, transformation and loading of data at desired business intervals, such as daily, weekly and monthly. Extraction of data can vary from source system to source system, as can the tools used for extracting the data. Transformation of data, such as dates between time zones, multiple currencies, and creating global customer IDs between disparate systems, helps a business to consolidate its data sources into a single data warehouse. Having a single data warehouse ensures tight security and ease of administration for a database administration team. While there is an overall batch maintained by the database administration team, it is also likely that many other departments will have their own ETL batch jobs, which the team's business analysts will maintain. Delays in loading batch data should be communicated amongst teams and can take time to resolve if there are dependencies on data from other teams or systems. Maintaining large batch systems can be quite the juggling job, as a failure in one batch can impact many areas, not too dissimilar to roadworks: if the issue is local, then only a few users may be impacted, whereas if the failure is at a source system, this could be the equivalent of having a section of motorway closed, which will impact many users and can delay projects. That's all for this section on ETL.

11. MySQL Server Setup: In this section, we will download and install the MySQL Server and run through the configuration process. First, open up a web browser and type in "download MySQL server". In this instance, I'm going to select the download for Windows 10, but you can also select downloads for Linux and macOS. The first link will take us to the MySQL web page, where we can find the installer download. Select the first option and, to begin your download, you will need to log in or sign up for a MySQL account. Once you've created an account, you can begin the download. This may take a couple of minutes, depending on your Internet connection speed. When the file has completed downloading, click on the installer to begin the installation process. Choose the developer default setup. This again will take a few minutes, depending on your Internet connection speed. Once downloaded, we will go through the MySQL Server configuration: select classic replication to start with, and the default settings for the network ports. For authentication, we will use the legacy authentication method. Next, provide a root account password. I would suggest using a simple password for development purposes, or a password that you use regularly.
In addition to the root account, we will also need to create a user account, which could be your first name. Provide a password for the user account and also set yourself as a DBA, a database administrator, then click Next to create the account. Next, we will set up the Windows service, and we can ignore the plugins and extensions. Finally, click Execute to apply the configuration, click Finish to finish the configuration, and click Next to configure the MySQL router, which we can leave at the default. Next, we will connect to the server as a test under the user root: provide the password that we created earlier, click Check, and we can see whether we have a successful connection, then Execute to apply the configuration. This completes the installation of the MySQL server. We can go ahead and untick the shell option and open the MySQL Workbench editor. Once the editor is open, we can log in to the MySQL server with our root password; click Save Password to Vault so it remembers the password for next time. To check the server is up and running, under Management in the top left we can look at the MySQL server status, and we can see the server is indeed running. This concludes the MySQL server setup and configuration, and I'll see you in the next section.

12. MySQL Server Start and Stop Service: In this section, we will look at how to start and stop the MySQL service. First, open Windows Task Manager with the shortcut Ctrl+Shift+Escape. Under Services, we can look for MySQL80 as well as the MySQL router. Right-click on MySQL80 and click Stop; this will take a second or two to complete, and under Status this should now show as stopped. We can now do the same for the MySQL router. In addition, we can set how the MySQL service starts up when we start our computer. Typically, as a developer, we don't want the MySQL service starting automatically, because it can take up resources in the background. Scroll down to MySQL80, right-click, view the Properties, set the startup type to Manual, then click Apply and OK. Once we have set these preferences for MySQL80 and the MySQL router, we can close out of Services, and the next time you start up your computer, MySQL80 and the corresponding router will not be taking up any resources. However, any time you want to perform some analysis in MySQL, you will need to go back into Task Manager, click on Services, and start MySQL80 and the MySQL router. That's all for this section on how to set the MySQL services.

13. Connect to database: In this section, we will review how to connect to the Sakila database, which we installed as part of the MySQL server setup. We will be referring to this database throughout the course, and it is a database I have frequently referred to over the years for SQL development. This sample database was initially created by MySQL in 2005, and hence is representative of a DVD rental store from that time, containing detailed information relating to movies, such as titles and actors, as well as customer and rental details. Although DVDs may be obsolete today, the model is still very valid in today's world of e-commerce and similar movie rental sites, and could easily be adapted to represent any number of online business models. On a personal note, one of the many jobs I had while studying for my degree in computer science was working evening and weekend shifts at a DVD rental store, and back then the store was running on Windows 95 with a very rudimentary Microsoft Access database.
Nowadays we only need to click a button on our smart TV to access Netflix or Amazon Video, but thanks to the likes of DVD rental stores, I was able to keep myself in pocket while studying for my degree at university. MySQL has continued to develop the Sakila database over the years as a means to showcase their latest database software enhancements, making it a great resource to learn from, as well as a benchmarking tool for developers.

Coming back over to MySQL Workbench, if we log in first as root, or as the user that you've set up, we can see a number of management tools on the left-hand side, and down on the bottom left we have the schemas that were installed, including the Sakila database. Even though we are in MySQL Workbench, we are not presently connected to any database. For this, we must connect to the MySQL server and issue the command USE database to connect to the database. We will first go to Database and then Connect to Database. We will be asked for connection details, but as we are connecting as root and the password is already stored in the vault, we can just go ahead and hit OK. We can also set the default schema here; if we were working with this database regularly, we could put Sakila in this section. For now, I'm just going to leave everything as is and connect to the database. If we look again on the left-hand side, even though we've connected to the MySQL server, we still haven't connected to any schema. To do this, we must issue the command USE database, where database in this instance is the name sakila. We can then execute this command either by clicking on the icon that looks like a bolt of lightning, or, my personal preference, by using the keyboard shortcut Ctrl+Shift+Return. Once we've executed this command, under Schemas, sakila is highlighted in bold, indicating that we're connected. From here we can also see a number of different tabs have appeared: a tab for tables, views, stored procedures and functions. In this instance, MySQL has queried the system tables for Sakila to identify what tables are there, and we can click into any of these tables and further review what columns or indexes are in the table.

Typically, when I connect to a database and do not have such a visual tool to show me what tables are in the database, I will use another command, SHOW TABLES. If I want to execute this command, and just this command, I'll first highlight it and then click Execute. Executing this command, we can see that the list of tables brought up in the result grid matches the schema profile on the right. These two commands are specific to MySQL, and if you are running on a different database platform, such as Postgres, Teradata or Oracle, I would suggest looking up the corresponding commands or catalogue tables where you can find this information. For example, in Oracle you can query the table dba_tables to find a list of tables, or dba_columns if you want to query the columns associated with a table, while in Teradata there is a specific DBC database containing tables of table names and column names. Knowing how to identify what tables and columns are in a database is half the battle of finding the relevant data you will need for your project.
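Here are the commands from this demo as they would be typed into the query editor, together with the equivalent lookups against the information_schema views, which are MySQL's closest counterpart to the Oracle and Teradata catalogue tables mentioned above; the information_schema queries are an addition to what is shown in the video.

USE sakila;                                 -- connect to the sakila schema
SHOW TABLES;                                -- list tables in the current schema

-- Equivalent catalogue queries using information_schema
SELECT table_name
FROM   information_schema.tables
WHERE  table_schema = 'sakila';

SELECT column_name, data_type
FROM   information_schema.columns
WHERE  table_schema = 'sakila'
  AND  table_name   = 'actor';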
To summarise, when connecting to a database we can use the command USE database, providing the name of the database we want to connect to, and to show a list of tables in the database we are connected to we can issue the command SHOW TABLES, which returns a list of the non-temporary tables in the database. For more details on the history and overall architecture of the Sakila database, you can visit the link provided, and at any time, if you want to know more about a particular MySQL command, you can refer to the MySQL documentation in MySQL Workbench under the Help menu, or by pressing the F1 key, which will open the MySQL documentation. For example, if I look up the command USE database, we can see a list of search results has come up here, including "creating and using a database". Getting familiar with the MySQL documentation, or any documentation for the database you work with, is a must for a developer. That's it for now, and I'll see you in the next section.

14. Comments in SQL: In this section, we will review how to write and use comments within our SQL code. It's considered good programming practice to provide comments within your code, in particular if you are working within a team, where other analysts may be required to update or make amendments to your code, or vice versa. Or maybe you've been asked to update or fix code that is in production and was originally written by an analyst who is no longer on your team, but because they provided clear comments, you fully understand their code and what changes need to be made. In addition, as you progress with your SQL coding skills, you will find your code files grow increasingly larger, so having comments that you understand and can follow will help you in the long run. Another good coding convention is to start your SQL scripts by providing some details about the code, such as the name of the script that will be used in production, a high-level description of what the code does, who requested the code, the author, the date the code was created, and any version details. If this is a new request that you have just been assigned, and you have sat down with the business owner in the meantime to identify the project requirements, then I will typically use the project design document and draft out a high-level plan of what needs to be coded. This is often referred to as pseudocode, or fake code, that describes the steps and logic you will need to perform in your script. You may also want to reference the original design document in your SQL script, so that when you have completed your code, you can provide the business with a high-level walkthrough if required. For example, a review might run: in the first section we identify the base of customers for analysis and the criteria that were agreed upon, and in the next section we use the list of customers from section one to identify what volume of transactions they made over the last 12 months. Also note that for all comments you write in SQL, the SQL interpreter or SQL engine will disregard your comments when executing, so they don't take up any valuable resources. However, it is good practice to execute your code with the comments included, as you can then search back through your execution history at a later stage by searching for those comments. If you only ever execute SQL code without comments, it can be very difficult to search the code that has been written to your SQL history table.
Let's now take a look at how we can build a header within our SQL code using a comment block. A comment block, as you can see, begins with a forward slash followed by an asterisk, and ends with an asterisk followed by a forward slash. A comment block can be hidden by clicking on the minus sign next to the comment, or revealed again by clicking on the plus sign. Most text editors have the ability to hide or show all comments within a script, which can be very useful if you're coding on a small screen. If my script has grown quite large, for example more than 100 lines of code, or even 1,000 lines, then I will typically also add a table of contents to the header and number each of the sections of code; as you can see, I've labelled my sections of code here as one and two. The next type of comment is known as an inline comment and can be used to comment a single line of code at a time, or to place a comment at the end of a line of code. An inline comment can be written using either a double hyphen or a single hash mark. My preference is to use the double hyphen, as this comment style dates back to the SQL standard and older programming languages, while the hash comment came along with later languages such as PHP and Python. Ideally, do a quick check on the database you are using to see which comment types are allowed. Obviously, there is a balance in everything, so I'll leave it to your own judgement as to how many comments you add into your code over time, but as a rule of thumb, ask yourself: if someone on your team were reading your code, would they be able to understand your comments? You may want to have a teammate review your code to see if your comments make sense. You don't want to put anyone on your team in a situation where they need to update your code for critical reasons but can't understand your comments, or, even worse, can't understand your code because you have used very cryptic functions. To summarise, comments make reading code easier for everyone on your team, they can be used to highlight sections of code, and they can speed up development time tenfold by reducing the time it takes for an analyst or developer to understand the code. That's all for now, and I'll see you in the next section.

15. Selecting data from a table: In this section, we will review how to select data from a table or view in a database, how to review a table's data by returning a sample set of rows, and how to identify what columns are present in a table. Once we have an idea of the table's data and columns, we can then start to refine our results to meet our specific project requirements. Lastly, we will narrow down our results using a combination of criteria applied in the WHERE clause, using both fixed and fuzzy logic. In general, every SELECT query written in SQL can be broken down into the components I've shown here on the right-hand side. Don't worry if this syntax looks a little alien at first; once we have gone through a few live coding examples, you will see that the process is not too dissimilar to placing an order at a deli or coffee shop for your lunch. Now let's move on and take a look at a few live coding examples with the Sakila database. So let's get started with the live coding example here: I've laid out my header with the typical sections, such as the details about the project that I'm going to be working on.
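As a minimal sketch of what such a header and the comment styles described above might look like; the script name, dates and section titles are made up for illustration:

/*************************************************************
 * Script   : sakila_actor_review.sql      (hypothetical name)
 * Purpose  : Review actor and address data in the Sakila database
 * Author   : <analyst name>    Requested by: <business owner>
 * Created  : 2018-07-07        Version: 1.0
 * Contents : 1) Select data from the actor table
 *            2) Review the address table with a WHERE clause
 *************************************************************/

-- 1) Select data from the actor table (inline comment, double hyphen)
SELECT * FROM sakila.actor LIMIT 10;   # hash comments are also accepted by MySQL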
I've also laid out a table of contents here to show what queries I will be running, and if need be, I can minimise the header, or block comment, using the plus and minus signs as discussed earlier. I've copied my table of contents down to the section below and made those lines into inline comments. For the first query, we will look at selecting all data from a table using the asterisk character. The asterisk is a special character within SQL that denotes all columns. We type in SELECT * FROM, and we can open up Sakila on the left and look at the actor table; we can either type in actor, or, another way to bring the table across, double-click on the name actor, and it will appear on the right-hand side. Next, we execute the command using the shortcut Ctrl+Shift+Return, and as you can see, this has actually returned an error message, the message basically reading "no database selected". I've deliberately done this to show how you can work around this error by providing the database name: as you can see in MySQL Workbench, if I type the name of the database followed by a dot, it presents me with the list of tables. If I run the command again, you can see it does indeed execute. So this is one way of getting around having to type USE database, or in this case USE sakila, every time. This will become more apparent later on when we start working with multiple databases, and my preference is sometimes to give the database name within the code rather than relying on a default database. In this scenario, we will go ahead and run the command USE sakila, and just to confirm that it has run, we can open up our output window, which shows us how the commands have run and whether they ran successfully or not. We can see here with the green ticks that these ran successfully, while the first one did not, and we can hide this panel until any more error messages appear. We can now remove sakila from the table name, as we will be continuing to use this database for these examples. Let's take a look at the output again and re-execute: as you can see, this has provided the output of the table with actor_id, first_name, last_name and the last_update date.

If this is more than enough data to look at for a sample, we can, as the next step, use the LIMIT keyword to pull back just a sample of data. We can copy down the same query from above, but this time, instead of returning all 200 rows, we can say LIMIT 10, and as you can see, it has returned the first 10 rows from the actor table. For the next example, we look at how we can specify the column names in a table. In this scenario, we may just be looking to understand what's in the table, and we've identified that we want the actor_id, the first name and the last name only. As we can see, we've got an actor_id that is unique to each actor. Once we have selected the columns we want to use, we can go ahead and rerun the command, and we can see that this has dropped off the last_update date and only provided us with actor_id, first_name and last_name. For the next example, we will introduce the WHERE clause. I'm curious to see whether there is an actor in the database with a first name similar to my own, and to do this I can say WHERE first_name equals 'Jeffrey'. As it's a text string, I need to place it between quotes; if this were a numeric value, such as actor_id, I wouldn't need the quotes, so I could say, for example, actor_id = 1 without placing it in quotes.
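The queries run so far in this demo look roughly like this; note that the Sakila sample data stores actor names in upper case, but the default collation is case-insensitive, so the literal below should still match:

SELECT * FROM sakila.actor;                      -- all columns, all rows
SELECT * FROM sakila.actor LIMIT 10;             -- sample of the first 10 rows

SELECT actor_id, first_name, last_name           -- named columns only
FROM   sakila.actor
WHERE  first_name = 'Jeffrey';                   -- text literals go in quotes

SELECT actor_id, first_name, last_name
FROM   sakila.actor
WHERE  actor_id = 1;                             -- numeric values need no quotes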
Running the first name query, we can see there is an actor, Jeffrey Heston, who has an actor_id of 151. That's a very simple example of using the WHERE clause. Now suppose we searched for somebody with the last name of Jeffrey instead; this can happen where, in some scenarios, you've accidentally selected the wrong field. With the SELECT statement I've run, it has returned the three columns that I requested, actor_id, first_name and last_name, but there are no rows of data: as you can see, there is no index value, and the values coming back are showing as NULL. NULL is a special reserved word in SQL, and in a lot of programming languages, which means that no data was found. This is not to be mistaken for the space character, which could be an empty string, that is, somebody has actually typed in spaces; if that were the case, then NULL would not be showing here, the cell would simply appear blank. So you need to be careful when searching, as someone may have entered a blank. As we know there is somebody with the last name Heston, we can go ahead and run that, and that corrects our query.

For the next example, just to expand a little on first names, let's look for any actors whose names begin with G. Instead of saying equals, which is an absolute logic statement, we can change this to LIKE, which is what we call a fuzzy statement, or fuzzy logic. Now, we can't just type in G on its own; we have to specify where we want the character or word to be found. We could say, for example, 'Jeff' followed by any characters, which we denote with the percent sign; or we might put the percent sign at the beginning, which says find any value ending with Jeff; and lastly, with LIKE we can say find any string that has Jeff somewhere in it by placing a percent sign at both the beginning and the end. For this example, we want to know whether there are any actors whose name begins with G. We can go ahead and run it, and as you can see, a number of actors have been returned.

To expand on the WHERE clause, we'll now move over to the address table. One way of identifying what's in a table without returning any data, if we just want to see the columns, is to use the LIMIT keyword and say LIMIT 0. This returns no rows, but shows the column names instead. As you can see, it has returned an error where I wrote LIMIT = 0, when the limit should simply be set as LIMIT 0. Again, it's good to check that you're reading your error messages: in this case it says you have an error in your SQL syntax, check the manual for the corresponding version, and if we hover over it, it tells us right at the end that the error is near '= 0' at line three in our code. So it is pointing directly back to our code to say that on line three something near the zero was impacting our query, and in this case I had written equals zero; by removing that, it executes correctly. Here we can see the columns within the address table, and as we might want to see some sample data, I'll do a LIMIT 10. As you can see, there are a number of addresses where no value was given in address2, and they are represented as NULL, and some addresses where somebody has simply put in a space or blank character, where there is no NULL but there is some data represented in the column.
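A sketch of the pattern matching and column-review techniques just described, plus a filter for the NULL address2 values; the IS NULL test is an addition here and is covered again later with outer joins:

SELECT * FROM sakila.address LIMIT 0;        -- no rows, just the column names

SELECT actor_id, first_name, last_name
FROM   sakila.actor
WHERE  first_name LIKE 'G%';                 -- begins with G
-- 'Jeff%' begins with Jeff, '%Jeff' ends with Jeff, '%Jeff%' contains Jeff

SELECT * FROM sakila.address
WHERE  address2 IS NULL;                     -- NULL is tested with IS NULL, never = NULL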
As an example of a WHERE clause, I'm going to look for rows where the district equals California. If we don't want to type out California, in case we make a spelling mistake, or if this were, say, a product name or ID or a very long string, we can select the cell and use the standard Ctrl+C to copy and Ctrl+V to paste it in here, or right-click, choose Copy Field and paste it back up into the editor. In this instance, the Sakila tables are not case sensitive, but in some databases you will find that case sensitivity has been explicitly turned on, or even set for a particular table. It is worth testing: if, for example, we searched for California with a lower-case c and the table were case sensitive, then no data would return, whereas if case sensitivity is off, or we type California with a capital C as expected, the value returns. Case sensitivity is becoming increasingly popular with some very large retail merchants, or merchants in general, where they have a lot of descriptive information and want to capture it exactly as it is entered. In general, you can check a field by first pulling out a sample of the data and passing in a lower-case or upper-case value. So in this scenario, I can select all from address where the district is california with a lower-case c, and we can see that it does indeed return. This tells me the table, or at least this field, is not case sensitive. If you're ever in doubt as to why something may not be returning, consider whether the field, table or database is potentially case sensitive. From what we've returned here, we can see that we have a number of customers in our Sakila database in the California area, and we might want to narrow this down to customers living at a particular type of address. So, alongside the district of California, we can add AND address, this being address line one, is LIKE a given value, asking it to contain the term Avenue. We can put in just the 'Ave' part of the word, or the full word Avenue, though as I know Avenue is sometimes abbreviated to Ave, I'll use that. Here we can see four addresses have come back where we've provided the criteria of district California and the address field is like Ave.
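The combined criteria from the end of the demo, expressed as a single query:

SELECT *
FROM   sakila.address
WHERE  district = 'California'
  AND  address LIKE '%Ave%';        -- address line one contains the abbreviation 'Ave'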
That concludes our live demo of reviewing the Sakila database with SELECT and WHERE clauses.

16. Table indexes: In this section, we will review how to select data from multiple tables by using various table join types. First, we will review how joins are made between tables by referencing primary, foreign and secondary keys. Next, we will look at table and column aliases, which help you speed up your coding by creating shorthand references to tables and columns. Finally, we will review four of the main SQL table join types found in databases that I regularly use on a daily basis: the inner join, the left and right outer join, the union, and the Cartesian join. As always, depending on the database you are using, I would recommend reviewing the database documentation to identify which table joins are available to you.

Let's now review a number of ways tables can be joined together using SQL code. The first, and probably the most common, way is to use the primary and foreign key relationship. With regards to data normalisation, we reviewed how, by separating our grouped data components into separate dimension tables, we could then reference the dimension table by means of a foreign key. For example, if we had a customer table, we might have a foreign key relationship joining to an address table by means of an address_id, which we can store in the customer table. When we want to know the customer's address, we can join the customer table, using the address_id, to the address table to identify the address details of the customer. This type of relationship is known as a primary and foreign key relationship. Next, we have what is known as a unique key. Such values are typically found in transaction tables, where a unique value is generated to represent the entire transaction. For example, if we had a table called fact_customer_transaction, we could generate a unique ID to identify each transaction row and call this the transaction_id. Such a unique key may take the form of an integer that is incremented by one every time a transaction is created, and a transaction_id like this can be used to identify how many transactions have gone through a system. Next, we have composite, compound or concatenated keys. These keys take the form of multiple values combined to uniquely identify a given row of data. For example, if we had our fact_customer_transaction table again, but did not have a unique key in the form of a transaction_id, we could generate a composite key by combining a number of values, such as the customer_id, the transaction date and time, and the item number in the shopping basket. By combining these three or more values, we can uniquely identify each row of data. Another example of a composite key might be within an aggregate table: if we had a transaction summary table called agg_transaction_country_month, which is a summary of transactions by country on a monthly basis, then we can make a composite or compound key using the values of country_id and the transaction month and year. This also identifies for us the level of granularity in the table. However, don't always rely on the table name to identify the level of granularity. For example, we might have a table called agg_country_transaction_month, which leads us to believe that the primary key is made up of country, month and year, but when we review the primary key we also find that there are product and sub-product fields within the table and key index. This would mean that each row is uniquely identified by country, month and year, and, in addition, product_id and sub_product_id. When performing any aggregate calculations or selecting data from a table, we need to be aware of the granularity in the table, as this can cause double counting of data, in particular if we are trying to uniquely identify the number of customers for a given period.
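Before looking at an example of double counting, here is a hypothetical sketch of how the key types described above might be declared; the table and column names are invented for illustration and do not come from the Sakila schema:

-- Primary and foreign key relationship
CREATE TABLE dim_address (
    address_id INT PRIMARY KEY,
    address    VARCHAR(100)
);

CREATE TABLE fact_customer_transaction (
    transaction_id INT AUTO_INCREMENT PRIMARY KEY,   -- unique key, incremented by one
    customer_id    INT,
    address_id     INT,
    txn_ts         DATETIME,
    basket_item_no INT,
    FOREIGN KEY (address_id) REFERENCES dim_address (address_id)
);

-- Composite (compound) key on an aggregate table
CREATE TABLE agg_transaction_country_month (
    country_id INT,
    txn_month  INT,
    txn_year   INT,
    txn_amount DECIMAL(12,2),
    PRIMARY KEY (country_id, txn_month, txn_year)
);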
To give an example of double counting, let's say we have 10,000 customers who made transactions in the United Kingdom in January 2018. Our table name of agg_country_transaction_month leads us to believe that the granularity of the table is country and month. We then perform a count of customers and it returns a count of 200,000 customers who made transactions in January 2018. Upon further investigation of the table, we identify that it also contains product and sub-product details, of which there are 20 rows of data for the month of January in the United Kingdom. The value of 200,000 is made up of our 10,000 initial customers times the 20 rows of product detail, giving us 200,000. To correct this, we would perform a COUNT DISTINCT to give us the correct figure of 10,000 customers, or we could use an alternative table that does not contain product and sub-product details.
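As a sketch of that correction, using the hypothetical aggregate table from above and an invented country_id for the United Kingdom:

-- Counting rows multiplies customers by the 20 product rows (returns 200,000)
SELECT COUNT(customer_id)          AS customer_rows
FROM   agg_country_transaction_month
WHERE  country_id = 101 AND txn_year = 2018 AND txn_month = 1;

-- Counting each customer once returns the true figure of 10,000
SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM   agg_country_transaction_month
WHERE  country_id = 101 AND txn_year = 2018 AND txn_month = 1;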
Next, we have secondary keys. While secondary keys are typically non-unique, they can be used to help identify a row of information. For example, values such as first name, last name, salary, department and country can help us identify a given row, and for such keys we can gather what are known as statistics on these values to help us with our queries within the data warehouse. For example, we can collect statistics on the country field, which will identify the unique number of countries within our data. If our company does business in five countries, then by collecting such statistics a database administrator can look to partition the data by country, which allows for much faster query returns when we specifically ask for individual countries, as the data is then stored in a specific location in our database. You will often see secondary keys utilised within aggregate tables that are not uniquely identified by the secondary key, and having such statistics on the data distribution helps greatly with the performance of the database. Finally, we have super keys, which are similar to compound or composite keys but are made up of a number of different permutations of key index values. For example, we might have transaction_id as a unique primary key, followed by a super key of transaction_id plus customer name, another super key of transaction_id plus the address_id, and a final super key of first name and last name. Each of these super keys helps with the distribution of data within a table, which again helps with the performance of the database and the time it takes to return data from a query. Such super keys may be set up to allow for joins to additional tables that support the super keys and don't necessarily have the likes of a transaction_id to join on. Next, we will look at table and column aliases.

17. Table and column aliases: Next, we will review table and column aliases and how they help us speed up our coding by reducing the amount of typing required. A table alias is a short name or acronym that can be given to a table, which we then use any time we want to reference that table in our code. For example, if we had a table called dim_customer, we could use the alias cus for customer. Alternatively, if you find it too difficult to come up with a new alias for a table, then my preference is to use a standard alias in the form of the letter a followed by a number; typically I start at 10 and increment by one every time I add a new table to my SQL code, so I would have a10, followed by a11, followed by a12. I find this notation to be very clear when reviewing code. In the past, I've worked with companies who were trying to maintain a document of the table aliases commonly used in the company, which after a period of time became difficult to maintain due to the large number of similar table aliases being used in the data warehouse, and mistakes started to creep in: for example, the table customer account relationship would have the same acronym, CAR, as customer account returns. By keeping a simple notation such as a10, a11 and a12, we don't need to worry about maintaining a document of table aliases. At a minimum, I would always recommend that you give a table alias at least three characters. While it is possible to provide a single-character alias, this can be difficult to search for in a large piece of code; if you've only referenced your table alias as a single letter, for example c for customer, or x, y or z, then a search for the letter c would highlight every SELECT statement and a number of other values in the code, whereas an alias of at least three characters should be enough to allow for a unique search. Also note that aliases have local scope, and therefore an alias can be used multiple times throughout a section of code while referencing a different table each time. For example, if I had three separate SELECT statements, select from customer, select from address and select from order, we could use the table alias a10 for all three tables, with the alias being unique to each SELECT statement.

Column aliases are used to change the column display output for a given field. In our example, we have a simple query selecting data from the customer table, and on the right we have the displayed output as defined by a number of different column aliases. The first alias I've used here changes the column name from first_name to be presented as "first name", which is more suitable for our reporting. Next, I've changed the field last_name to be output in our report as Surname, and for this you can see I've not used single quotes, as I did with the previous column. This is because in the previous column, first name, we need to specify the value between quotes as there is a space between the two words; if we tried running our SQL code without the quotes, it would return an error, as the engine cannot distinguish between the two values "first" and "name", but by placing the two words between single quotes, the SQL engine references them as a single value. Next, I have email, and I've changed the output here to be email_address. Using an underscore is an alternative to using single quotes, where I've replaced the space with an underscore. My personal preference is to use underscores instead of spaces, because if we are reading our data into a report such as Microsoft Excel, there are occasions where Excel may be confused by a field name made of two values, whereas with the underscore in email_address, Excel will read this as one word. My personal preference is to always use a delimiter that is something other than a space, such as a tab for a .txt file or a comma for a CSV file. Finally, I've added the value SYSDATE and named it as today. SYSDATE is a function within MySQL which returns the current date, or the system date and time; as you can see, when I executed the command it was the 7th of July at 2:45 p.m.
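The aliases described above, roughly as they would be typed; quoting an alias is only needed when it contains a space:

SELECT cus.first_name AS 'first name',       -- space in the alias, so quotes are required
       cus.last_name  AS Surname,
       cus.email      AS email_address,      -- underscore avoids the need for quotes
       SYSDATE()      AS today               -- current server date and time
FROM   sakila.customer cus;                  -- table alias 'cus'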
SYSDATE is a useful function to have if you ever need to timestamp your data with the time of execution. This is typically done when inserting a row of data into a table, and we can identify when a row was last updated or modified by using the SYSDATE function to give us today's date. That's all for this section on table and column aliases.

18. Inner Join Tables: Let's now review table joins, starting with the inner join, which is probably the most common join used when coding SQL. The inner join returns components that are common to two tables and is generally used when referencing dimension or lookup tables. On the right is an illustration of an inner join using Venn diagram notation; in Venn diagrams, such a join would be called an intersection between two sets, that is, components from set A which are also present in set B. For a table join, we join two tables by referencing their index keys. In the join shown, we can see this is a primary-foreign key join, where we have a foreign key of address_id in the customer table, which joins to the address table on the corresponding primary key, address_id. Also note in the SELECT statement that we have referenced both the customer and address tables by their table aliases, cus and adr. From customer we are returning only the customer_id, and from address we are returning all values, as indicated by the asterisk; the asterisk is the SQL notation implying all values from a table. The resulting output is the table below, showing our customer_id from the customer table, followed by all the address fields from the address table. However, if I only wanted to select a few of the fields from the address table, I could do so by specifically providing the field names, and, if need be, I could rename the output of these columns by providing column aliases; for example, we might want to change the field address to be address line 1 and address2 to be address line 2.
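A sketch of the inner join just described, using the cus and adr aliases:

SELECT cus.customer_id,
       adr.*                                   -- all address columns
FROM   sakila.customer cus
INNER JOIN sakila.address adr
        ON cus.address_id = adr.address_id;    -- foreign key joined to primary key

-- Naming specific columns, with column aliases, instead of adr.*
SELECT cus.customer_id,
       adr.address  AS 'address line 1',
       adr.address2 AS 'address line 2'
FROM   sakila.customer cus
INNER JOIN sakila.address adr
        ON cus.address_id = adr.address_id;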
That's all for the inner join; next we will review the left and right outer join.

19. Left and Right Outer Join Tables: Next we have the left and right join, also known as the left outer join and right outer join. In the Venn diagram on the right, you can see that a left join brings back all the data from set A and any data from set B where the join condition is met. For a right join, we are reversing which table we start the join from: the right join brings back all values from set B and any matching data from set A that meets the join condition. For our SQL statement, to perform a left join we use similar syntax to an inner join, but we replace the term INNER JOIN with LEFT or RIGHT JOIN, depending on the join we want. In this scenario, we are asking to bring back all the customer IDs from the customer table and any address information that is available from the address table. The resulting output is a full list of our customer IDs from the customer table and the corresponding address IDs from the address table where the join criteria were met. If a customer does not have a valid address_id, then, unlike the inner join, where that customer would be omitted, because we are using a left join we can still display the customer_id and present the address details in the form of NULLs where no information was found. For example, customer IDs 38, 39 and 40 have no address information, and where no data is present, a NULL value is displayed; in this case the NULL value is represented as a question mark. Depending on your SQL editor, you can change how NULL values are represented: by default, the MySQL Workbench editor writes the value out as NULL, but you can change this in the settings menu to be represented as a question mark or any other value you choose. If we had performed an inner join with this SQL statement, then customers 38, 39 and 40 would have been omitted from our results, as an inner join only returns results where there is a positive match on the join keys. Obviously, the desired output depends on what you're trying to convey with your query; in our example, the analyst wanted to return a full list of customer IDs and identify where address information was missing. If we only want to return customers with no address information, we can specify in the WHERE clause that the address_id is NULL, and this would return only customers 38, 39 and 40. This type of query can be very useful when we need to clean up a data set by identifying where no data is present; you may often hear analysts complaining that a data set is not clean, which usually implies that there are a lot of empty values or NULLs. Data integrity should be part of your database and software design, and if there is a possibility of a NULL value, we should try to specify a default value instead. In the case of our customer addresses for the Sakila DVD rental store, a rule has also been implemented to enforce data integrity between the customer and address tables, such that it is not possible to have a customer without an address. This rule would also need to be implemented in the software design of our application, along with additional rules such as: if we delete a customer, then we also need to delete that customer's address from the corresponding address table. This is enforced in the Sakila database by means of a cascading delete; that is, if we delete a customer from the customer table, this triggers the deletion of the corresponding customer address details. That's all for left and right outer joins.
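A sketch of the left join and the IS NULL filter described above; note that in the standard Sakila data every customer does have an address, so the missing-address rows are illustrative:

-- All customers, with NULL address details where no match is found
SELECT cus.customer_id, adr.address_id, adr.address
FROM   sakila.customer cus
LEFT JOIN sakila.address adr
       ON cus.address_id = adr.address_id;

-- Only the customers with no matching address row
SELECT cus.customer_id
FROM   sakila.customer cus
LEFT JOIN sakila.address adr
       ON cus.address_id = adr.address_id
WHERE  adr.address_id IS NULL;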
20. Union (all) Results: The UNION and UNION ALL statements are used to join two sets of data together. No joining criteria are specified, but we do require that both sets have the same number of output fields, and these should also be of the same data types. The Venn diagram on the right shows set A union set B, where we return all details from both sets, as highlighted in green. In our code example on the left, the first SQL statement identifies customers with an address in California that are active, and the second identifies customers with addresses that are not active; by using a UNION statement we can join these two data sets together. The resulting data will have any duplicate rows removed, whereas if we use the UNION ALL statement, duplicate rows from both sets are returned in our output. By default, tables in SQL typically don't record duplicate rows, so if you're trying to use a UNION ALL statement and output duplicate rows into a table, you will need to create a table that allows for duplicate rows. In our first SQL statement, we want to return active customers in California. Active in this case is represented by a Boolean value of 1 or 0. A Boolean value implies a true or false statement, and in SQL it can be represented in many different ways: in our scenario we have a single integer value of 1 for true and 0 for false, but you may also see the use of a single character, Y for yes or true and N for no or false.

There are many scenarios where we will want to join two or more sets of data together. Another example might be if we have been running a number of weekly marketing campaigns and we don't want to send marketing information to the same customers in any of the subsequent weeks. We would then union the marketing campaign customers from the previous weeks' runs into a single table, which we can use to exclude them from our next set of customer campaigns. Let's take a closer look at what's going on in this SQL statement. Note that a single union combines two sets at a time, so in this scenario I have first unioned two tables, campaign week one and campaign week two, together, which we then combine with a second union statement with campaign week three. The resulting list of customers can then be used in our main SQL statement as a subquery within our WHERE clause to perform an exclusion of customers: we are saying where the customer_id from our campaign table is not in our subquery of customers from campaign weeks one, two and three that we have defined within our union statements. To summarise, the union statement provides us with a very powerful tool that allows us to combine tables and data sets where we may not ordinarily have a common join between two tables, or where the resulting data comes from two disparate source systems which we need to combine as part of our analysis.
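A sketch of the two examples above; the campaign tables (campaign_week1 to campaign_week4) are hypothetical:

-- Active and inactive California customers combined; UNION removes duplicates,
-- UNION ALL would keep them
SELECT cus.customer_id, cus.active
FROM   sakila.customer cus
JOIN   sakila.address adr ON cus.address_id = adr.address_id
WHERE  adr.district = 'California' AND cus.active = 1
UNION
SELECT cus.customer_id, cus.active
FROM   sakila.customer cus
JOIN   sakila.address adr ON cus.address_id = adr.address_id
WHERE  adr.district = 'California' AND cus.active = 0;

-- Excluding customers contacted in earlier campaign weeks
SELECT customer_id
FROM   campaign_week4
WHERE  customer_id NOT IN (
         SELECT customer_id FROM campaign_week1
         UNION
         SELECT customer_id FROM campaign_week2
         UNION
         SELECT customer_id FROM campaign_week3
       );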
That's all for the union statement, and I'll see you in the next section.

21. Project Management (Agile Methodology): In this section, I will provide an overview of how I project manage my work on a daily basis, so as to ensure that the projects I take on get delivered on time and meet my customers' requirements. I quickly learned when I started out as an analyst that saying yes all the time to projects quickly snowballed into a situation where I had taken on more work than I could actually manage. As much as I wanted to please my team, and felt that if I didn't take on the work I would be letting the team down, even though I was more than capable of producing the work, I hadn't factored in the time it would take to deliver on multiple projects, and something had to change. Sitting down with my team manager at the time, he was also frustrated at being put in a position of trying to deliver on multiple projects without the necessary resources or headcount, and you soon realise that project management becomes a balancing act for everyone at all levels of management. In my early days of development, the typical methodology adopted by many teams was known as the waterfall life cycle, which entails defining a set of requirements with your customer, providing a delivery date for when you believe the project will be ready, and then embarking on the development process, getting back to the customer only when you have completed the project. One of the biggest shortcomings of the waterfall life cycle is the lack of interaction with your customer during development: in the majority of cases, if your customer has not been involved in the development process directly and had the means to adjust requirements, then the final project outcome may not meet the customer's expectations, and requirements that were set potentially months ago may have subsequently changed due to the nature of the business. Some companies I worked for also required that you went to great lengths to produce very complex requirements documentation, which in some cases took more time to write than it took to actually deliver the application or project. While this may be necessary in some industries where requirements must be airtight, in the majority of cases for the projects I was working on it was not, and it was considered by many on the team a time-wasting exercise. Striking a balance between your customer's and your team's expectations is vital if you want your project to succeed.

So, for the last 10 years, my development methodology of choice has been agile development. With agile development, the development life cycle is broken down into smaller, more manageable units of work. While there are a number of ways to implement agile development, Scrum has become one of the more popular methods for software development, with a set of rules, responsibilities assigned amongst the team, and meetings that often take place on a daily basis with the development team. A single iteration of the development life cycle in agile is often referred to as a sprint and typically lasts anywhere between one week and one month. A scrum master, similar to a project manager, oversees development efforts and works with the team to identify, for any given sprint period, how many items of development are to be delivered within that sprint. At the end of each sprint, the team comes together with their scrum master, and in some cases a member of the business, to review development efforts for the sprint. For any components that were not successfully completed, feedback is taken to identify what held up development, and that piece of development can be placed back into the backlog of requirements for later consideration. The sprint is then signed off by the scrum master and a member of the business and marked as complete. Completed pieces of work entail not just development effort but also that testing has been carried out and the piece of work is fit for use in production. This process is sometimes referred to as shrink wrapping, as completed sprints of work do not require any further interaction from the development team. If a project is ever called into question by the business or a business owner, the scrum master will be able to provide details of what percentage of development is fully complete and what work is still outstanding, sometimes demonstrated in what is known as a burn-down chart. In the event that a project needs to be shelved, that is, to make room for another piece of work, or cancelled outright, the business owners can work with the scrum master to come up with a realistic forecast for development and identify which projects are closer to completion and worth continuing to invest in. In the past, using more traditional methodologies such as the waterfall life cycle, it was very difficult to understand where a project was in terms of development and what was ready to go into production. This was largely because testing was left to the very end of development, and more focus was given to development than to ensuring that requirements still met the business's needs and that project deliverables were coming out of development ready for production.
With agile development and running sprints, teams will typically utilise what are called kanban boards to show the progress of work. A kanban board uses a series of vertical swim lanes and can often be seen around offices in the form of post-it notes stuck to whiteboards. Requirements are placed in the first lane, often referred to as the backlog, and subsequent lanes, moving from left to right, follow the development process, with the rightmost lane being the completion and sign-off lane; the lanes in between depend on the number of stages of development your team has adopted. Any given post-it note will typically have the requirement name, the given priority, and the initials of the developer taking on the task. A scrum master can see at a glance from the kanban board just how much work has been taken on by each member of the team, and can set constraints on development, such as: any one developer may not take on more than three items of work during a given sprint, and for a developer to take on any more work, they must have their current work fully signed off by the business before accepting anything further. It is about recognising just how much work each member of the development team is carrying and ensuring that the team, or an individual team member, is not taking on more work than they can actually handle. When reviewing kanban boards, a scrum master will typically start with the rightmost lane of items in development; that is, the scrum master is trying to identify which items are closest to completion before considering any further items to be brought into a sprint.

Hopefully this illustrates that, while it is important to be seen as a team player, it is not always in your best interest to say yes to every project, and it is just as important for a project manager or scrum master to recognise how much work each member of their team is carrying. If you find yourself in a situation where a member of the business just so happens to pass by your desk and asks, "can you quickly take on a piece of work for me?", be sure to refer them to your kanban board to show them just how much work you are currently carrying, and redirect them to your project or scrum master. For these types of scenarios, I would typically ask my colleague to submit the request to the project queue and inform them that requests are reviewed once a week, typically at the beginning of the week, to identify what new work the team can take on. Also point out to your colleague that, even though they are submitting a request, this does not mean the request will be carried out straight away: every project request will be prioritised accordingly, and only when a project has moved from the backlog into the first stage of development, typically requirements gathering, will a member of the development team be able to give an approximate estimate of when the work will be completed. If your colleague is still adamant that the piece of work needs to be done as soon as possible, then you should ask them to mark the request for escalation, provided they have sufficient sign-off from a director or senior manager detailing why the work needs to be carried out. When an escalation is taken on board, the project manager or scrum master will then need to inform all of the affected teams that their work will be put on hold until the escalation can be completed. If your business is new to the process of agile development, it can take some time for your teams to adjust to the new process.
My suggestion here would be to hold some agile development training with the colleagues who will be submitting requests, to ensure that they are comfortable with the process and don't set unrealistic expectations. While a project may sit in a backlog for a number of weeks or even months, it is also up to the business owner to review with the project or scrum master when the work is likely to be carried out. If a project keeps being pushed out by the scrum master, then it may be the case that the development team needs to be expanded, so that a greater headcount allows them to take on more work, or that the project should be scrapped entirely, where the scrum master has reviewed with the overall business to identify which projects are in line with the business roadmap for the year; any projects that don't meet the overall business roadmap will not be taken into consideration and will subsequently be removed from the development backlog. To summarise, everyone from the developer right up to the company president plays a part in project management, and while the company president may not be concerned with the nuts and bolts of a project, they will be concerned with the overall direction of the company; and if the company also has shareholders, they will want to be seen to be taking on the right amount of work and not setting unrealistic expectations for the company. Also, as a developer or analyst, the more projects you complete over time, the better you will learn to balance your workload and not set unrealistic expectations for yourself. So don't be afraid to push back on your colleagues, or your project or scrum master, when required, as you are likely doing everyone a favour by ensuring that your project deliverables are completed on time and meet your customer's requirements. In the next section, I will demonstrate one of the project management tools I have been using for a number of years, called Trello. While Trello does not fully support all aspects of agile development, it does provide a very lightweight kanban board, which I find more than adequate for personal and small project development. That's all for this section on project management.

22. Kanban boards: In this section, I will provide an overview of the project management tool I use on a daily basis and have been using for a number of years, called Trello. Trello provides a very useful interface for kanban boards. Kanban provides a series of swim lanes where users can submit projects into a backlog; in an office environment, this would typically be displayed on a whiteboard using post-it notes, and a scrum master or team member can then add project items into the backlog and move items across the board as they progress with their development. I'll now give you an overview of the kanban board I'm currently using while creating this online course, and to start I will point out some of the basic features Trello can provide to you and your team. However, before you go and sign up for a Trello account, I would first check with your team manager or your company IT department to see whether it is acceptable to use such external third-party tools, as some companies I've worked for do not allow posting of any sensitive business information on third-party websites; and, as a lot of companies have adopted agile development, you may already have tools in your company for working with kanban boards and agile development.
The first action you will probably perform in Trello is to set up your own board, and this can be done by selecting Boards and clicking on Create New Board. From the Boards menu I can also access other boards I've created or signed up to, and I can quickly switch between boards, such as this one I'm looking at for financial analysis, where I can see the various items relating to that project and the status of where each item is at. Switching back over to my Business and Data Analysis with SQL board: a scrum master will often start their kanban board review with the rightmost lane and work right to left through the project items on the board. Reviewing my last swim lane, I have called this lane Produce and Share, and that's the great thing with kanban boards: you can name your swim lanes to fit your development process and add or remove swim lanes as required. The items in this last lane are now ready to upload. Moving back through the various swim lanes, I've got a number of items under Review Video and Edit Video, and currently I have one item in Record Screen for project management, which is the item I'm presently working on for this video. Finally, in the backlog, I've got a number of items which I will work on at a later stage.

Clicking on the project management ticket, I can see I have marked this as a "could have" item; that is, while agile and kanban are good to know, they're not critical components of this course. All the same, I'm more than happy that I found the time to add this section on project management to the course. At the top, I've provided a brief description of how to project manage; obviously, depending on your requirement, you may have more or less detail in your description, as if this item were to be taken on by another member of your team who is new to the project, you might want to expand in greater detail on what the requirement actually is. Next, I have created a list of items I wanted to address within this section, and as I progress through the ticket, I can mark items off the list as complete; once I get to 100%, the status bar changes from a blue line into a green line, indicating that I have completed all items. This is particularly useful when you come into your weekly team meeting and your team and scrum master are asking you for a progress update on your development: I can quickly open up the card in Trello that I'm working on and show that I have 73% of the work complete, with one or two outstanding items. I can also add comments as I progress with my work, and other members of the team can add their feedback to the comments section. If a team member has added a watch to the ticket, then any time a comment is added they will receive an email or web browser notification with the update. I can add various items to my comment, such as an attachment, mention another team member if they need to take some action on this card, or point to another card if it is related. I can also create another checklist, for example a checklist here to sign off on the card; for sign-off, I will add in: meet with the customer, review the video, and submit to YouTube. I can also set a completion date, for example if I wanted to set a due date for next week, and once I've completed an item of work, I can come in and check the box to mark it complete.
I can also see, as a scrum master, when I review such tickets whether there is a due date set, and this allows me to prioritize items better, knowing when items have a fixed deadline. If an item has been marked as complete, then the item will appear as green on the kanban board, and if I go past my due date, then the item will appear as red on the board. If I want to come along and add a member to this card, I can do so easily by clicking on the Members button, searching for the member and adding them to the card. What makes Trello so great to use is that you can easily move cards around within your web browser by just clicking and dragging a card into the corresponding swim lane. As a scrum master with my team, I will typically come into a weekly meeting and first ask which items have progressed with development, starting with items closest to completion. For example, if I have an item in Review Video, such as my SQL explain plan, I can drag this from Review Video to the Produce and Share swim lane. Moving across the kanban board, I can look at items in the next swim lane, Edit Video, and the card Create report in Excel is ready to move into the Review Video swim lane. As I move a card across the board, I will move the card from the top of the swim lane it is presently in to the bottom of the next swim lane; this keeps the cards in a relevant order within the swim lane. Likewise, under Edit Video, Load data into spreadsheet and ODBC setup, I can say I have completed editing these items, and project management is what I'm currently working on in the record audio section. I can also add my board to favorites by clicking on the star icon, so when I open up Trello I can quickly access the boards I'm working on. I can also set a board as being private, as part of a team, or public. Under the Show menu we have more options, such as filtering on cards, which allows a scrum master to quickly review cards at a given priority level. I can change the background, but for now I will stick with this picture; I believe it's Lake Louise in Banff National Park. Trello also provides what are known as Power-Ups. These can be very useful for integrating with other third party applications; for example, we can integrate directly with Jira, Google Drive and Calendar, so these are definitely worth reviewing for streamlining your workflow. There is also a direct link to the board, which I can send to users directly if I want them to join the board. Lastly, if you would like to try Trello, I'll provide a link below in the comments section. It's free to sign up for an account, and there are also billing options if you require business support with Trello; for the most part, you can get away with using the free version. And if you want to try some of the extra add-ons, Trello has a nice recommender feature where, if you recommend a new person to Trello, you will be given one month's access to the full version, with a maximum, I believe, of 12 months. That's all for this section on using kanban boards in Trello. 23. Prioritise Requirements using MoSCoW: For prioritizing my work items, I use the MoSCoW method, where if you remove the letter O from the city Moscow, you're left with the acronym MSCW, standing for must have, should have, could have and would or won't have.
I assign each of these a color code in Trello: green for must have, yellow for should have, orange for could have and red for would or won't have. Must have items are items that are critical to your project. For example, as I'm writing a course entitled Business and Data Analysis with SQL, covering SQL is obviously a critical item in this course. However, there are items in SQL that are probably not essential for performing data and business analysis; knowing how to implement database replication, for instance, is probably out of scope for this course. So if a team member suggested such an item, I would probably mark this as a would or won't have. It is just as important to record items you won't be working on, so that in the event someone asks why we are not covering an item, we can look back in our kanban board for details as to why it was not included. Even though I've marked such an item as won't have, there is potential for it to be worked on at a later stage, possibly in a different course; or, if database replication did become relevant to data and business analysis, I could add the item back in. Typically, a team will be able to distinguish must have project items from would or won't have project items. Moving on to should have items, I would first consider how much time we have for our project, and after a review of requirements, if we have too many must have items, then some of these will be re-prioritized and moved into the should have category. With regards to reporting, must have items would be key performance indicators, and should have items I would mark as secondary metrics. For example, with a sales dashboard, total units sold would be a KPI, while the quarterly difference in units sold is probably not an essential metric that is required on every sales report. Unless you're producing a quarterly report, of course, in which case it would be an essential item. As you can see, what is critical is subjective, depending on who's asking the question. Finally, for could have items, I like to think of these as non-essential components, such as graphs in a report or color-coding metrics. By all means, if you have time left at the end of your project, then add these in, but don't start your project with non-essential items; focus on the key deliverables. That is to say, there is no point in having a nice looking report or application if none of the data in the report is what your customers asked for. So, to summarize prioritization: first, ask yourself how much time we have for this project; next, identify the critical components of your project, that is, the components that you must have; and once you have identified your must have items, everything else can be recorded as should, could or won't have. Of course, when prioritizing requirements, make sure to prioritize with your customer in mind, and obviously don't make any assumptions about what is critical and not critical for your project. If you are unsure of a requirement, then seek further clarification from the business owner. That's all for this section on how to prioritise requirements using the MoSCoW method. 24. Sales Project - Introduction: In this section, I will introduce you to a small data project relating to the sakila database. My personal preference for learning any new skill is to shadow a colleague, to get first hand experience of how they work on a data project and to learn who the go to colleagues are when you need help.
Also, a colleague can help provide any small details related to a system or process that you may need to understand which are not formally documented, so have your notebook at the ready to record any valuable pieces of information that may help your project along the way. For this project, I will introduce you to the workflow I've been using for the last number of years. We will start by defining a given request from the director of sales for Sakila DVDs, Tom. We will use a project management tool to first record and prioritize the request, and seek confirmation from Tom to clarify that we have captured the request correctly. We will then work on producing the required data by writing our queries in SQL. Once we have our results, we can then produce a standard report and dashboard to present our findings back to Tom. Let's get started with a typical scenario you are likely to be faced with, probably in your first week of a new job as a data and business analyst. It's Friday evening, and you're just about to pack up and head home when you receive an email from Tom, the director of sales at Sakila DVDs, going something like this: Hi Jeff, great seeing you at the summer party. I urgently need a quarterly sales report for California for an update with the big boss, Dave, next Monday morning. Can you please help? All the usual metrics will do. I owe you one. Tom. Well, there goes my weekend. Ordinarily, if I received such a request from any other colleague, the first thing I would be thinking of is to push back and ask that a formal request be submitted into the report request queue. The request would then be reviewed by myself and my team leader, where we would prioritize it in accordance with other requests that have come in. As a business analyst, I spend probably half my Monday reviewing requests, getting back to colleagues with updates on projects and identifying if there have been any changes to project timelines. In this case, I will need to communicate to all other project owners that an urgent request has come in from the director of sales, Tom, and all requests will need to be put on hold until this matter can be completed. There are a few occasions where I would typically jump to such a request, but as this is for the president of Sakila DVDs, Dave, I'll need to make a few exceptions and work this request, which will probably eat into some of my weekend. So don't forget to also record the time you spend. Ordinarily, though, I would be coming back to whoever has sent in such a request to say that all requests need to be prioritized, and if any projects need to be moved, then I will need a project escalation and sign-off from the team manager or director, with details of why such a request is so urgent and has been left to the last minute. In other words, cover all your bases. Communication really is key when working with multiple teams. We'll cover this in more detail in a later section when I specifically look into how to project manage and prioritize your work accordingly. Oftentimes as a business analyst you'll be working with a number of teams, and if you don't have the benefit of a team project manager who can deal with all the requests and prioritize them for you, then you are probably going to be in a position where you are doing your own project management. Even if you are working in a small company, it is likely that you will have many requests from different team members.
And juggling these requests and ensuring that everything is delivered to each team on time can be a daunting task at first, but I will be able to show you some of the best practices which will help you in your daily project management. Over the course of the next few sections, I will be working through Tom's request from start to finish, with the end goal of producing a quarterly sales report for California. In addition, I will also show you how to take this request one step further by producing an interactive dashboard for all sales regions. That's all for this section, and I'll see you in the next one. 25. Create database: In this section, I will start by setting up a sales database, which we will use to store the results of our analysis in predefined tables that we will create during our analysis. Think of a database as being similar to a team folder on the network drive. The team folder typically serves two purposes: one, it identifies where team members should store their work, and two, it allows an administrator to grant specific user access, such as read, write and delete privileges. To create a database you'll need administrator rights, or, if you don't have such privileges, you will need to contact your database administration team requesting a new database, with details such as the database name, the purpose of the database and the list of users and their privileges for access to the database. Or, if a database already exists, which is the likely case unless you are a new team, then you will need to request access to the database you wish to use for your project, giving the privileges that you require, such as read, write and delete access. Such a request will then be reviewed by the database administrator, who in most cases is likely to be an analyst on the team you are working for. The database will store the results of our analysis in customized tables, which we will create during our analysis. Typically, for most large organizations, each team or department will have its own database. As noted earlier, the term database is synonymous with the term schema, and you may also hear other terms such as data mart or vert, where the term mart is an abbreviation for market, representing a similar idea to that of a local shopping mart or store within our data warehouse, and vert is an abbreviation for vertical, for example a sales vertical, finance vertical or marketing vertical. In addition, you may find some databases with specific project names. For example, if a project is going to last at least a year, has hundreds of tables created during its lifetime, and needs to grant access to a specific set of bespoke users brought together for the project from across many teams (think new product development or regulatory audits that are very sensitive to a business), then you may find that the database has been given a specific project name. For now, I will assume that we don't have any database for the sales team, and we'll go ahead in our live coding example and set up a database called sales. First, we define our requirement in the header, and below it you can see I have also detailed the steps that I will need to follow for our project. So let's go ahead and set up our sales database. I will issue the command CREATE DATABASE sales.
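For reference, here is a minimal sketch of the statements involved in this step; the DROP DATABASE line is shown commented out, as it is only relevant if we later decide to remove, or effectively rename, the database.

    -- create the new database (schema) for the sales project
    CREATE DATABASE sales;

    -- make it the default database for this session
    USE sales;

    -- MySQL has no RENAME DATABASE, so renaming means dropping and recreating;
    -- only run this if you really intend to remove the database
    -- DROP DATABASE sales;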
If you can't remember the syntax for any function in MySQL, you can open up the context help panel and highlight the keyword to show the definition and syntax of how to execute that term. So we'll go ahead and run the command, and in the top left hand corner, if I refresh the list of schemas, or databases, we can now see that the sales database has appeared. I will go ahead now and issue the command USE sales so we can connect to our database, and you can now see that the database is highlighted in bold, showing that it is now the default database. In addition, we can also drop a database if so required, and this may be the case if we want to rename our database: as there is no RENAME DATABASE function, we can issue the command DROP DATABASE to drop our sales database and then set it up again with a different name. For now, we will keep the name sales for our project. So that's it for creating the sales database, and in the next few sections we will start performing our analysis and create the required tables to store the results of that analysis. That's all for this section, and I'll see you in the next one. 26. Query Workflow: In this section, I will describe the process for how I go about writing a query. In addition, I will also highlight when I'm likely to use temporary tables over permanent tables within a data warehouse. There are a number of ways that we can create a table in SQL. The first method is to create an empty table by defining the table structure in our SQL statement, such as field data types and indexes; once the table is set up, we can then use an INSERT statement to populate the new table with data. The second option we have for creating a table is to use the CREATE TABLE ... AS statement, which is combined with a SELECT statement. The AS implies that the structure of our new table will be the same as the data types returned by the following SELECT statement; you then have the option to apply indexes and keys to your new table. Tables can be defined as permanent or temporary. The main difference is that a permanent table will have its data written to the database hard disk, and the data will persist after a user or system session has closed, while a temporary table will use the database's memory, that is, the RAM of the database or data warehouse server, and any data written to a temporary table will be deleted once the user or system session is closed. As an example, we have been asked to write a query to calculate sales figures to run on a weekly basis. The following is a breakdown of the development process I would typically follow. I will first define the output table in which the results of my analysis will be stored; in our given scenario, I will create a table called aggregate weekly sales, which will store a high level summary of customer and financial details for a given week. Next, I will create a permanent table with a small sample set of customers, which I will use for testing at each stage of my analysis. As I'm using permanent tables, if I cannot finish the analysis in a given day, then I can come back to it at a later stage and still be able to access the same set of sample customers that I stored in my permanent customer sample table. For this example, I will assume I only need to set up one additional intermediary table to store a detailed list of customer transactions.
But in reality, for such a query you may require a number of intermediary tables to calculate a weekly sales figure, such as a table to hold refund details, credit card chargebacks, currency conversions and so on. Once again, as I am using permanent tables, I can come back to my analysis at any stage I choose and disconnect from the database without any fear that I will lose my work. However, if I was using temporary tables, as soon as I disconnected from the database any tables or results that I had created would be deleted, as they are stored in memory, and the next time I logged back into the database I would need to recreate or rerun the steps of my analysis. Moving on, once I'm satisfied with the intermediary steps, I can finally perform a query and output the required weekly sales summary into the table I initially created, aggregate weekly sales. I can then move on to the next stage, which is to run my queries against the full set of data. Ramping up my query, I identify that in any given week our company is processing over one million customers with in excess of 10 million transactions. As we initially ran our development on a small sample set of customers, our initial development time was significantly reduced, whereas if I had started the analysis without defining a small sample set of customers, our development time would have been significantly greater, as at each step along the way we would be processing millions of transactions instead of only a handful. Next, I check the performance of the queries by reviewing the explain plans for the full set of data. I can apply some performance tuning, which is typically in the form of adding indexes or keys to intermediary tables, and also applying additional criteria where required, such as defining dates to reduce the number of rows, or focusing in on areas such as customer segments or country codes, where I may, for example, only want to pull information based on one country such as the United Kingdom or the United States. Adding in such criteria can reduce your query time significantly. Once I'm happy with my overall results, I will then go ahead and schedule two test runs of my queries in production. The first test run will use permanent tables for our intermediary steps, and at the end of the test run I will go ahead and drop or truncate any intermediary tables used in our analysis; the next time we run the scheduled job, we would then need to create our intermediary tables again if we have dropped them, or check that they exist and are empty. The second test will use temporary tables for storing intermediary results, and almost nine times out of ten I find that using temporary tables is much faster, as results are written and read from database memory as opposed to the database's hard disk, with the resulting output being only a few rows of data which we need to store in our aggregate weekly sales table. In other words, the most expensive part of our query, if we use permanent tables, would be the read and write times for one million customers and the corresponding 10 million transactions to the database's hard disks. In addition, the database would also need to identify where such records can be written, and if we have not set appropriate indexes or partitions for the data, retrieving the intermediary results for our query can take a very long time.
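To make the two approaches concrete, here is a hedged sketch; the table names cust_sample, tmp_cust_transactions and agg_weekly_sales are illustrative stand-ins for the tables described above rather than the exact objects used in the video.

    -- permanent sample table: written to disk, survives disconnects
    CREATE TABLE cust_sample AS
    SELECT customer_id
    FROM   sakila.customer
    LIMIT  100;   -- a small sample keeps development runs fast

    -- temporary intermediary table: held in memory, dropped when the session ends
    CREATE TEMPORARY TABLE tmp_cust_transactions AS
    SELECT p.customer_id, p.amount, p.payment_date
    FROM   cust_sample    AS c
    JOIN   sakila.payment AS p ON p.customer_id = c.customer_id;

    -- the final, small summary is inserted into the permanent output table
    -- INSERT INTO agg_weekly_sales
    -- SELECT ... FROM tmp_cust_transactions GROUP BY ...;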
In some cases, I've seen queries which had taken up to 10 hours to run in production reduced to 10 minutes by switching over to temporary tables, where the data warehouse has sufficient memory for processing the queries in memory as opposed to writing to a hard disk. However, there is the occasion where temporary tables may not be sufficient, and this is likely to be where your data warehouse does not have enough memory, or RAM, to write the intermediary results to. In these scenarios, we would then need to write our intermediary results to permanent tables, which will be written to the hard disk of the database or data warehouse. For most data analysis, you will also find that you're not dealing with live production databases and are in fact reading data from a copy of the live production tables. Also note that if you are performing analysis such as a scheduled weekly job, you may be missing a few days' worth of data if there is a delay in writing results from the live production system over to an intermediary development data warehouse. To summarize, having the means to create intermediary tables means an analyst can break a query down into smaller, compartmentalized queries instead of trying to write one very large query, which would be very difficult to test and debug for issues. Finally, I've also provided a small reference table with some of the considerations you should take into account when choosing whether to use permanent or temporary tables in your analysis. As noted at the beginning, I work backwards from the results I'm trying to obtain instead of trying to work forwards through all the criteria, so knowing the intended outcome of your query at the beginning is ideal when performing any analysis. Let's now move on and take a look at some live coding examples. 27. Sales Project - Part 1 - Data Capture: For our first example, we will create a table called cust_california to hold our list of customers from California. We will create this table by defining a CREATE statement with the corresponding field names, value types and appropriate keys. For our unique primary key, we will use the values of customer_id, address_id and city_id, which will uniquely identify our row of customer address information. In addition, I will also specify three foreign keys using the values of customer_id, address_id and city_id respectively, in the event that we want to reference the corresponding tables customer, address and city at a later stage in our queries. As we're creating this table for analysis and it is not intended as part of our core production tables, I will not specify any constraints on it. But if we were to place this table into production, we could specify constraints for cascading updates related to the corresponding tables such as customer, address and city. For example, if the address_id for a customer was to change, we could have a cascading update from the address table to our table cust_california, which would update the corresponding address_id. Such constraints are typically added to tables by database administrators, and as a business analyst you're not likely to be defining them in core system tables within a data warehouse. However, it is good to know that such constraints can be added to a table, which means a database administrator or analyst doesn't have to identify all corresponding tables that need to be updated
if a change is made, and can instead create such constraints to perform the required updates automatically in the event that a customer's details have changed and need to be changed in other existing tables. We can go ahead and run the statement and check that our table has been created by refreshing the schema for our sales database. We can see that the table has been successfully created, and we can also go ahead and inspect the table by clicking on the information icon next to it. We can select Indexes to check that our indexes have been set up correctly, and we can also click on the DDL tab to see the SQL CREATE statement for the table. Note that if you have not specified a default value, MySQL will insert the default values where required; for instance, at the end of our CREATE statement we can see the default engine InnoDB and the utf8 character set specified. Next, we can go ahead and populate our cust_california table by using an INSERT ... SELECT statement. Note that I have also specified the column names for our insert, which also identifies the order of the fields into which we will select our data. If our SELECT statement did not match the order that we have specified, or we had more columns than required, then MySQL would throw back an error. It is not strictly necessary to specify the number of columns or the order of columns, but as good practice I would recommend that you specify the columns and their order if you are planning to put your script into production; this will safeguard against future changes to your query and help other developers understand what you intended to insert into your table in the first place. We can go ahead now and run our INSERT ... SELECT query to populate our table. To view our results within the table cust_california, we can click on the table icon, which will run a SELECT * FROM cust_california, implying all records are to be displayed. We can see that our result set returns nine records in total. However, if our table was relatively large and we didn't want to actually bring back the table results, we could perform a number of different queries on the table to identify how many rows have been populated. The first would be to perform a SELECT COUNT from our table to identify the number of rows; for this we can execute the command SELECT COUNT(*) FROM cust_california, and we can see that the row count brought back is nine rows. In addition, there are a number of built-in functions within MySQL that allow us to query the status of tables. As we are using the sales database, we can go ahead and just execute the command SHOW TABLE STATUS, and this will bring back details of all tables within the sales database; as we only have one table created within the sales database, we only see one row. However, we can change which database to review, and, for example, if we wanted to bring back the status of all tables within the sakila database, we can run the command SHOW TABLE STATUS FROM sakila. If we so required, we can further refine our query by additionally adding a WHERE clause, where, like a SELECT statement, we can provide a column name and a criterion against that column, for example WHERE name = 'customer'. Or we can apply fuzzy logic and bring back tables WHERE name LIKE 'cust%', providing the wildcard character at the end of cust to imply any table name beginning with cust, followed by anything.
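Pulling the steps above together, a sketch of the statements might look like the following; the exact column list and data types are an approximation based on the sakila customer, address and city tables, so treat them as illustrative rather than a copy of the script used in the video.

    -- explicitly defined table with a composite primary key and supporting indexes
    CREATE TABLE sales.cust_california (
        customer_id SMALLINT UNSIGNED NOT NULL,
        address_id  SMALLINT UNSIGNED NOT NULL,
        city_id     SMALLINT UNSIGNED NOT NULL,
        address     VARCHAR(50),
        city        VARCHAR(50),
        district    VARCHAR(20),
        PRIMARY KEY (customer_id, address_id, city_id),
        KEY idx_customer (customer_id),
        KEY idx_address  (address_id),
        KEY idx_city     (city_id)
    );

    -- populate it from sakila, restricting to the California district
    INSERT INTO sales.cust_california
        (customer_id, address_id, city_id, address, city, district)
    SELECT c.customer_id, a.address_id, ci.city_id, a.address, ci.city, a.district
    FROM   sakila.customer AS c
    JOIN   sakila.address  AS a  ON a.address_id = c.address_id
    JOIN   sakila.city     AS ci ON ci.city_id   = a.city_id
    WHERE  a.district = 'California';

    -- quick checks on the populated table
    SELECT COUNT(*) FROM sales.cust_california;
    SHOW TABLE STATUS FROM sales;
    SHOW TABLE STATUS FROM sakila WHERE name LIKE 'cust%';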
So as you can see, there are a number of ways in MySQL that we can bring back details about the tables in any given database. Moving on to our second example, we will recreate the same table cust_california, but this time we will use the CREATE TABLE ... AS statement, which allows us to create a table based on a SELECT query. When performing quick analysis, this is my preferred method, as I do not need to specify individual columns and their corresponding types; these will be identified based on our SELECT query. Let's first go ahead and drop our existing cust_california table, and we can see on the left, in our sales database, that the table has disappeared. We can go ahead now and run our query to create our table cust_california, followed by our SELECT query for customers from California, and when we run this, the table will be generated and populated with our nine customers as expected. We can go ahead and inspect the table again by clicking on the information icon. This time, however, if we look at the Indexes tab, we can see that no indexes have been specified for our table. An alternative in MySQL Workbench for creating an index or key for a table is to use the interface provided on the Indexes tab. From here, I can control-click and select the columns that we require for our primary key, such as customer_id, address_id and city_id. I can then click on the Create Index button, which will open up the index properties panel where I can specify additional criteria for a primary key, such as whether the key is to be unique. I can also go ahead and add a comment, or metadata, to our index. I will keep the defaults for the sort algorithm and locking. Finally, I will shorten the name of the primary key and then go ahead and create the key. I can see that the key has been successfully created, and to refresh the DDL tab I will need to close the information panel and reopen it if I want to see the new, updated DDL for our cust_california table, where I can now see our unique primary key and the corresponding comment that we specified. To demonstrate the manual way to add an index to our table, I will go ahead now and drop the index we just created using the Drop button on the Indexes tab, and as I click on the button, it will also confirm by showing me the SQL code that it is going to use to drop the index; I can select the Drop button to go ahead and execute the command. As our cust_california table already exists, we will use a slightly different syntax to add an index to an already existing table, which is to use the command ALTER TABLE. ALTER TABLE has many functions, and it is worth reviewing the MySQL documentation to identify all the uses of the ALTER function. For now, I will use ALTER TABLE to add appropriate keys to our cust_california table. I can go ahead and execute the four statements I've prepared and then check the details of our table by selecting the information icon. Once again, I can see that the indexes have been created correctly on the Indexes tab, and on the DDL tab I can see the corresponding CREATE statement. Note that this CREATE statement does not reflect our CREATE TABLE ... AS statement; that is, it doesn't show the SELECT statement and only shows the CREATE part of our statement.
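A sketch of the CREATE TABLE ... AS approach, followed by ALTER TABLE statements to add the keys afterwards; the index names are illustrative, and the four ALTER statements mirror the ones described above.

    DROP TABLE IF EXISTS sales.cust_california;

    -- structure and data come from the SELECT; no keys are created for us
    CREATE TABLE sales.cust_california AS
    SELECT c.customer_id, a.address_id, ci.city_id, a.address, ci.city, a.district
    FROM   sakila.customer AS c
    JOIN   sakila.address  AS a  ON a.address_id = c.address_id
    JOIN   sakila.city     AS ci ON ci.city_id   = a.city_id
    WHERE  a.district = 'California';

    -- add the keys afterwards with ALTER TABLE
    ALTER TABLE sales.cust_california
        ADD PRIMARY KEY (customer_id, address_id, city_id);
    ALTER TABLE sales.cust_california ADD INDEX idx_customer (customer_id);
    ALTER TABLE sales.cust_california ADD INDEX idx_address  (address_id);
    ALTER TABLE sales.cust_california ADD INDEX idx_city     (city_id);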
In addition to inspecting a table using MySQL Workbench with the information icon, if we are not programming our SQL in MySQL Workbench, then it is good to recall the various SHOW commands in MySQL that allow us to inspect a table's DDL create statement and indexes. By highlighting the SHOW statement, we can see a full list of sub-statements that can be paired with SHOW, such as SHOW CREATE TABLE to show a table's DDL create statement, or SHOW INDEX to show a table's list of indexes. Let's go ahead and run SHOW CREATE TABLE cust_california. The resulting output provides us with two columns, the first being the table name and the second the CREATE TABLE statement. Hovering over the CREATE TABLE statement will expand the statement, but in our scenario the create statement is too large to fully display and is clipped. We can go ahead and click on the wrap text icon to display the full create statement, and you will also need to then click on the create statement to reveal it in full. In addition, we can change to the form view in MySQL Workbench by clicking on the form icon in the results pane; this provides a more legible format to read from. From here, if required, we could also export the create statement to a notepad if we want to set up a similar table with slight variations. We could copy the full create statement for the cust_california table and use it to set up a new table; for example, we might want to run a similar campaign in the Florida region, and we could use the exact same structure as the cust_california table and only need to change the name of the table during creation. Next, we can go ahead and run a SHOW INDEX for our cust_california table. I can go ahead and run this statement, and, as you can see, it has returned an error. Reviewing the help for SHOW INDEX, I can see that I have the option to use FROM or IN with the table name, so I can correct my statement to SHOW INDEX FROM cust_california and run it again. The first three rows you can see are my unique primary key, where the non-unique value is set to zero, and the corresponding rows reflect the foreign keys that we have set up. At any time, if I'm not using MySQL Workbench for SQL coding, I still have the option to display help for any SQL statement within MySQL by executing the command HELP followed by the SQL keyword wrapped in single quotes. For example, I can type HELP and look up the SHOW command wrapped in single quotes, and this will return the same output as if I had executed the command in MySQL Workbench. This can be very useful if we don't have direct access to MySQL Workbench or we are working in the MySQL command line. If we are executing such a command in the MySQL command line, then I will want to follow my statement with a backslash capital G to display the rows of output vertically. I will go ahead and execute help for the SHOW command, and as before, I will also want to wrap the text in the result window to show the full extent of the help for the SHOW command. In addition to changing to the form view, which we saw in our last example, I also have the option to right-click on the value and open the results in a separate viewer window. For another example, I can also run help for a specific SQL statement; I could run help for a CREATE statement, but as there are a number of options for CREATE, I can specifically request HELP 'CREATE TABLE', placing the term create table between single quotes, which runs help on the specific CREATE TABLE statement rather than bringing back help for CREATE and all the sub-statements it can be paired with.
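A short sketch of the inspection and help commands just described; the backslash G terminator only applies in the mysql command line client.

    -- inspect the table's DDL and its indexes
    SHOW CREATE TABLE sales.cust_california;
    SHOW INDEX FROM sales.cust_california;

    -- built-in help, with the topic wrapped in single quotes
    HELP 'SHOW';
    HELP 'CREATE TABLE';

    -- in the mysql command line client, end a statement with \G to display
    -- each row vertically rather than as a wide table, for example:
    --   SHOW CREATE TABLE sales.cust_california\G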
Once again, I can right-click on the help result and open the specific row in a separate viewer window. That's all for this section, and in the next section we will continue with our sales project analysis. 28. Sales Project - Part 2 - Data Analysis: Next, we will create a table to store our customer payment details, and we will call this table cust_pay_detail. This time I will use the CREATE TABLE ... AS statement, as I will need to review a number of tables in question with regards to customer payment details; in other words, I will need to investigate a number of tables to identify which fields I want to bring back for my analysis. The starting point for my analysis will be to utilize the cust_california table we created in the last section, which gives us a good starting point as we have reduced the data to a specific set of customers to work with. However, if I didn't have such a sample set of data and the scope of the analysis had not been defined as customers from California, I would still probably go ahead and create a small sample set of customers to review our analysis against, which may very well turn out to use the same sample set of customers, for example a customer set from California. It doesn't really matter what criteria you choose for your sample set, but it is important to be able to reproduce your sample set of data every time you come back to your piece of analysis, to remain consistent. You may also want to consider whether this analysis needs to be run at a future date and hard code your sample set to a specific date and time. For example, if our sample set of customers today contains 200 customers and we wanted to compare growth from today against a future point in time, for example in three months' time, we still want to be able to say that our sample set of customers is the 200 from the initial point of analysis, so we can compare like with like; that is, we want to look at the growth of just those 200 customers over the three month period and not include any new customers that have signed up in the meantime. If we have saved our sample set of data into a table, then we can refer to this table three months later. But if for some reason our table was dropped as part of a data warehouse clean-up, or we accidentally dropped it ourselves, then we may want to take into account criteria that uniquely identify our 200 customers, such as account creation date between two set dates. This would ensure that we don't take into account any new customers, but at the same time we may have customers that are deleted or become inactive in the meantime, so you may also want to take into consideration criteria such as activity, to ensure your sample set can be recreated with the same set of customers at a future date. Moving on, as we will now be working between two databases, that is the sales database and the sakila database, we will need to qualify our tables with our database names. To start with, we will bring in our first table, which will be the sales cust_california table, provide this with an alias of cust, and select the customer_id from this table. Next, we'll go ahead and join to the sakila payment table and give this an alias of pay for reference, and we will join the two tables together by means of the customer_id.
From the payment table, we will select the payment_id, the rental_id, the payment amount and finally the payment date. Next, we will join to the rental table, but as not all payments are directly related to rentals, we will make a LEFT JOIN to ensure that we capture all payments; we will join on the rental_id between the rental and the payment table. In the rental table, we can see there is an inventory_id, which we can use to join to the inventory table. I can open up the inventory table and identify which fields are available to me. I'll select the rental_id from the rental table, followed by the inventory_id from the inventory table. In the inventory table, I can see there is a film_id, and I can use the film_id to join to the film table; from the film table, I can select the title of the film, which I may or may not use in my final analysis. But as we're trying to build up a picture of what happened in the last couple of weeks' worth of payments, we need to start thinking about highlights for our final report, such as the highest earning movies during the last number of weeks. In our scenario of a DVD store, we can pull back the film titles and display, for example, the top 10 films rented in the last number of weeks; if we were looking at retail products, this might be a case of pulling back a list of the top 10 retail products on our website. Such nuggets of information can provide a lot of value to an audience such as upper level management or directors, as they typically only review high level figures, and it can be useful to also provide some insight into those figures, such as what was driving the business at the time. Obviously, there are many other insights that we could potentially provide back to the business. In the case of the information we have for a movie, we might also want to bring back the likes of the top categories of movies, for example thriller, comedy or romance. However, you decide to omit these details, as you are aware that categories can vary a lot between different movie productions, and in some cases no category is recorded for a movie title. What we might note, though, as a side project for future analysis, is to look at cleaning up the categories of movies into a set of categories that suits our business better. For now, we will keep with movie titles, which our business will be familiar with. Moving on, we will now want to join from the inventory table to the store table to identify which payments were made at which stores. In the inventory table, we can see there is a store_id, and we can use the unique index to join to the index of our store table. Reviewing the store table, we identify there is an address_id for our store, and from here we can go ahead and join to the address table, using the index address_id to join the two tables. From the address table, we identify the address and the district that the store is located in. Note that a store address is no different from a customer address, and hence we can store address details generically in the address table without the need for a specific store address table. Once we are satisfied that we have all the fields of information that we require for the analysis, we can go ahead and run the query to create our table cust_pay_detail. Note that when creating a table in SQL, no two field names can have the same name; as I've brought back rental_id twice, this has thrown up an error, and to resolve it I've removed one of the rental_ids.
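Putting the joins above together, the detail query might look roughly like the following sketch. Selecting rental_id only once from the payment table avoids the duplicate column error just mentioned, and the joins after rental are written as LEFT JOINs here so that payments without a rental are preserved; that last point is a judgment call rather than the exact query from the video.

    CREATE TABLE sales.cust_pay_detail AS
    SELECT cust.customer_id,
           pay.payment_id,
           pay.rental_id,
           pay.amount   AS payment_amount,
           pay.payment_date,
           inv.inventory_id,
           f.film_id,
           f.title      AS film_title,
           inv.store_id,
           sa.address   AS store_address,
           sa.district  AS store_district
    FROM   sales.cust_california AS cust
    JOIN   sakila.payment        AS pay ON pay.customer_id  = cust.customer_id
    LEFT JOIN sakila.rental      AS ren ON ren.rental_id    = pay.rental_id
    LEFT JOIN sakila.inventory   AS inv ON inv.inventory_id = ren.inventory_id
    LEFT JOIN sakila.film        AS f   ON f.film_id        = inv.film_id
    LEFT JOIN sakila.store       AS st  ON st.store_id      = inv.store_id
    LEFT JOIN sakila.address     AS sa  ON sa.address_id    = st.address_id;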
But if for some reason I wanted to keep the second rental_id, I could rename it with an alias, and this would be perfectly acceptable when creating the table. That's all for part two of our sales project. 29. Sales Project - Part 3 - Data Summary: Finally, we will create a table to summarize our customer payment details, and this will be called customer payment total, shortened to cust_pay_total. Our initial brief from the director, Tom, was to provide a summary of customer transactions in the California district. To start, we can identify which components we want to aggregate, or summarize, our data by. Let's start with the store address details, which are made up of the unique address_id, the address and the district. As we know the address_id is unique to the store_id, and details such as address and district are what we would refer to as supplemental or auxiliary details, we only need to group on the address_id; for address and district, we can take the MAX of these fields. When grouping data in SQL, the more items we have in our GROUP BY statement, the more expensive our query will be to run, so identifying which items we can take the MAX of, such as the store address and store district, will reduce our query runtime. Next, we can perform some date calculations on the payment date field. I will go ahead and extract the year from payment date and call this payment year, and also use the MONTH function to extract the month from payment date and call this payment month. As we also want to group our data by payment year and payment month, I will place both values into our GROUP BY statement. Note that when placing such values in the GROUP BY, we don't want to include the aliases, such as AS payment_year and AS payment_month, as this will return an error if we try to run our code. There are a number of date functions supported by MySQL, and I would highly recommend once again referring to the MySQL documentation for a full list of them. For now, we will work with just the YEAR and MONTH functions, but if required we could also extract details such as quarter, week and day, if we needed to produce a report at those levels. Knowing the level of aggregation that we will be performing, you should be able to calculate approximately how large the tables will be. For example, if we had 10 stores that we wanted to provide a monthly summary for, then we would have 10 times 12 months, producing 120 rows of data in total for any given year; if we wanted to include product details and we had 10 products, then this would increase the number of rows returned by a factor of 10, giving us 1,200 rows. So every time we add in a level of grouping or aggregation, you should also be attempting to calculate approximately how many rows of data this will return in your final results. Note that in large production environments, we may find multiple tables representing different levels of granularity for the business; for example, we might have a summary table of transactions at the monthly level, a separate summary table at the weekly level and another one at the daily level. Moving on, we can now take a look at how to summarise our transaction details. To start, we will SUM the amount and call this payment total. Next, we will perform a COUNT on the transaction ID and call this transaction count. And finally, we will perform a COUNT DISTINCT on our customer_id and call this customer count. Note that if we did not perform a DISTINCT with our count of customer_id,
this would return the same value as our transaction count, which is also the same value as the row count in our customer payment detail table, where that table is defined at the lowest level of granularity, that is, our customer transaction level. We can go ahead and run the SELECT statement to view a summary of our customer transactions from the California district, grouped by the corresponding store details. From here, we can see that two stores have been identified: one store at 47 MySakila Drive and the other store with an address of 28 MySQL Boulevard. As noted, this is a sample or test set of transactions as created by MySQL and Oracle, so while we do not have a large number of rows of data being returned, the results will never change, which makes for a very useful set of test sample data when reviewing new features in MySQL releases. Coming back to our analysis, we identify that there is no means to distinguish that the results we have obtained are only for customers in the California district. There are a number of options that we could take to bring in the details to show that the data set is representative of California customers. Option one would be to export our result set to a spreadsheet, and within our spreadsheet we can add a new column and present the values as being from California. While option one gets us our desired output, ideally we want to update our SQL code to reflect the correct logic, as we may need to run this analysis again at a future stage or wish to add this query into production. Option two would be to rerun our analysis and add in the appropriate customer district details at the beginning of our query. This would require that we rerun all our analysis and include the customer address details at the beginning of the analysis to be used in our final results summary; however, as we are working to a tight deadline of Monday morning, this may not be so simple if our analysis has taken a number of hours to run, where we might be dealing with very large transaction tables. Option three: we can go back to the last step in our query and add in the appropriate customer and address table joins to pull back the customer district, and as we have done the majority of the heavy analysis in steps one and two, rerunning step three with some minor changes is not going to be costly to run. If we try to execute the CREATE command with our new SELECT statement, this will throw back an error saying that the table already exists. To recreate our table, I will first issue a DROP TABLE statement on our table sales.cust_pay_total, and now I can go ahead and run the commands to create the table with our newly modified SELECT statement. If we had such a drop statement in production where for some reason our table did not exist, trying to drop a table that does not exist would return an error; to demonstrate, I will go ahead and first drop the table and then try rerunning our drop and create statements once again. To get around this error, we can use the IF EXISTS option with the DROP TABLE statement. The IF EXISTS will first perform a check for the table's existence, and if the table exists, the drop will be performed; but if the table does not exist, MySQL will simply skip over the drop statement and move on to the next available statement, which is our CREATE TABLE ... AS statement.
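A sketch of the option three approach: the summary is rebuilt with the customer district pulled through from cust_california and a DROP TABLE IF EXISTS guard in front. The column names are illustrative and follow on from the earlier detail-table sketch.

    DROP TABLE IF EXISTS sales.cust_pay_total;

    CREATE TABLE sales.cust_pay_total AS
    SELECT det.store_id,
           MAX(det.store_address)          AS store_address,
           MAX(det.store_district)         AS store_district,
           cust.district                   AS customer_district,
           YEAR(det.payment_date)          AS payment_year,
           MONTH(det.payment_date)         AS payment_month,
           SUM(det.payment_amount)         AS payment_total,
           COUNT(det.payment_id)           AS transaction_count,
           COUNT(DISTINCT det.customer_id) AS customer_count
    FROM   sales.cust_pay_detail AS det
    JOIN   sales.cust_california AS cust ON cust.customer_id = det.customer_id
    GROUP BY det.store_id,
             cust.district,
             YEAR(det.payment_date),
             MONTH(det.payment_date);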
In our results window, we can see that the drop statement where the table did not exist is now showing with a yellow triangle icon, which identifies that the statement ran but did not perform a drop. The final option, which I will show you, and which is similar to option one of adding a field to a spreadsheet, is to add a new column to our table and manually update the required rows to reflect our value of California. First, I will revert our table and drop the new column customer district by using an ALTER TABLE command as follows: ALTER TABLE sales.cust_pay_total DROP customer_district. To add a new column to the table, I will issue an ALTER TABLE command as follows: ALTER TABLE sales.cust_pay_total ADD customer_district VARCHAR(20). I can refresh the sales database once again and check that our new column customer_district has been added to the end of our table cust_pay_total. I can go ahead and perform a SELECT against our table and note that customer_district is set to NULL. Next, I will perform an UPDATE statement as follows: UPDATE sales.cust_pay_total SET customer_district equal to California. I will go ahead and run a SELECT against the table and can see that our customer district has been updated to the value of California. Note that a table UPDATE statement can be run in conjunction with a WHERE clause, which I will now demonstrate. In this scenario, I will perform an update where payment month is equal to May and payment year is equal to 2005, and set the value of customer_district to Cal, an abbreviation for California. Once again running a SELECT statement against our table, we can see that the value has been updated to Cal. As with any update or delete statement, I would always encourage you to take a copy of your data into another table and perform your desired updates on a sample set of data, to ensure that your update or delete statements are working correctly before applying them to a production table, as once you have updated or deleted an item in a table, there is no simple undo command in a database to roll back to a previous form of your data. If for some reason you have accidentally deleted or updated a field or a number of rows incorrectly, you should contact your database administrator to attempt a rollback on the table, which will revert the table to a period prior to your updates, provided your database has rollbacks turned on. In most cases, as a business or data analyst, you're not likely to be working with live production tables, and also not likely to have permissions to update such production tables; therefore, in our scenario, if we incorrectly updated one of our tables, we can simply drop or delete the contents of the table and repopulate it with a given query. As you can see, there are a number of ways to update our analysis to get the desired output, and in time, as you become more proficient with running queries, you will be able to weigh up which is the best option for you, as noted with our example above. If we're working to a tight deadline, as in our scenario of producing a quick report, I would likely have chosen to just update the Excel spreadsheet manually, adding the appropriate code to the final query but not actually running it, as our results are easily added to a spreadsheet for California. However, if I was running this analysis for all states in America, I would then choose to run the final statement to correct the analysis and provide the appropriate output for all states.
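For reference, a sketch of the column-and-update approach described above, with the WHERE-restricted variant last; the abbreviation value and month mapping are illustrative, and, as noted, test such updates on a copy of the data first.

    -- remove the joined-in column so we can demonstrate adding it manually
    ALTER TABLE sales.cust_pay_total DROP COLUMN customer_district;

    -- add the column back, empty, as a VARCHAR(20)
    ALTER TABLE sales.cust_pay_total ADD COLUMN customer_district VARCHAR(20);

    -- blanket update: every row gets the same value
    UPDATE sales.cust_pay_total
    SET    customer_district = 'California';

    -- restricted update: only May 2005 rows, using an abbreviation
    UPDATE sales.cust_pay_total
    SET    customer_district = 'Cal'
    WHERE  payment_month = 5
      AND  payment_year  = 2005;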
That concludes this section on summarizing our customer payment details. In the next section, we will look at cleaning up our query to run in production, before finally moving on to produce a summary report of our data in Microsoft Excel and a dashboard of our results. 30. Sales Project - Part 4 - Send to production: Now that we've completed our analysis, we can start looking to productionise our script. To run the script in production, we obviously want to remove any superfluous statements that are not required. To start with, I will make a copy of the sales project analysis script and call this sales project production. Moving to the top, with our first statement we have CREATE DATABASE sales. Just like CREATE TABLE, we can also use IF NOT EXISTS with a CREATE DATABASE statement, which will first perform a check to see if the database exists, and if it does exist, it will move on to the next statement without halting our script with an error. I can go ahead and run this statement and see that the yellow exclamation has come back to inform us that it attempted to run the CREATE DATABASE, but as the database already exists, it will now move on to the next statement. However, it is unlikely we will be pushing a script to production without there first being a database for the script to run against. Next, we have a USE sales statement. With production scripts, if we are asking a script to use a database, we don't need to keep referencing the database throughout the script, so we can look to delete any instances of the sales database prefix from our script. Also, if we want to rename our database at a later stage, or copy the script into another database such as marketing, we only need to update the USE statement and no further changes will be required to our script. As we are currently connected to the sales database, executing USE sales will restart our connection in the back end. I can go to the Edit menu, select Find and then select Find and Replace; also to note, you can bring up the find and replace toolbox by using the shortcut Ctrl+H, or Ctrl+F if you just want to find a word in your script. When editing scripts, knowing your keyboard shortcuts will save you a lot of time if you are editing a large script with many instances to update. Having identified the text to replace, I can replace all instances in our script where we had the sales database noted, and continue down through the script with further edits. For our CREATE TABLE, I will add in an IF NOT EXISTS and execute the statement; I can see the yellow exclamation denoting that the statement has run, but as the table already exists, it moves on to the next statement. As we intend to place this script into production, I will look to expand our analysis from customers in California to all customer addresses. I will update our table name and call this customer address, and perform another find and replace in our code to update all instances of customer California to customer address. I will now combine the insert and delete statements together, as typically they are found one after another; combining the two statements like this also safeguards that our insert statement will not run unless the delete statement runs first. If for some reason I or a colleague needs to update and debug this script at a later stage, I will just comment out the criterion WHERE district equals California, and for debugging we can simply uncomment it, which will make debugging much faster.
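A sketch of how the top of the production script might look after these edits; the name cust_address is one plausible rendering of the renamed table, and the commented-out WHERE line is the debugging aid just described.

    CREATE DATABASE IF NOT EXISTS sales;
    USE sales;

    CREATE TABLE IF NOT EXISTS cust_address (
        customer_id SMALLINT UNSIGNED NOT NULL,
        address_id  SMALLINT UNSIGNED NOT NULL,
        city_id     SMALLINT UNSIGNED NOT NULL,
        address     VARCHAR(50),
        city        VARCHAR(50),
        district    VARCHAR(20),
        PRIMARY KEY (customer_id, address_id, city_id)
    );

    -- delete and insert kept together so the insert never runs without the clear-down
    DELETE FROM cust_address;
    INSERT INTO cust_address
        (customer_id, address_id, city_id, address, city, district)
    SELECT c.customer_id, a.address_id, ci.city_id, a.address, ci.city, a.district
    FROM   sakila.customer AS c
    JOIN   sakila.address  AS a  ON a.address_id = c.address_id
    JOIN   sakila.city     AS ci ON ci.city_id   = a.city_id
    -- WHERE a.district = 'California'   -- uncomment for a quick debugging run
    ;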
Similarly, if a colleague was working on this script, it means they can see a test piece of data and not have to run a SELECT statement to try to find an appropriate test value for district, which reduces the time it takes for them to test the script. For example, if 10 scripts have failed in the batch run over the weekend, we want to get the batch running again as quickly as possible, so keeping such references in production scripts goes a long way towards getting the batch back up and running. I can see the customer address table has been created along with all the appropriate indexes; I can therefore go ahead and remove all the ALTER statements, as these were initially used in conjunction with our CREATE TABLE ... AS statement, and as we have now explicitly defined our table structure, we no longer need to alter the table to add new indexes. I will comment out the SHOW statements, as again, if I or a colleague is debugging the query at a later stage, these will help us to quickly review the tables, in particular if I'm working in the MySQL command line and don't have access to a visual interface such as MySQL Workbench. Next, for customer payment detail, I will open the information panel and review the DDL for the table, which I can then use to replace our CREATE TABLE ... AS statement. I will perform some edits on this statement, but you could simply use it in its current form. I will remove the unsigned keyword as well as any defaults, such as DEFAULT 0. I will also set the commas to be on the left, as this makes counting rows a little easier if no row numbers are present; also, where we have long SQL statements, it is easier to identify the start and end of a statement if all the commas are on the left. I will add an IF NOT EXISTS to the CREATE TABLE statement, and as before, I will follow the CREATE TABLE statement with the DELETE FROM and INSERT INTO statements. I will also add in the column names to safeguard the positioning of the fields. The insert now brings back far more rows of data, now that we've expanded out from just the district of California to all districts for customers. Moving on to customer payment total, I will open the information panel and select the table's DDL once again, which I will use to replace the CREATE TABLE ... AS statement. I will perform a clean-up of the statement, though as noted before, you could just leave it in its current state. I'll perform a quick drop by just highlighting the command and executing it; the create table and insert run successfully, and I can remove the ALTER statements. I can perform a SELECT from the customer payment total table to ensure the table is populated. In addition, I will perform a row count: 2,838 rows have come back, a much greater quantity compared to just running for our district of California. Finally, I can perform a full test run of our script by running an execute for all statements; however, I will first clear the output window and also close the results, and then rerun the entire script. The script takes less than a second to run, and we can see 12 steps have been executed. As the script has now run without any errors, I can look to send my script to the database administration team to have them test the script in live production.
In addition, I would also provide the database administration team with scheduling details for the script, such as how often it needs to be run. As our data is broken down by year and month, the script really only needs to run at the beginning of each month to take the last month's data into account, but we may also want to run it on a weekly basis and provide year-to-date and month-to-date totals to the business. That concludes this section on cleaning up our SQL scripts to run in production.

31. MySQL Query Explain Plan: In this section we will review the MySQL query explain plan, a tool that can help us identify areas within our query that need optimization. For example, if we're querying a very large table, the explain plan will inform us how many rows are within the table, and if need be we can refine our query down to a smaller set of rows, such as by adding criteria based on a country code or filtering on a given set of dates. Opening up our production query for the sales project, we can see in the output when we run the query that MySQL identifies how many rows are being impacted at each stage. Before running our actual query, which might take a couple of hours in production, we can step down through each part of the query and run an explain against each part. Note that an explain will not execute against a CREATE TABLE statement; if I try running the explain here, which I can do by highlighting the code and using Ctrl+Alt+X or selecting Explain Current Statement, MySQL returns "Explain data not available for this statement".

Moving down through the query, we can start with the SELECT statement on the customer base we identified. Using the Ctrl+Alt+X shortcut, a visual explain comes back telling us that on the customer address table it performs a full table scan, preceded by a query block. Within the full table scan we get information such as the cost hint, telling us that this is very costly for large tables. It also identifies that there were no usable indexes on our table, and in that scenario we should potentially refine our query criteria. As we are working against a small table this has little impact on our system, but if this was a large production environment we would want to take the necessary steps to reduce the number of rows the query is pulling back. An estimated cost is also provided in the query block, indicating how much memory will be allocated for our output query. As well as the visual explain, we can also look at a tabular view, which provides additional details such as the possible keys we can use against our tables, the table aliases we are referencing, such as customer, address and city, and whether any of the information was filtered. For example, with our first row against customer we can see the index being used and that 599 rows were reviewed with a full table scan. Recall that we were initially running with district equal to California during our test; we might also add a condition on address where customer active is 1, implying active customers only. Running the explain again, we can now see that WHERE criteria have been applied on both customer and address, and if we switch to the visual explain we still see the full table scan, but in terms of query cost we've been able to reduce the query down somewhat.
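For reference, these checks look roughly like this when written as plain statements (the SELECT shown is illustrative, not the exact project query):

```sql
-- EXPLAIN runs against SELECT (and INSERT/UPDATE/DELETE), but not against CREATE TABLE
EXPLAIN
SELECT c.customer_id, a.district
FROM   sakila.customer c
JOIN   sakila.address  a ON a.address_id = c.address_id
WHERE  a.district = 'California'
  AND  c.active   = 1;

-- The JSON format includes the cost estimates that Workbench's visual explain is built from
EXPLAIN FORMAT=JSON
SELECT c.customer_id, a.district
FROM   sakila.customer c
JOIN   sakila.address  a ON a.address_id = c.address_id
WHERE  a.district = 'California';
```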
Moving down through our query, we can look at another example. Again we see a cost returned in the query block, and looking at the explain statement we can see that a number of our tables are using indexes, along with relevant information such as the potential keys we can use and how many keys were actually used. Finally, moving on to our last statement, we can look at the visual explain to identify a cost, and also the tabular explain; in this scenario we can see that our customer payment detail table will use a temporary table.

So, to finish up with the MySQL query explain: this tool can be very useful for identifying areas within your query that need attention. The explain plan is particularly useful for identifying which keys are relevant to a given table and what you should be joining on, as well as the number of rows that you're potentially executing against. If you do find yourself having to join to a table that doesn't have a relevant set of keys or indexes, and your query is taking a very long time to run when it could potentially run faster with the right keys applied, you can always contact your database administration team and ask if they can set up appropriate indexes against that table. Lastly, note that the MySQL explain is estimating how your query will behave based on information it has gathered about tables within your data warehouse or database. Provided that this information is kept up to date, your explain output should be fairly accurate; however, if your database is changing quite frequently, you may find that the explain output can be somewhat misleading. These are not actually running queries against the database; they provide best-guess estimates for your potential query when it runs against the system. That's all on the MySQL explain, and I'll see you in the next section.

32. Common Table Expressions (WITH): So far we have looked at building out queries by using either temporary or permanent tables. These techniques require that a user has access to create tables within the database. However, if you are in a situation where you do not have such access, we still have the option to use what are known as common table expressions (CTEs). Common table expressions are similar to temporary or volatile tables in that they only persist during a user session, and they are automatically deleted if the session is closed or interrupted. Common table expressions can be used in conjunction with SELECT, UPDATE and DELETE statements. To define one or more common table expressions, we define them within a WITH clause; once defined, we can reference them multiple times throughout our query, as if we were accessing a normal table. The main benefit of using common table expressions is that data is loaded directly into database memory, making them very fast for read and write access, as data is not written to the database's hard disks. A drawback is that we cannot define indexes, so the SQL engine is left to interpret our data when an index is not provided; the default for most databases is to use the first available value in a query as the index.
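As a minimal sketch of the WITH syntax (the CTE name and filter here are illustrative):

```sql
-- Define a named result set once, then reference it like a table
WITH active_customers AS (
    SELECT customer_id, first_name, last_name
    FROM   sakila.customer
    WHERE  active = 1
)
SELECT COUNT(*) AS active_count
FROM   active_customers;
```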
When creating such common table expressions, I therefore structure my SELECT statements as if I were building out an actual table, and try to optimize each SELECT as much as possible. For example, if I'm selecting data and want to order it in memory, I will combine my SELECT statement with a GROUP BY, where the attributes grouped in my SELECT are ordered in ascending order. So even though we can't define our own indexes, we can still make some assumptions about how the SQL engine will handle our data when it comes to executing the queries.

In some scenarios, even for productionized queries, we may actually want to use common table expressions instead of permanent or temporary tables. For example, if I only need to run a small query in production that I know will not take up many resources, I may choose a common table expression, thus minimizing my impact on the database, as no permanent tables will be written to the database server. In today's data-driven businesses we find more business owners starting to write their own SQL queries, and with common table expressions a user can quickly access their data without having to fully understand how to first create optimized tables, and can build relatively complex queries without impacting the overall back-end database. Business users can store their queries alongside their working documents, or even embed them directly into spreadsheets and refresh the results any time they open such a document. So, as you can see, for real-world applications common table expressions can be a very useful tool if you need to write SQL queries but don't necessarily have the access to create tables.

For our live coding example, I will now show you how we can rewrite our sales query to use common table expressions. To start, I've opened up my sales project query, and I will work through it to show how I can change parts of it to use common table expressions instead of permanent or temporary tables. Before making any modifications, I will make a copy of the query: I go to Save Script As, and I now have a file called sales project CTE, for common table expressions, which we can work on without disrupting our existing query. As you may remember, at the beginning of the sales project I showed you how you can create tables either by defining a table structure with CREATE TABLE and its parameters, followed by an INSERT statement to populate the table, or by merely providing a SELECT statement inside a CREATE TABLE; this is our CREATE TABLE AS, and that is what we're going to rework with our WITH statement. So to start, I will remove our first section, and for this example I'll assume I'm a business user who doesn't have CREATE TABLE access, and hence needs to set up some form of table structure using common table expressions; I'm also going to assume that I don't have CREATE DATABASE access. You can use these methods whether or not you have access to create tables. In this scenario we will still connect to our database and use the sales database; even though we're not going to be writing to it, we still want to be accessing a given database when required.
I can remove the DROP statement. With common table expressions, instead of saying CREATE TABLE AS, we now define this as WITH, and we don't need to specify which database we're writing to, as everything is held in memory; ideally, though, we still want to be using a database so the work is carried out in the right context, and in this scenario, by saying USE sales, we're working in the capacity of the sales database area. So we can go ahead and change this CREATE TABLE to WITH cust_california AS, and this is now the first of our common table expressions, which we can reference later in the query as simply cust_california. As noted, I'm unable to create indexes, but if I did want to order the data within memory I could add an expression such as a GROUP BY over each of our attributes; enforcing the GROUP BY will order the data when it is stored in memory.

Next we have our customer payment detail. This time, instead of writing a WITH clause again, I merely add a comma to say that we're continuing the WITH statement from above, put in the AS, go down to the bottom of the query and add the closing bracket. Previously this referenced the customer California table in the sales schema; as we're not writing to sales in this scenario, we can instead reference cust_california from our WITH clause, our first common table expression, and use it within the next one. This is where common table expressions become very powerful: we can start referencing them as we are creating them. So you can see we've got cust_pay_detail AS, and it is accessing cust_california, the common table expression we started with, referenced from inside our second common table expression.

Finally, we have our customer payment total table, which, if you remember, we previously created to store our final results. In this scenario we won't be storing results to a table, so the most we can do is output the results back to the screen; for this we can remove the CREATE TABLE, leaving just the SELECT statement. I will also remove the ALTER statements where previously we were adding additional columns for the district of California; if need be, we could add similar columns to the earlier statements if we wanted to identify California specifically, but for now, just to demonstrate the use of common table expressions, I will remove these. In the final SELECT statement, which would have read from our final output table, we can remove the sales prefix from the payment detail reference, while still keeping sakila.customer and sakila.address. So that should be our query, ready to run.
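Putting those pieces together, the rewritten query has roughly this shape (a sketch only; the CTE names, columns and joins are illustrative stand-ins for the course's actual script):

```sql
WITH cust_california AS (
    SELECT c.customer_id, a.district
    FROM   sakila.customer c
    JOIN   sakila.address  a ON a.address_id = c.address_id
    WHERE  a.district = 'California'
),
cust_pay_detail AS (
    SELECT cc.customer_id
         , YEAR(p.payment_date)  AS payment_year
         , MONTH(p.payment_date) AS payment_month
         , SUM(p.amount)         AS payment_total
    FROM   cust_california cc                      -- one CTE referencing another
    JOIN   sakila.payment  p ON p.customer_id = cc.customer_id
    GROUP  BY cc.customer_id, YEAR(p.payment_date), MONTH(p.payment_date)
)
SELECT payment_year
     , payment_month
     , SUM(payment_total)          AS payment_total
     , COUNT(DISTINCT customer_id) AS customer_count
FROM   cust_pay_detail
GROUP  BY payment_year, payment_month;
```

Nothing is written to disk here: the final SELECT simply returns the summary to the screen, which is the trade-off for not needing CREATE TABLE access.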
I can check this by performing an EXPLAIN on the overall query, and as you can see we get quite a lengthy explain showing the various tables and common table expressions it generates; for each we also have an approximation of how many rows come back in the final output. So I will go ahead and select the entire statement, or simply run everything, execute it, and our summary information comes back. In this scenario we ran just for California, so if I wanted I could comment out the California filter and run for everything; this may take slightly longer. Performing the explain on the overall query again to see how this differs, there is not too much of a difference: we're just expanding on the WHERE clause, and what we should see is more information coming back in our results. If need be, I could rename the common table expression from cust_california to customer_all, but for now I will just go ahead and execute, this time selecting for all, and you can see we have a lot more information coming back. If I want, I can also export my results out to a text file, so I'll go ahead and do that, provide a timestamp, and this will be a CSV file.

So, to recap: using common table expressions we can take existing queries and rewrite them, if need be, using the WITH clause, and this can be useful if we don't have the means to write tables to the database. As you can see, we've been able to define two common table expressions, one for the customers of California and one for the payment details, and we get our expected results back, just as we had for our previous sales query. For some analysts, common table expressions are actually a lot easier to read, because you read the query the way a business user would: you are literally just defining what you want to see, with no in-between statements such as table definitions or inserts. So if you are presenting results back to a business, it is sometimes useful to rewrite your queries using common table expressions so that the business can quickly come to grips with what the query is trying to do. In this scenario, as I've laid out, we have the customers, followed by their payment details, followed by a select statement with the summary of their information. That concludes this section on common table expressions.

33. User Administration: In this section I will show how you can set up a user to access the sales schema that we have created. Moving over from Schemas, we can see there is an Administration tab, as we're logged in as the root user. From here we can click on Users and Privileges and see the accounts created when we initially set up MySQL; for example, we have the root account, and we also have an account here, geoff, which we created with administration privileges. We will go ahead and add an account for Tom, our sales director at Sakila, in the event that he also has access to MySQL and would like to be able to query some of the tables directly. We can do this by clicking Add Account: we set the user name as tom, use standard authentication, and give Tom a basic password which we can then send on to him. We don't need to set any account limits or restrictions, and we can also skip over administrative roles. If need be, though, if you wanted to create additional administrators within your MySQL instance, you can set DBA privileges here; selecting each role grants various permissions, and the default global privileges for each are applied down the right. For a DBA, all permissions are selected, whereas a DBManager has a specific subset. Moving over to Schema Privileges, we want to add an entry for the specific schema, which is going to be the sales schema; we select OK, and under privileges we can say that Tom will have SELECT, and we can go ahead and click Apply.
So in this instance Tom will be able to query from our sales schema, but would not be allowed to alter any of the information, such as inserting new rows, deleting rows or updating data. If we want to give Tom additional privileges, such as being able to create tables, insert or delete, we can also add those, such as CREATE, ALTER, DROP, INSERT, UPDATE, SHOW VIEW and CREATE TEMPORARY TABLES, and click Apply.

To test Tom's account, we can come out of our instance here, select the plus sign to create a new connection, name the connection Tom Test, type in Tom's user name, set it to default to the sales schema, and enter Tom's password; we get confirmation that the connection has been made successfully. Click OK, and we now have the Tom Test connection, and we can see that now that Tom has logged in, he has access to the sales tables. In addition, we may also want to grant Tom privileges on the Sakila database, so we close the connection and log back in as root so we have administration rights. Under Users and Privileges we click on Tom and say that for the Sakila database Tom will only have SELECT access. Sakila is a production area, whereas sales, we can say, is a development area for the sales team; as Sakila takes information directly from our application, we would only grant SELECT privileges in that environment. We click Apply, and this time when we log in as Tom we can see that Sakila is also available, with the default schema still set to sales.

Lastly, if Tom for some reason no longer required access to the MySQL database, or Tom has left the company, we can delete the account using the Delete button. We could also set the password to expire, which would force Tom to change it the next time he logs in. This can be particularly useful when setting up accounts: if you have granted a temporary password, you want the user to update it when they log in. So, by clicking Expire Password and then logging in as Tom with the temporary password I provided, I can see that I'm forced to add a new password, and now that the password has changed I'll be asked to use the new one from then on.

In addition to creating, updating and deleting users from the MySQL Workbench interface, we can perform the same actions using the MySQL command line or the MySQL editor. We can show the statements, similar to the inspection we did on our schema tables, by issuing a number of commands. For example, I can issue the command SHOW CREATE USER tom, where tom is specified in single quotes, so we could run the resulting statement again to recreate the account for Tom if need be; we can see that the password is encrypted. In addition, we can look at the grants for Tom by saying SHOW GRANTS FOR tom, and we can see that there are three grant statements: the first grants usage on all of the databases, the next specifically grants SELECT on Sakila, and the last grants our various SELECT, INSERT and UPDATE privileges on sales. I could run each of these commands and this would set the same grant permissions.
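A sketch of what those statements can look like when issued directly (the host '%', the password and the exact privilege list are placeholders, not the course's exact choices):

```sql
CREATE USER IF NOT EXISTS 'tom'@'%' IDENTIFIED BY 'TempPassword1!';

-- Read-only access to the production schema, wider access to the development schema
GRANT SELECT ON sakila.* TO 'tom'@'%';
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, ALTER, DROP,
      SHOW VIEW, CREATE TEMPORARY TABLES ON sales.* TO 'tom'@'%';

SHOW GRANTS FOR 'tom'@'%';

-- Force a password change at the next login
ALTER USER 'tom'@'%' PASSWORD EXPIRE;
```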
Similarly, if I want to revoke a permission, I can use the REVOKE statement to remove a privilege. For example, to revoke just SHOW VIEW, I can say REVOKE, and you can see there that I've needed to change the TO to FROM; if I look again at SHOW GRANTS, we can see that SHOW VIEW has been removed from the statement. And lastly, if I want to remove a user, I can issue the command DROP USER tom. I can use the percentage wildcard character to remove the account for all hosts, or I can specify a given host such as localhost, a given server, or an IP address. For now I just go ahead and say DROP USER tom, and we can see that the drop user has been executed and that Tom has been removed from our list of accounts. That concludes this section on how to create, update and delete users in MySQL.

34. MySQL Set Environment Variable: To complete our installation, I also want to add MySQL to our environment variables, so that when it comes to running our scripts at a later stage we can access MySQL from the Windows command line. On this page we can see that the base directory is located at C:\Program Files\MySQL\MySQL Server 8.0. I will open up this directory, and under the bin directory, which is where the executables are located, we find mysql.exe as well as the MySQL daemon (mysqld) and mysqladmin, and a number of other tools which we will look at later, for example mysqldump for exporting databases. What we want from here is just this directory name, so we can copy it. If we open up Windows and type CMD to bring up the command prompt, then, as our MySQL variable is not set, typing mysql comes back saying that mysql is not recognized as an internal or external command.

So, to set the environment variables, we can go back down to Start, type "env" for environment variables, and we are presented with the option Edit the system environment variables. Once this window is open we select Environment Variables, and under System Variables we have what is known as the Path. If we select Edit, we can edit the existing path entries; these are the locations whose commands and programs are recognized from anywhere within a command window. We want to set up a path pointing to the MySQL Server bin directory, so copy and paste it in, and don't forget to add the trailing backslash at the end. We click OK and OK again, and now when we come back into our command prompt (we need to restart the command prompt for the variables to be recognized) I can type mysql --version and see that mysql is now accepted from any path. If I go into my Documents directory, for example, I can see that mysql is also recognized there, telling me that access is denied; but from anywhere in your command window mysql is now recognized, and this will be particularly useful later on in the course, where we want to run a number of commands from the command line using MySQL. I can also log in, for example with mysql -u root, and see a list of databases by issuing the command SHOW DATABASES. Throughout the course we will be working with two databases, the Sakila and the sales database.
So if I want to briefly look at Sakila, I can say USE sakila, see "Database changed", and then SHOW TABLES, and we can see a list of the Sakila tables. Similarly, for the sales database that we'll be creating later on, I can connect to sales and show its tables to confirm that we're in a different database. To quit out I can type quit, and to exit the command prompt I can type exit. That concludes this section on setting the MySQL environment variables.

35. MySQL Command Line: In this section I will demonstrate how to use the MySQL command line interface. To start, I will open up the help documentation; from here we can look at MySQL programs, then MySQL client programs, and the first one to come back is the mysql command line client. The mysql command line allows us to log in to our MySQL database and, from there, issue commands similar to what we would issue from MySQL Workbench, such as using a database, showing tables and running select queries. In addition, we can pass a number of parameters to the mysql command line, such as your user name and password, and there are options to execute a SQL script and output the results to a given file. From the MySQL help you can see a complete list of options, and you can select each one for additional details about what you can pass to the mysql command line.

So let's open up our command prompt within our production script query directory. To get started, we can type in mysql, and as we have set MySQL on the Windows path, this command is executable from anywhere on the command line. We go ahead and log in first; for the password we can hit return and we will be prompted for it, or, if you want, you can also type your password straight after -p without any spaces. From here you can type help, as you saw earlier, to get a list of commands. For example, there is the use command for changing database, and we can look at the show command; if you can't remember how a command works, you can type help show and see a full listing of what we can show against. So, for example, I want to show a list of databases, and similarly I may want to look later at SHOW CREATE for a table. From here I can say SHOW DATABASES and we see our full list of databases, including Sakila and sales. If I want to change to the Sakila database, I can issue the command USE sakila, and then another command, SHOW TABLES. From here I can run SQL statements as well; for example, I might want to look at how many customers we have in the customer table, so we can do a SELECT COUNT and call it cust_count FROM customer. We also have SHOW CREATE TABLE, so to look at the full command I can run SHOW CREATE TABLE customer and format it with \G to display each field name on its own row, giving us the full output statement, or DDL, for our customer table. Moving on, we can change over to our sales database, issue SHOW TABLES again, and for the customer payment total table display the columns with a SHOW CREATE TABLE, again formatted with \G. And, for example, we might want to run a query on this table, such as selecting the sum of the total from customer payment total where the payment year equals 2005, and we can see our value there.
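A sketch of that interactive session, run inside the mysql client (the summary table and column names follow the project's naming and may differ from the actual script):

```sql
SHOW DATABASES;

USE sakila;
SHOW TABLES;
SELECT COUNT(*) AS cust_count FROM customer;

-- \G (instead of ;) prints each field of the result on its own line
SHOW CREATE TABLE customer\G

USE sales;
SELECT SUM(payment_total) AS total_2005
FROM   cust_payment_total
WHERE  payment_year = 2005;
```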
We can also issue commands such as the status command if we want to look at the overall status: we can see our connection ID, 139, the unique connection ID we're logged in with at the moment, and additional details, such as that the TCP port is 3306, the database character set is utf8mb4, and we're connected via localhost with the root user. So that's a quick demonstration of the MySQL command line: in the event that you don't have access to MySQL Workbench, you can still run commands from the command line. Obviously, if you have larger queries this may present a challenge, as the command line doesn't really lend itself to editing large queries; even though we could copy and paste from a Notepad file, the better option is to have mysql read our SQL file and run it for us. Coming out of the mysql shell, we can type exit or quit.

From here, if I want to run a command, I also have the option to run it outside the mysql shell and bring the result back directly in the command window. I can do this by saying mysql sales, for our sales database, -u for the user, root, -p so that I'm prompted for the password, and this time -e, which allows us to execute a command supplied within double quotes. So I can say SELECT COUNT(*) AS row_count from the customer payment total table, execute that, and we get our answer back: the row count is 2,838. If I look at our directory, we can see we have our sales project production .sql file. From here I can say mysql (and as the sales project production script has a USE statement in it, we don't have to specify that we want to connect to the sales database), then -u root -p, and this time we use the redirection arrow to read from our given file, sales project production dot sql. Now, if I ran this, the output of 2,838 rows would come back to the screen and take a minute or two, but obviously we don't want the results on the screen, so I'll redirect the output into our results folder, to a file called pay total. I'll go ahead and delete that file first so we can see it being created, hit return, provide the password, and that's our script run; we can come back in, see that the script has run with its timestamp, and if I edit the file we can see the full contents of our table have been output.

Next, we also have the option to use a configuration file with our script, which can be useful if we want to protect the password and not have a user entering their details on the command line. Here we have our mysql_geoff.cfg file, and within this file we simply have the client details, the user name and password, that we use to connect to the database; you could obviously store this file in a safe area within Windows and have it protected. To use the configuration file, we use the option --defaults-file and pass in our .cfg file. This time, if I go ahead and execute (just make sure the results file is deleted first; if it wasn't deleted it would simply be overwritten), we are not prompted for a password, and you can see the file has been created. And lastly, if we want to schedule our file to run with the Windows scheduler, we can specify the details for running our MySQL script within a batch (.bat) file in Windows.
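The command-prompt forms described here look roughly like this (a sketch; the file and config names are illustrative, and the .cfg file simply holds a [client] section with user= and password= lines):

```
rem Run a single statement and print the result to the console
mysql -u root -p -e "SELECT COUNT(*) AS row_count FROM sales.cust_payment_total;"

rem Read a script from a file and redirect the output to a results file
mysql -u root -p < sales_project_production.sql > results\pay_total.txt

rem Same again, but taking the credentials from a config file instead of prompting
rem (--defaults-file must be the first option on the line)
mysql --defaults-file=mysql_geoff.cfg < sales_project_production.sql > results\pay_total.txt
```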
To run against our MySQL script, the batch file sets a number of parameters, such as our SQL script, our config file, our results directory, and a date-time, which gives us a year-month-day format from the Windows date function, plus the resulting file name, which combines the payment total name with our date-time and .txt on the end. When it runs, the batch file changes directory into our query folder and issues the mysql command, with our defaults file and the parameters passed in. While you don't necessarily need to know all the details of batch programming, this can be a very useful way of setting your queries up to run with the Windows scheduler. To run our query we can just run the .bat file, and we can see our output file being recreated in the folder.

The final step is to automate the script: we can open Task Scheduler, create a new folder and call it MySQL queries, and from here do Create Basic Task. We will call this customer payment total weekly and put in a comment to say "run customer payment total SQL script on a weekly basis". Click Next; we're given the option of daily, weekly or monthly, so we can say recur every week on a Sunday. For the action, the choices are start a program, send an email or display a message (the last two have been deprecated), so we choose start a program, navigate to our query .bat file, which will run this, leave the arguments empty, and click Finish. Once again, if we want to test that the schedule is running, we can delete our output file and click Run; we see an operational code informing us it has run successfully, and we can check the output folder and see that the file has been created.

That concludes this section on how to use the MySQL command line. While the Windows scheduler can be useful for personal development and for scheduling your own queries to run on a weekly basis, in much larger-scale production environments database administrators have more sophisticated tools which allow them to check the statuses of queries and how they are running, for example whether a query has not completed. Large-scale production schedulers also have the ability to set dependencies between scripts, so that a script will not run unless a number of other scripts have completed first. In very large production environments we can have well over a thousand jobs running in a weekend batch, so it is important that all jobs finish with a success; if any of those jobs fail, we don't want the rest of the batch continuing to process, as it won't have taken into account all the jobs it depends on. So if you have a number of jobs that need to run in production, I would suggest consulting with your database administration team about how to set those jobs up. But if you are just looking to run a quick job over the weekend that might take a number of hours, and you want to walk away from your computer, you can use the Windows scheduler as I've demonstrated here, using a batch file to run your query, so that when you come back in on a Monday morning your job is done. This can also be very useful if you want to test a production script before actually sending it to your database administration team. That's all for this section, and I'll see you in the next one.

36. Export results to Excel: Now that we have our script ready for production, we can go ahead and look at how to export our results out to a spreadsheet.
Before we export, you may recall that at the beginning we had limited our query preferences to only output one thousand rows. Performing a row count on our query, we can see that there are 2,838 rows, so with the query limit preference currently set to one thousand rows, running this query would only bring back one thousand rows instead of our 2,838. Without going into Edit Preferences, which we could do under the Query menu, we have a Limit Rows menu, currently set to one thousand rows. We can set this to Don't Limit, and this time when we run the query we can see in the output window that 2,838 rows have been returned. Similarly, if we want to put a limit back on while in the middle of testing, we could set it back to, say, ten rows, and we can see that the same query has now run with a limit of 10. For now I will set it to Don't Limit so that we capture all our rows when we go to export.

To export, we click on the Export button for our results. We have a number of options in MySQL for the export file type, including comma-separated, tab-separated and other formats that are more akin to programming, such as JSON and XML; there is even an Excel spreadsheet option, which is an XML format. I will go ahead and export in tab-separated format, call the file customer payment total, and timestamp it with today's timestamp. I've set up a folder under Documents called MySQL results sales, and you can see the file has been exported. I can open this with a Notepad editor, in this case Notepad++, and you can see the first row has our field names; if we scroll down we can see 2,839 rows, and that the file is tab separated. We can then also open this file in Excel: within Excel we go to Data, From Text/CSV, and select the file to import. I don't need to change any details, so I can load the data into Excel; Microsoft Excel reports back that 2,838 rows have been loaded, shows that the data source is connected to this file name, and reports nine columns. So that's how we can export results from MySQL into Notepad and also open them up in Excel. I'll go ahead and save; I've created another folder here called sales reports, and I will call this sales summary. I will also create a sheet for our SQL and name it accordingly. That concludes this section on exporting results from MySQL into Notepad and Microsoft Excel.
37. Setup MySQL ODBC Connector: In this section we will install the MySQL ODBC connector, where ODBC stands for Open Database Connectivity. Such connectors are typically provided by database vendors to allow users to access their databases via third-party tools such as Microsoft Excel, and also to connect from various programming languages such as Python, C++ and Java. To install these connectors we can go back to our original MySQL installer; from here, if we wanted to upgrade any of our components we could click Next, but for now we can click Cancel, which brings us back to the MySQL installer showing which applications and programs are currently installed. Note here that our architecture is x64, while our documentation and examples were downloaded as x86. If you're unsure which architecture you are working on in Windows, you can go to the Windows Start area and type "About" to find details about your PC; from there you should see Device Specifications, and under System Type we can see that my laptop is x64.

Now that you have identified your architecture, you can go to the Add menu, and under MySQL Connectors the first connector listed is the ODBC connector. I will select the x64 connector, and in addition, as I'm doing some Python development at the moment, I can install the latest version of the Python connector, which is also x64. We click Next and Execute to install; once complete, we click Next and Finish, and we can see our two connectors have been installed, for ODBC and for Python, and we can exit out. To set up your ODBC connection, click Start and type ODBC, and you can see there is ODBC Data Sources (32-bit), if you're running on x86 architecture, and ODBC Data Sources (64-bit), so I'll click on 64-bit. From here we have a number of tabs: if you're installing the data source for just your own personal use you can select User DSN, whereas if you want it available to all users on your computer you can select System DSN. For now we will just set it up under User DSN. I'm presented with two MySQL drivers, one ANSI and one Unicode; as you may recall, our sales database is set up with utf8mb4, so we can go ahead and use the Unicode driver. I will call this data source MySQL ODBC; under TCP/IP Server I type in localhost, and I will connect with the root user and password. We can test the connection by clicking Test, and we see a successful connection. We can then define a database, so I'll select sales and click OK, and we come out of the Data Source Administrator and click OK.

Next we can open up Microsoft Excel, and before we attempt a connection from Excel, I would recommend that you first check which version of Microsoft Excel you are running. My laptop, which came with Microsoft Office already installed, had defaulted to the 32-bit version of Excel. To check your version, click on Account and then About Excel; here we can see that Microsoft Excel is now installed as the 64-bit version. However, when I initially attempted a connection using the MySQL ODBC connector, it came back with an error saying there was a conflict between the 64-bit MySQL ODBC installation and the application, which was 32-bit. To rectify this, I had to uninstall Microsoft Office and reinstall it using the 64-bit version. For example, to install Office 64-bit, go into the installer, select Office, and you will be given the option of a 64-bit or a 32-bit setup; if you select the plain setup.exe, it defaults to the 32-bit version, so make sure to go into the Office folder itself and select the 64-bit setup. Once I had Office 64-bit installed, the connection worked successfully. I can now test this connection in Excel by clicking Data, Get Data, From Other Sources, and then From ODBC; we click on ODBC, and from the drop-down we can see the MySQL ODBC connector that we installed.
We will select MySQL ODBC and click OK. We'll be prompted to provide a user name and password; we click Connect, and we're now presented with a full list of what is in the MySQL instance. From here we can navigate to the likes of the Sakila database, where we see a combination of tables and views (tables are denoted with a single table icon, whereas views are typically queries over multiple tables), or we can go to our sales database and select customer payment total, and we can see our data presented to us. If need be, we can transform the data before loading it into Excel; for example, with payment total we could change the type to a currency, or if we had date fields we might want to select the correct corresponding date type where a full date is given. Once the data is set up correctly, we can close and load the information, and Excel presents us with the details once again, telling us our nine columns have loaded from the given data source. Now that we're directly connected to the table, the next time we open Microsoft Excel and refresh the connection, any increase or decrease in the number of rows will be picked up in Excel. I can go ahead and save my file, and I will call this sales summary with ODBC. That concludes this section on how to set up the MySQL ODBC connector.

38. Create a report in Microsoft Excel: In this section we will build out a summary report for Tom, the director at Sakila DVD. To start, we can look at summarising our customer payment total table. We can scroll to the end of the table, or use the shortcut Ctrl+Down Arrow. From here you can see that just outside our table we have a number of drop-down fields which we can use to select built-in functions such as sum, average and count. For now I will just apply the sum for our columns payment total, transaction count and customer count: we go up to the ribbon menu, select Sum, and this applies the sum function across our three columns. In addition, we can also apply filters on the table by selecting the column headers; for example, we could untick 2006 to show just 2005 data, and as we're interested in our California area, we can untick Select All, type in "Cal" for California and select just California. Very quickly we've been able to get to a summary of our California customer district, broken down by our two store addresses, MySakila Drive and MySQL Boulevard. From this we could potentially stop here with a summary report briefly detailing the main key metrics for Tom, but to go one step further we will produce a summary report within our Excel spreadsheet so that Tom can identify which districts the volumes are coming from. To do this, I will first remove our summary information and clear all filters across the table. Moving to the Insert menu, I select PivotTable; we want the pivot table against the full table, which is customer payment total, so that's fine, and we'll open it in a new worksheet, so we click OK. On the left we can see our pivot table has appeared, and on the right we have the pivot table fields; from here we can start to build out our information.
To start with, I will just look at the payment total, and across our rows we can look at year and month; I'll also bring the store district in as a filter, as well as the customer district. From here we can select California once again, or, if we want to compare a number of different areas, we can click Select Multiple Items. For now I will just select California, and we can see our information brought back for our customers of California.

In addition, I would also like to show a growth metric, month on month, of how our California customers have been performing. I can drag the payment total onto the pivot table again, and you can see we now have Sum of payment total and Sum of payment total 2. Under Sum of payment total 2 I go to Value Field Settings, then Show Values As, where there are a number of calculations we can apply against our data. First I will look at Difference From; we're presented with a number of options, such as which base field we would like to take the difference from. In this scenario I want to look at differences in the monthly data, so I select payment month, and I would like to see the difference between the current month and the previous month, so I select Previous. At the top I can change the title of Sum of payment total 2; I'll refer to this as payment diff, for difference, and we can see the differences coming back: between June and May it is a difference of $5, and from July to June it is 350. So this is starting to look a lot better in terms of showing our growth values. To expand on this, I'll bring back all the data, just so we have a little more to work with, and in addition to the difference I will also bring in payment total once more, this time showing the percentage difference. Again we click Show Values As, choose % Difference From, select payment month and Previous, and we now have a percentage difference between the given months. If I want to rename the column, I can simply click on the field and change it to payment diff percent, and for Sum of payment total I will just call this payment total. Note that we can't give a field the same name as the underlying field: if I called this, for example, payment_total, the same as the source field, Microsoft Excel reports that the field name is already in use, but there's no problem in calling it "payment total" with a space, which is perfectly acceptable to Excel.

Next, we can see our values are in the thousands, so we might want to format them as thousands, so that when Tom is reading them he doesn't have to work out which values are in the thousands, hundreds or millions; a summary in thousands or millions is much quicker for the business to review. We go to the payment total again, under Value Field Settings, but this time look at Number Format, and under Number we can use the thousands separator; coming down to Custom, I can specify the thousands separator and also the decimal precision, adding in .00, and if I want just one decimal of precision I can remove the second zero.
To show that the value is in thousands, we can add a "K" on the end, and that looks much more presentable; a smaller figure in the thousands is easier for Tom to read. Although the payment difference is a useful figure, my preference is to show just the percentage difference, so I will remove the payment diff field, which leaves us with the percentage difference.

From here we can insert a graph. Under the Insert menu there is a Recommended Charts option, where Excel reviews our pivot table and gives us an idea of which charts it recommends. It is recommending a column chart, and I can go ahead and insert this into our worksheet. However, the first thing you will notice is that our payment difference percentage isn't showing up. To get the payment difference showing on the same chart we want a combination chart: we right-click and select Change Chart Type, go down to the combo chart, and under the combo chart you'll see our payment difference showing as an orange line, but the scale is not the same — the left axis has values in the thousands, while we would like the difference represented as a percentage. To change this, we can set a secondary axis, and if need be we can also change how the secondary series looks; I would like it with points, so I'll select the points option as well and click OK. That's looking much better now that we can see both the value and the difference. Under the Design menu I also have the option of quick layouts, and one of my personal preferences is to use the first layout, which adds in a chart title; we will call the chart "payment summary", and we can also hide the field labels and, if need be, hide all the buttons. So, as you can see, on the left side we have our values presented in thousands for the payment total, and on the right side we have the percentage values, so we can quickly read either off the chart. In addition, if we want to show the values on the chart, we can right-click once again and add data labels, and our values come up across the chart; we could do the same for the difference, but this might get a little crowded, so for now I will keep the difference labels on the chart and remove the payment values — I can just click them and press Delete on the keyboard. So now we can see what the growth is on our chart.

Once we're happy with our pivot table and chart, we can right-click on the pivot table and select Hide Field List, which frees up some space, and I'll rename the sheet to "payment summary". In addition, I also like to add a report details page; under the report details, similar to the comments in our SQL, we can copy in some of the details from our SQL, so that when somebody comes to use this report we can explain what the report is detailing. We will just say, for example, that the purpose is to identify growth of payment details within a customer district. If you wanted to go one step further, you might even bring back the field details: we can transpose the list alongside the field names and give details of what each field is, so we have, for example, a description of our report's contents.
We might add in some additional details on how to use the report; for example, for the payment summary tab we might describe what the drop-down menus provide. In addition to the payment summary, I will also add another view: I can move or copy this sheet, so I'll create a copy and rename it "payment compare summary". For this table I would like to bring out a comparison of all our customer districts side by side. To do this, I can bring up the field list and this time move payment year and payment month onto the horizontal axis, and bring the customer district down into rows. In this scenario the graph is looking a little busy, so I will remove it, and for values I will place the values above payment year and month. For now I can also remove the payment total, and from here I can select Grand Total; under Data I have the option to sort values ascending or descending, so I'll sort descending, which shows me all customer districts in descending order by greatest payment volume. Let's say, for example, we only want to keep our top 10 customer districts: from Row Labels I can go down to Value Filters and choose Top 10, which picks the top 10 items based on the given values; we also have options to filter by percent or by sum. We can see our grand total for the top 10 customer districts, so I might rename this to "top 10 customer districts", and if need be we could bring in the store district to show the top 10 within each store district, QLD versus Alberta, but for now I'll leave the store district out. In addition, we may also want to provide some details at the beginning of each page, just to present back to Tom what each page is showing, so I'll add a description: this is the payment summary with month-on-month comparison and month-on-month growth values, and this is our top 10 customer districts by grand total.

To summarize, there are obviously a number of ways we can present our information back to the business, and while Tom has not provided us with any detailed requirements — he has simply asked, for example, for the usual metrics — in this scenario I'm taking a best guess that he would just be looking for payment totals and differences to show growth in a given area. If this report were to be more extensive and go out to a number of teams, I would always review with the various teams in question to identify which metrics they want to see in a given report, such as what we're looking at here for our payment summary. And while we have presented our results across a number of tabs, we may want to come up with a summary tab where we present all of these results in a fact-style table that the business can quickly scan through, showing, for example, monthly and weekly growth details and other KPIs for the business. That's all for this section on creating a summary report in Microsoft Excel.