Crash Course on SQL and Big Data Basics | Lazy Programmer Inc | Skillshare

Playback Speed

  • 0.5x
  • 1x (Normal)
  • 1.25x
  • 1.5x
  • 2x

Crash Course on SQL and Big Data Basics

teacher avatar Lazy Programmer Inc

Watch this class and thousands more

Get unlimited access to every class
Taught by industry leaders & working professionals
Topics include illustration, design, photography, and more

Watch this class and thousands more

Get unlimited access to every class
Taught by industry leaders & working professionals
Topics include illustration, design, photography, and more

Lessons in This Class

21 Lessons (47m)
    • 1. Welcome

    • 2. Outline of the course

    • 3. Overview of SQL databases

    • 4. Installing SQLite on Mac, Linux, and Windows

    • 5. What’s a relational database?

    • 6. How to load the data used in this class

    • 7. Basic commands

    • 8. Querying a table

    • 9. Creating a table

    • 10. Modifying a table’s structure

    • 11. Speeding things up with indexes

    • 12. Index Example in the Console

    • 13. Insert / Update / Delete

    • 14. What is CRUD?

    • 15. Joining or Merging tables together

    • 16. Joins in the console

    • 17. Count, Distinct, Sum, Min, Max, Avg

    • 18. Group by, Sort, Limit

    • 19. Funnels, YOY revenue, and Sales by Location

    • 20. Spark SQL

    • 21. Create your own Spark cluster on Amazon EC2

  • --
  • Beginner level
  • Intermediate level
  • Advanced level
  • All levels
  • Beg/Int level
  • Int/Adv level

Community Generated

The level is determined by a majority opinion of students who have reviewed this class. The teacher's recommendation is shown until at least 5 student responses are collected.





About This Class

It is becoming ever more important that companies make data-driven decisions.

With big data and data science on the rise, we have more data than we know what to do with.

One of the basic languages of data analytics is SQL, which is used for many popular databases including MySQL, Postgres, SQLite, Microsoft SQL Server, Oracle, and even big data solutions like Hive and Cassandra.

I’m going to let you in on a little secret. Most high-level marketers and product managers at big tech companies know how to manipulate data to gain important insights. No longer do you have to wait around the entire day for some software engineer to answer your questions - now you can find the answers directly, by yourself, using SQL!

In this course, Crash Course on SQL and Big Data Basics, we'll start from the basics - installing SQL onto your Mac, Linux, or Windows machine and explaining what a relational database is. Next, we'll look at basic tasks like creating tables and loading data into those tables. We will look at a wide variety of SQL commands and I will show you how to speed things up using indexes.

Once you know all the SQL commands we will start doing advanced examples - answering questions marketers and business people often have, like where are customers dropping off in our sales funnel? And which of our locations has the highest revenue?

In the last section, we'll do Advanced SQL queries on Spark, the big data framework that is the successor to MapReduce and also runs on top of Hadoop. I will teach you how to install Spark, create a cluster very quickly on Amazon EC2, and run SQL queries, allowing you to apply everything you learned up until this point in a big data environment.

Do you want to know how to optimize your sales funnel using SQL, look at the seasonal trends in your industry, and run a SQL query on Hadoop? Then join me now in my new class, SQL for marketers! Dominate data analytics, data science, and big data!

TIPS (for getting through the course):

  • Watch it at 2x.

  • Take handwritten notes. This will drastically increase your ability to retain the information.

  • Ask lots of questions on the discussion board. The more the better!

  • Write code yourself, don't just sit there and look at my code.

Meet Your Teacher

Class Ratings

Expectations Met?
  • Exceeded!
  • Yes
  • Somewhat
  • Not really
Reviews Archive

In October 2018, we updated our review system to improve the way we collect feedback. Below are the reviews written before that update.

Why Join Skillshare?

Take award-winning Skillshare Original Classes

Each class has short lessons, hands-on projects

Your membership supports Skillshare teachers

Learn From Anywhere

Take classes on the go with the Skillshare app. Stream or download to watch on the plane, the subway, or wherever you learn best.


1. Welcome: Hey, guys. And welcome to SQL for marketers in this lecture, I'm going to talk about why marketers need to know s you. Oh, are you tired of depending on crafty analytics software? Do you have to ask an engineer to help you whenever you have a question about the data? This is not ideal and won't help you do your job efficiently. SQL is short for structured query language, and it's a language that can be used for all kinds of databases, from the tiny database of starting your iPhone to a large big data databases that spanned multiple continents. Engineers have done a great job of creating these different types of complex data stores while still allowing you to use the same language more or less for all of them. So what does that mean for you? It means that as long as you know SQL, you can take advantage of all this software and getting insights into your data. No matter what kind of databases stored in as long as it supports SQL, you can ask questions like how many people are falling into each stage of your sales funnel . What is my year over year profit Are there any differences in the demographics between the people who are buying product accent product? Why, what is our most profitable month? What are the seasonal trends in our industry? I am an engineer, so I probably haven't thought of all of the questions you've already had for years. But I guarantee you knowing SQL will help you answer these questions on various teams have worked on in the tech world. I've noticed that marketing people and product managers have SQL skills and sometimes even quoting skills. So if you're looking to not only make your day more productive, but make yourself more marketable to employers and catch up to the other go getters in your field, then you should definitely learn SQL. 2. Outline of the course: Hey, guys, and welcome back to this class. SQL For marketers in this lecture, I'm going to give you an outline of this course. So in the next section, after this lecture, I'm going to give you an overview of the different types of databases that support SQL. I'm going to go more in depth about sq a light, since that's the database will be using for this class. It's easily accessible for anyone using Windows, Lennox or Mac. In fact, if you're on a Mac, it comes preinstalled. If you're on Windows and Linux, I'll give you instructions for how to get your own free copy in the section. After that, I'll answer the question. What is a relational database? And I'll show you an example of a simple table like the ones will be using throughout this course. I'm going to show you how to read from a table and how to create a table very basic stuff, and the next section will go more in depth and talk about how to speed up querying a table using indexes. You can imagine that once a table gets large, it'll take longer and longer to read. I'll prove to you that indexes make things go faster. In the next section, I'll talk about how to modify data in a table. The three basic waste do this are inserting, updating and deleting in the section after we'll talk about joints. This is when you have two tables that contains some data. You want to merge, I'll show you different ways. We can do this in the next section will talk about grouping, sorting and counting things and tables you can imagine. This would help you answer what is my most profitable product? What is the average number of page views per user, and questions like that in the last section will go into some advanced stuff and I'll show you how you can run big data. SQL queries with Spark, which runs on Hadoop. 3. Overview of SQL databases: Hey guys, and welcome back to this class SQL For marketers in this lecture, I'm going to go over a number of different technologies that use SQL. You've already probably heard of my SQL. It's very popular on the Web, and if you get hosting from sites like name cheap, it usually comes with PHP, and my SQL installed another similar databases. Post grass post dresses made a lot of great improvements recently. One being case on support, which allows it to have a lot of functionality similar to Mongo db my SQL in Post Press can be run on your local machine. So if you have a PHP application or a Ruby on rails application running alongside my SQL, you can do that on the same server. But you can imagine this eats up. Resource is so what engineers usually do at scale is to put these databases on their own servers. Now you can have multiple machines running the application code, talking to the same database. Now what happens when data gets really big? Well, then we can't even store it all in one machine. You may have heard of a popular big data technology called Hadoop, the Hadoop file system allows us to do two interesting things. One is that it splits up our data in the chunks so we could potentially store a one terabyte fire. It just means that we would need to have more trucks. We can split these chunks across different machines. Number two. As you know, machines can fail, and the more machines you have, the higher chance of failure. So another thing we do is replication. We make multiple copies of these chunks in different places so that if one machine fails, we still have a copy somewhere else. Hi, viz. One framework that allows you to use Hadoop as a database and supports a similar language to SQL called hive query language for hive QL. Cassandra is another big data base technology that supports a similar language called Cassandra Query Language or Seek. You well, at the end of this course will look at just plain sq on a system called Spark, which runs on top of a dupe throughout the entirety of this course. We're going to use yet another type of database called SQL Light. Ask you a light is a file based database, so it isn't as large scale as some of the others, but it's more than enough to support a wide a range of scenarios. Ask you a light is even on your iPhone. Every app can have its own SQL database. Many of the apse on your computer do the same thing. Another great thing about SQL I. Is it super easy to install and use no matter what system you're on? So if you're using Windows, UNIX or Mac, you can do all the exercises in this course. 4. Installing SQLite on Mac, Linux, and Windows: Hey, guys, And welcome back to this glass SQL. For marketers, this lecture is going to be about installing SQL A. If you have a Mac, then you're in luck. Sq. A light already comes with the Maxwell. There's nothing to do here. If you're using Windows and you have a machine powerful enough to run virtual box out, recommend grabbing a lightweight distribution of Lennox like L A bunch or exit onto in the Lenox console. You simply have to enter the command sudo Apt. Get install SQL I. If you really want to stick with Windows, then head over to ask you a like dot org's and go to the download page. You want to download the DLL zip and the tools. IP. Since we'll be using the command line interface throughout this course, you can unzip these files to see slash SQL I and add this directory to your path environment. Variable. You can find instructions for how to do that at this euro to check that you've done it correctly. You want Open up CMD Dottie XY and type in SQL Light three, which should open up sq alike command line show 5. What’s a relational database?: Hey, guys, And welcome back to this class, sq over marketers in this lecture, I'm going to talk about what a relational databases. A relational database is a collection of tables. As you know, eight people has rows and columns, the columns are referred to as fields or attributes, and each row is a new record. So, for example, I could have a table called users with the fields. Name and email. Each table or relation, as it's also referred to, usually refers to a specific entity. So users was one example. Products might be another example, and orders might be another example. The relational part of Relational database comes from the fact that different tables can be related to each other. So, for example, if your users table has an idee field with one and two for Bob and Jane and the orders table has a user, I d field with the user. Rieti one. You know that that row refers to Bob. Since Bob has the user, I d want 6. How to load the data used in this class: Hey, guys. So in this lecture, we're going to load some data into our database for the first time. In order to get the files needed for this course, you'll need to download them from get hub. If you don't already know, Get Hub is a version control system and allows me to easily keep all the files for this course up to date and allows you to easily get updates from the command line without having to re download them from a website. You want to go to get hub dot com slash lazy programmers slash sq outclass. You can either use the command line to get clone this repo, or you can just download a zip file. I've made a pre made SQL script and some data files for you Load so you can just run that with peace commands. So the first thing we're gonna do is we're gonna create a database and run this SQL script , which runs a create table script. So SQL like three. I'm gonna call my database the database dot db. You can call it anything you want and then I'm gonna run the script, create actions that SQL Okay, so that just created a table in my database. So now let's go into the Devi. That's SQL, like three and then the name of the database. Now we're going to import the data. So if you do dot mode, CSP got import, and I'm going to explain all these commands later, so don't worry too much about them right now. So miniatures, the file small actions dot C S V and I'm gonna put it into a table called user actions. You could answer. And now all the data is in the table User actions. If you were interested in knowing how the data was generated, you want to look at the python file, generate actions that pie is basically creates a C S V, which is like an excel table, but in plain text with four columns of data name, product, action and price, you can even open the file small actions dot C S V and look at it and excel if you want to 7. Basic commands: Hey, guys. So in this lecture, we're gonna look at some basic commands in the SQL Consul Commands and SQL lights. Start with a dot Well, just introduce them. Sort of asked needed in this course. But here are a few important ones to see the databases you're currently connected to. You type in dot databases so we can see the V database dot db file that I connected to earlier so dot schema will show us the table we just created. There's a table called user actions and therefore fields name product action in price. And it also shows us that data types which will look at more in depth later, even type in tables. It shows us a list of all the tables in our database. We're going to turn the headers on and you'll see why we want to do that later. So when we look at the data are when we do, a query will get a little header like this, which tells us what we're looking at. And lastly dot exit will exit the show 8. Querying a table: Hey, guys, And welcome back to this class SQL For marketers and this lecture, I'm going to show you the basic structure of a query. So this is the simplest form of curing a table we select star, which means select everything from years of actions, which is the table name. There are three things to note about. This one is the select and the from our capitalized. We don't really need to do this, but it helps some people see the formatting better. Two is that new lines and white space are all treated the same, so it doesn't matter if I use the space or a new line of five new lines. But writing it like how I did above is somewhat of a standard way to write. SQL Queries three is that all SQL statements needs and in a semi colon, just like in Java or C plus plus. So let's try this in the console. We could turn the headers on and turn the mode to call him so that the output looks a bit nicer. So now let's look at a variation of that. Let's I only want to know about the Rose where the action is at the card. The format of this quarry is select star from user actions where action equals at the cart . So let's try this in the console. All right, so now I only see the rose where action equals at the cart. So now let's say I don't care about seeing the action of Price. I just want to know the name and the product they bought To do this. I replace the star with the columns that I want to see. So it's select name and product from user actions where action equals at the cart. Now let's try this in the console case, and I only see name and product. You can also have multiple conditions to check. You can combine conditions using andere, or you can have different comparison. Operators like equals, not equals less than greater than or is or is not, the ladder to of which are used for checking. No 9. Creating a table: Hey, guys, And welcome back to this class SQL For marketers, this lecture we're going to talk about how to create a table. So earlier, we used the script that I wrote to create a table, but we never looked at what was inside it. Now we're going to discuss the syntax of how to create a table. So the general sin taxes create table the table, name a bracket, and then you list out each column along with its type in a comma separated list. And then you close the bracket and had a semi colon. So you see, you need to define both what the name of each column is as well as its type. You can look at the SQL file we use before for an example. So what are the different types in SQL? They're very similar to what you see in other programming. Languages like C Plus plus and Java text is for strings like the ones we've been using. Var car can also be used, but it's treated the same in sq a light and text is easier and shorter to use integers for integer values. Riel is for any inside your flow value and blobs for storing binary data. Date, time, date and time. Technically, these air treated as one of the above four data types, depending on what data you put in it. Don't worry. If you don't understand this fully right now, we'll see more examples throughout this course. 10. Modifying a table’s structure: Hey, guys, welcome back to this class. SQL For marketers in this lecture, we're going to talk about modifying a table structure. One powerful feature of relational databases is that we're not stuck with the same data format forever. So let's say you want to add a new column to an existing table. We would like to know, say, the time stamp our user actions. Now we can't retroactively populate timestamp unless we have that data somewhere else. But what we can do is add a new column that could hold the time stamp in any data that would be inserted or retrieved in the future. Could make use of this new column. So let's look at the syntax of this. It's altar table, the table name ad column, column name and then the tight. So let's run this on our table, Okay? So if we select everything from user actions now, we'll see that created at column is just No. So now the next thing I'm gonna do is I'm gonna set the date time for every road that has the name Gina. Well, look more update statements later on in the course. Now, if I look at the table again. I see that Gina has a daytime, and now I can also queary using the new call. So notice how this only shows Gina, because the other roads don't have daytime's note that an alternative way to add columns would be to create a new table with the columns you want copying the data over and then the leading the old table. Since there's no way to remove a column and sq a light for summer operations, this is your only option. 11. Speeding things up with indexes: Hey, guys, welcome back to this class. SQL for marketers in this lecture, we're going to talk about speeding things up using indexes. So, as a basic example, if I gave you a list of numbers and I told you to look for a specific one, how would you find it? You would have no choice but to look at every single element until you find all the ones equal to the one that I asked you to find. You can imagine how this would be prohibited if you had a very large data set. Now, what if I gave you a sorted list of numbers and ask you to find one? So let's say I'm looking for the number nine. If I looked in the middle, I would see six, and I see that six is less than nine. So I know that I only need to keep looking in the top half. Now I look in the middle again and I see eight. I know nine is bigger than eight. So I look in the top half again. Now I get an array with just nine and 10 and now it's easy to retrieve the night. So you can see that putting the numbers in a specific format allows you to search more efficiently. Indexes are a special data structure that allow you to look things up in a table and more efficiently. There are several types of indexes that you can have in a relational table, so the first is a Vanilla Plain Index. For example, I could make the actions column easier to search, and there is the unique index. This would be like making the actions call him easier to search and forcing them to be unique. But obviously, I wouldn't want to do this for the Actions column. Since multiple users could perform the same action and the third option is a composite index, for example, this could be an index on both action and price. Simultaneously, you can see the syntax for creating these indexes below. Now there is a special type of index called a primary key. So let's say we have an orders table with the columns I D. User I. D and Total Price and I have a user's table with the columns I D. Name and email. So the idea and the users table or the I D in the orders table is usually what we call the primary keep. You could make something else the primary key, but that would be unconventional primary keys, a unique indexes that also come with another special capability. It's the ability to auto increments. That means if you have user ID's one and two in the database already, you can insert a new row by name and email only, and it will automatically have I d three. So how do I create a table with primary key? This in taxes, create table users. I call this I D, but you can call it anything you want. Typically, it's called I D. And then you say inside your primary key auto increment has to be an institute. You can see that because I have to specify both primary key and the auto increment keywords . Both are actually optional, and I could put them somewhere else. But at the same time, this is pretty much standard 12. Index Example in the Console: Hey, guys. So in this lecture, I'm going to demonstrate how indexes can speed things up. First, we're going to need a larger data set so we can see a measurable difference in speed. I didn't include this file in the repo because it would have made the repo too big. So let's exit our database. See, you gonna run python generate actions, that pie, and I'm gonna say I want a size of one million. You could make it bigger if you want. And I'm gonna call my output file Big actions that CSP cool. So now it's going to rescue a light. So the first thing I'm gonna do is I'm gonna recreate the user actions table because I don't want the created at call him that we just created. So we're gonna drop table user actions, so that deleted the table and I'm gonna create the table again. I'm gonna turn the mode into CS V and I'm going to import big actions. See, it took a little bit longer. I'm gonna turn the time around, so weaken time, all our queries. So now let's run a query searching for the name Gina, and we're also gonna turn the headers on. All right, so now it's at an index on me. Name field. Okay, now let's run our query again. Okay? So you can see it's about 10 times as fast. 13. Insert / Update / Delete: Hey, guys, and welcome back to this class. SQL For marketers in this lecture, we're going to talk about how to modify data inside a table. There are three basic operations we could do. Insert, update and delete. So let's say I want to insert a new row into the table. Let's say we're looking at our orders table again. Suppose John buys a mango so the action would be purchase. How would I get that into the database? The general syntax is insert into table name values, and then you have to specify every single calm. So this requires that every column in the table is accounted for. Now what if you only want to insert certain columns and have the rest of the columns take a default value, then that would be insert into user actions and then you specify which columns and you say values, and then you specify the actual values. So let's try that in the console, so I'm gonna insert into user actions. I only want to insert the name, the product and the action with the values John Mango and purchase. Now we can look for it. So select star from user actions were name equals John, and we can see the thing that we just added at the bottom. Notice how the prices? No, because that's the default value. So now let's look at updating. Let's say the price of John's Mango is to 99. The general syntax for updating is update. Table name set column one value, one column to value too, etcetera. Where some condition. So in our case, what we would do is update user actions. Set price equals 2 99 where name equals John and now we can check of its Been correctly updated. So now we see the 2 99 Now let's say I want to get rid of his new row. The general syntax for deletion is the lead from table name where some condition. So for us, it would be delete from user actions where name equals John. So let's try that. Okay, now, if we run our query again, we see that there are no results because we just deleted John 14. What is CRUD?: Hey guys, and welcome back to this class. Ask you out for marketers and this lecture, we're going to draw an analogy between SQL and Web development. A lot of the time. Separating the data from the application gives us a nice architectural where weaken split up the data, the you I and the application code. You can think of this as model view controller or M V C, that you eyes what you see in your browser, So that's like HTML CSS and JavaScript. The application code is a server language like PHP, ruby or python. The database could be an SQL database or in no SQL database like Mongo, DB or Red ISS. What's neat about this is that the application, which accepts and responds to http requests, has analogous verbs toe SQL, and we can generalize them as create, read, update and delete or crowd. So create is an insert and SQL, or post on http, which you may have seen if you've ever looked at an HTML form. Read is a select and SQL or get in. A CCP update is update and SQL, or put in a CTP and elite is yes, deleting all of them. This is a very important idea to keep in mind for application programmers because you have a lot of freedom when you're developing a Web application. You technically could delete Rose in the database or update them on a get request, but it's a good idea not to. 15. Joining or Merging tables together: it hasn't welcome back to this class. SQL for marketers in this lecture, we're going to talk about joins. So far, we've been working with a user actions table. It shows us the name of the user, the product they were taking the action on. And what the action waas, whether it was a view at the cart or purchase and the price. But of course, this necessitates having a user's table. Two. It might contain information like the user's location and their age information you could pull useful analytics from. So let's say we wanted to know the location where most people are buying their mangoes. That's information from two different tables. And to merge these two tables together we do what is called a joint in this lecture. I'm going to talk about the different kinds of joins you can do so the first kind of joint will talk about is an inner Join the new SQL keywords here, our inner join, where you choose the second table to join with and the on keyword where you choose how to join the two tables. Note that you can join on more than one column. We're just joining on one, which is the name. So what happened to Alison Carol? Here they go away because Alice does not show up in the user actions table, and Carol doesn't show up in the user's table. That's what we mean by an inner Join. Next. Let's look at outer joints. The only difference in syntax here is that we say full out of joint instead of inner joy. We're still joining on the name. Notice how Alison Carol have reappeared in any information they didn't have is just set to know. Note that s Q. A light does not support full outer joints, but other SQL databases do. Next. Let's look at left. Outer joins. So here Carol stays, but Alice goes away, since Alice was on the right side table. Ask you a light. Does support left out of joints. Now let's look at right. Outer joins. So this is the opposite of left. Our joins. Alice stays and Carol goes away. And so, while I ask you a light supports left outer joints, they do not support right out of joints 16. Joins in the console: So in this lecture we're going to do a simple inner joint example. So first, let's create the users table. It's gonna have name which is text age, which is insecure and location, which is text. And now it's import the user CSP. So we want to change the vote back to see SV, and we're gonna say import users dot C S V into the table users. So just to make sure it got imported properly Okay, so they're all our users. So now let's do that inner join. I'm gonna select users that name, product and location from user actions. Inter join users where user actions name is equal to users name Can we get a lot of results ? Because use their actions was the big table notice that we have to specify which name he wanted to select Otherwise, Sq a light will complain that it's ambiguous 17. Count, Distinct, Sum, Min, Max, Avg: Hey, guys, and welcome back to this class. SQL For marketers in this lecture, we're going to finally start to look at some functions that will help you do analytics. The previous lectures help you to form the foundation and the basic skill set that we can now apply to do more complex things. The first thing we'll look at is simple counting. So let's say I want to know how many people bought mangoes that would be select count from user actions where product is mango. Now let's say I want to know how many different fruits there on the database. I only want to count the number of distinct fruits. So for this, we use the distinct keyword. So now let's say I want to find the total amount people are spending on mangoes. So I only want the rose where product is equal to mango and action is equal to purchase. And then I want to some the Price column. We can also find the average men and Max Price in the table. Of course, since we set the mall to 99 cents, they're all going to return 99 cents in the next lecture. Will do more complex things with these functions as an exercise, I want you guys to try these queries on your own database. 18. Group by, Sort, Limit: Hey, guys. So in the last lecture, we were only summing one thing at a time, like the price where the product was mango. But what if we wanted to do something like find the total price for each product in the same table? And so this is where I want to use the group by statement. So there are multiple parts of this query. I'm going to select the product name and the some of the prices. The prices are per product, since I'm grouping by the product at the end, and I'm using the wear so that we're only looking at purchases. So let's try this in the console. Cool. So we see all the products and how much people have spent on that. So now let's say I want to order by the revenue in descending order, and I need to add two things. So one thing is, I need to add an order by statement to the end, but I need to give that a column name so I can order by it. So I give the column a name in the select claws and I ordered by that name in the order by clause Now that the default ordering is us ending. But I'm going to use descending. Okay, so everything is ordered now. Now, let's say I only want to look at the top three performing products that I can use the limit . Keyword. This is really useful. If your SQL query has a lot of results in, is taking a long time to print. So let's try this in the console. Okay, so now there are only three rows in the output. 19. Funnels, YOY revenue, and Sales by Location: Hey, guys, and welcome back to this class. SQL For marketers in this lecture, we're going to use the techniques that we learned in the previous lectures to answer questions that sales and marketing will commonly have about the data. The first is funnels. We want to know how many people are making it to each checkpoint so we can see with a large drop offs are analyze why that's happening and try to improve conversion rate at that point . So this is just a simple group buying count, as we've seen, but let's try to send our table. Next is year over year revenue. Typically, we want to display a chart and excel, so we would need a table with two columns, date and revenue. Since we don't have a table with any daytime will use the script I included in the Repo to create one. So just run the command python generate actions DT dot pie and then an sq a light, create this new table and then import the data. So let's do that first, okay? And now we're gonna do the import. So we have a slight problem here because this new table has date times but we don't really need times, nor do we need the day. We just want to group by month. So how do we convert date times in two months? The key is to use a function that is common in all programming languages called str F. Time dysfunction usually takes in two parameters a string that allows you to specify the output format and a daytime object. The output is a string in the format you specify created from the daytime. Object will use this to create a new column called Month and then Group by that column. And so now you see, it's in the upward format that's appropriate for an Excel chur. You can either copy and paste it manually or output to a CSP toe output to a C. S V. Make sure you set the mode to see SV and then use the dot out command to specify an output file. Then you would just run the query that we just ran. The last example will do is sort the number of sales by location. In order to do this will have to make use of joints because the sales data is in the actions table and the locations are in the user's table. So I snuck in a few new concepts here. One is notice. Count doesn't actually need any arguments, and some SQL database it does. But for SQL Light, it doesn't so this will just count the number of rows in that Group. Two is that I'm using a nested query. The outer query is selecting from a table that's not actually a table, but the result of another query. Technically, I didn't need to selects product in the inner table, but I thought that would make the output more clear. So let's try this in the console. - Okay , so now we see the expected result. 20. Spark SQL: Hey, guys. And welcome back to this class SQL For marketers in this lecture, we're going to discuss how to do sq on a big data framework called Spark. This is a little more advanced, so you'll need to know a little bit of python coating to really be comfortable. Spark also allows you to write code and scholar or java. If you happen to know these, you can start your own spark cluster using a command line script, which I'll explain in the next lecture. Or you can install it locally, and this lecture will look at a local installation. The cool thing about Spark is you can access data from anywhere your local file system, the Hadoop File System and S three, which is Amazon. Simple storage service. Most likely your data resides in one of these three places. So when you're ready to install spark, just goto this euro and download the tar file, the next step is to unzip this file and run this command. You might need to install the latest version of Java and set the Java home environment variable for this to work. The last step is just to run a simple test to make sure it's all working. One interesting thing about Spark SQL is that it doesn't read CS fees, but instead it reads. Jason files. Jason Flowers have a convenient key value format. As you can see here, you'll need to run a few Python scripts to get the data file we need for this example. So if you haven't already run Python generate actions and then the next step is to run python convert actions, which creates the Jason. So if you check out the file, spark that pie, this is just a very simple example that shows you how to load some data and display it in a table. It's pretty trivial, so we won't look at it now. What we do want to look at it sparked to dot This is a script that loads that Jason we just created and runs a simple query on it. So if we go through this line by line, you'll see some spark imports and some initialization. You need to create a spark context if you're running the script. But if you're using the spark shell, it's created for you, we create an SQL context from the spark context and use the functions that it has toe load the Jason file we created earlier. So that would be this line. Notice the use of the F that stands for a data frame. So if you know our or pandas, there are other things you can do that treat the data as a data frame similar to those tools. The next step is to give the data frame a table name because, as you know, SQL queries need a table name and then we'll run a simple query. This is just selecting the number of sales of each product, and then the last step is to print the results. Note that to run the script, we don't just run Python spark to dot pie like a regular Python script we need sparked to run this Python script. It'll be responsible for copying the script over to all the machines in the cluster and coordinating the work. The first thing I always like to do is create an alias for the sparks. Amend binary so that would be this line, and then the next thing we'll do is run, sparks admit. So this command line argument is telling spark that the Master machine is local host and we're running. It imparts 70 77. What this means is you could potentially run a script where some other machine is the master, which you want to do if you have a spark cluster. So let's run this. - Okay , so we get the expected results. You could also test this out and, sq a light to make sure you get the same answer. 21. Create your own Spark cluster on Amazon EC2: Hey, guys. And welcome back to this class SQL For marketers in this lecture, I'm going to outline how you can create your own spark cluster. So let's say no one. Your organization is using spark yet, but your data is on a duper s3. You can still use spark in this lecture. I'm going to show you a very simple way to create your own spark cluster. The first thing you want to do is create a key pair inside AWS and download the PM five. You can also use an existing key pair if you have one. The next step is to say your AWS environment variables. You can get that from the AWS console. The step after is to actually create your cluster. You can copy this command, of course, putting your real key pair. Name the path to your PM file the number of slave machines you want, the name of your cluster and the zone you want to create your machines in. You can look up on Amazon. What zones are available to run the job? It's exactly the same as before, but you just switch out local host with the I P address of your master machine, which you can get inside the eight of us. Consul. After you're done doing your data analysis, don't forget to destroy your cluster. Since eight of us can get very expensive, use the same spark easy to script from before but used the destroy commitment. You'll need to remember the cluster name you used to create your cluster because that's the same name you'll use here.