35

3

My first SQL attempt...

Hey, I'm not sure exactly what is supposed to go in here since this is my very first skillshare class, so these are my queries on each of the questions/answers to the questions that required one… hopefully that's right way to do this!

Getting started

  1. Run your first query by pasting this into the class website:
    SELECT * FROM movie_data;
  2. Display all of the available tables using a SHOW command

SHOW TABLES;

  1. Familiarize yourself with each of the three tables using the DESCRIBE command

DESCRIBE customer_data;

DESCRIBE movie_data;

DESCRIBE order_data;

  1. Modify the query in #1 to get all of the data from a different table

SELECT * FROM customer_data;


A few notes on syntax

  1. Does capitalization matter? Try making the SQL keywords lowercase

No, capitalization on the SQL keywords doesn’t matter – the query will still work.  However, it is recommended to capitalize any SQL words so if someone is reading your code, it will make it easier.  It can also be easier to go back and review your own work for errors.

  1. Does capitalization matter? Try making the table names uppercase

Yes, the table names are case sensitive. 

  1. Does whitespace matter? Try putting extra whitespace in your query

Not even a little bit.  I started putting everything on a separate line to ensure I know where each new command starts so I can keep all my SQL words straight.


Working with SELECT

  1. Instead of selecting all columns from movie_data, just get the movie_year and movie_price

SELECT movie_year, movie_price FROM movie_data;

  1. Select just the customer names from the customer_data table

SELECT customer_name FROM customer_data;

10. Why not just store all of the data in one table, anyway?

There would have to be an huge amount of rows and columns, making queries slow and making it hard to properly label/find things you are looking for.


Filtering data with WHERE

11. Select only movies that are sold for $0.99 using WHERE and the = operator:
SELECT * FROM movie_data WHERE movie_price = 0.99;

12. Comparison operators (<, <=, >, >=, !=): select only movies that are sold for more than $0.99 using the > operator  

SELECT * FROM movie_data WHERE movie_price>.99;

13. Select only movies that have the word ‘the’ anywhere in the title using LIKE and the % wildcard:
SELECT * FROM movie_data WHERE movie_title LIKE “%the%”;

14. Quotes (‘ and “) are used to enclose text. Why is this important? Try doing the above query without the quotes

If you don’t enclose the text in quotes, it doesn’t know it is text and is trying to see if it is a SQL word or a table name

15. Select only movies with the word ‘the’ at the beginning of the title

SELECT * FROM movie_data WHERE movie_title LIKE “the%”;

16. Select only movies in the second half of the alphabet

SELECT * FROM movie_data WHERE movie_title >= 'n%';


Using Boolean operators AND and OR

17. Select movies that were made after 1990 AND cost more than $0.99:
SELECT * FROM movie_data WHERE movie_year > 1990 AND movie_price > 0.99;

18. Select movies that were made after 1990 OR cost more than $0.99

SELECT * FROM movie_data WHERE movie_year > 1990 OR movie_price > .99;

19. Use parentheses to stay sane! Select movies that were (1) made after 1990 AND cost more than $0.99, OR (2) were made after 2000

SELECT * FROM movie_data WHERE (movie_year > 1990 AND movie_price > .99) OR movie_year > 2000;


Using LIMIT

 

20. Select only the first three movies in the database using LIMIT:
SELECT * FROM movie_data LIMIT 0, 3;

21. Select the next three movies in the database. How are the records organized, anyway?

SELECT * FROM movie_data LIMIT 3,3;

The records are organized by the first column, here the movie_id.  Numbering starts with 0, so if you want the first three records, you want to tell the system you want three records starting with the 0th record (0,3).  If you want the next three, you would say start with the 3rd record (because you already saw 0,1,2) and you want to see the next three (3,3)


Sorting data with ORDER BY ASCending or DESCending

22. Get all of the movie data from the database, but make sure it is ordered by year of release:
SELECT * FROM movie_data ORDER BY movie_year ASC;

23. Select the three most recently-made movies in the database using ORDER BY and LIMIT

SELECT * FROM movie_data ORDER BY movie_year DESC LIMIT 0,3;

24. Select the three least recent movies in the database using ASC

SELECT * FROM movie_data ORDER BY movie_year ASC LIMIT 0,3;


Selecting data from multiple tables

25. Select all of the data from the movie and customer data tables:
SELECT * FROM movie_data, customer_data;

26. Something went wrong. How can we tell?

We can tell because we have way too much data and the data is being repeated by multiple id’s

27. How can we fix it? Explicitly tell SQL how to join these two tables together:
SELECT * FROM movie_data, order_data WHERE movie_data.movie_id = order_data.movie_id;

QUESTION: We can't connect these two table in question #25 because they have nothing in common - no fields are the same in both, so we can connect them.

 

But, in “fixing it” in question #27, it is showing that the tables have changed - is there in fact a way to connect the two movie_data and customer_data directly?  I assume no, but am not sure.  We are now seeing all orders that were made, even if the movie_id, movie_title or customer_id occurs more than once?

28. What is the . doing in our SQL queries? It helps SQL know exactly which table to drill down on.  The “.” specifies where to look… you specify the table name then . then where you are looking.

29. LIMIT the number of records returned to the first 10 orders in our database

SELECT * FROM movie_data, order_data WHERE movie_data.movie_id = order_data.movie_id LIMIT 0,10;

30. Get only records WHERE the movie was made in 2010

SELECT * FROM movie_data, order_data WHERE movie_data.movie_id = order_data.movie_id AND movie_year = 2010;

31. Get only the 20 most recent orders, using ORDER BY

SELECT * FROM movie_data, order_data  WHERE movie_data.movie_id = order_data.movie_id  ORDER BY datetime DESC;


Scalar functions and aggregate functions

32. The syntax of functions, or, what is the average price of a movie on our database?
$1.99:    SELECT AVG(movie_price) FROM movie_data;

33. What is the MAX() price of a movie in our database?

$5.99:    SELECT MAX(movie_price) FROM movie_data;

34. What is the MIN() price of a movie in our database?

$0.99:    SELECT MIN(movie_price) FROM movie_data;

35. What is the COUNT() of rows in our order_data table?

5000:     SELECT COUNT(datetime) FROM order_data;

36. What is the average price of a movie made after 1990?

$2.56     SELECT AVG(movie_price) FROM movie_data WHERE movie_year > 1990;

37. What is the COUNT() of orders that happened in the MONTH() of May?:
396       SELECT COUNT(*) FROM order_data WHERE MONTH(datetime) = 5;



Using AS to make queries cleaner

38. AS lets you assign shorthand to a particular column or table. For example:
SELECT COUNT(*) AS num_orders FROM order_data WHERE MONTH(datetime) = 5;

This just changes the results to show your name of whatever you are counting, here we are counting “num_orders”

39. Notice that our WHERE clauses get increasingly complicated as we join more tables. How can AS help here?
SELECT * FROM movie_data AS md, order_data AS od, customer_data AS cd WHERE md.movie_id = od.movie_id AND cd.customer_id = od.customer_id;

This can reduce the errors that you get from retyping things over and over, simplifying the code and making it easier to understand.



Using GROUP BY to aggregate data intelligently, and HAVING to filter on aggregate data

40. How many times has each movie been ordered? The incorrect answer:
SELECT movie_title, COUNT(*) FROM order_data, movie_data WHERE order_data.movie_id = movie_data.movie_id;

41. What went wrong? We didn’t tell SQL exactly how we wanted to group the data. What is the right way to do that?
SELECT movie_title, COUNT(*) FROM order_data, movie_data WHERE order_data.movie_id = movie_data.movie_id GROUP BY movie_title;

42. How much money have we made from each individual movie?

SELECT movie_title, SUM(movie_price) AS revenue FROM order_data AS od, movie_data AS md WHERE od.movie_id=md.movie_id  GROUP BY movie_title;

43. How much money have we made from each individual customer?

SELECT customer_name, SUM(movie_price) AS revenue  FROM order_data AS od, movie_data AS md, customer_data AS cd WHERE od.movie_id=md.movie_id AND od.customer_id=cd.customer_id  GROUP BY customer_name;

 

44. How many movies were sold in each month?

SELECT COUNT(*) AS num_orders  FROM order_data  GROUP BY MONTH(datetime);

45. Which movies have been ordered more than X times?
SELECT movie_title, COUNT(*) AS num_orders

FROM movie_data AS md, order_data AS od

WHERE md.movie_id = od.movie_id

GROUP BY md.movie_title

HAVING num_orders > 35;

 

 

INNER JOIN and OUTER JOIN, and why they are better than WHERE

46. Your first INNER JOIN:
SELECT * FROM order_data AS od INNER JOIN movie_data AS md ON od.movie_id = md.movie_id;

47. Your first OUTER JOIN:

SELECT *  FROM customer_data AS cd  LEFT OUTER JOIN order_data AS od ON cd.customer_id = od.customer_id  LEFT OUTER JOIN movie_data AS md ON od.movie_id = md.movie_id  WHERE customer_name = 'Jessica Lee';

  1. See the difference between LEFT OUTER and INNER joins by changing all of the JOINs from the previous query into INNER joins.

Putting it all together

49. Which customers have not ordered from us in the several months?

SELECT customer_name, MAX(od.datetime) as last_order

FROM movie_data AS md, customer_data AS cd, order_data AS od

WHERE md.movie_id = od.movie_id AND cd.customer_id = od.customer_id

GROUP BY od.customer_id

HAVING last_order < '2011-10-01';

My way:

SELECT customer_name, MAX(od.datetime) as last_order

FROM customer_data AS cd, order_data AS od

WHERE cd.customer_id = od.customer_id

GROUP BY customer_name

HAVING last_order < '2012-10-01';

50. Who are the 10 customers who have ordered the most movies from us?

SELECT customer_name, COUNT(datetime) AS num_orders  FROM order_data AS od, customer_data AS cd, movie_data AS md 

WHERE od.customer_id=cd.customer_id AND od.movie_id=md.movie_id  GROUP BY customer_name  ORDER BY num_orders DESC  LIMIT 0,10;

51. Who are the 10 customers who have spent the most money with us?

SELECT customer_name, SUM(movie_price) AS revenue  FROM order_data AS od, customer_data AS cd, movie_data AS md 

WHERE od.customer_id=cd.customer_id AND od.movie_id=md.movie_id 

GROUP BY customer_name  ORDER BY revenue DESC  LIMIT 0,10;

The “challenge questions”

Queries that answer very simple and important questions, but are challenging to write.

  1. Who were our best 10 customers in the last three months of the year?

If we judge “best” by total revenue and last three months as orders after 10/30/12:

SELECT customer_name, SUM(movie_price) AS revenue

FROM order_data AS od, customer_data AS cd, movie_data AS md

WHERE od.customer_id=cd.customer_id AND od.movie_id=md.movie_id AND datetime > "2012-10-30"

GROUP BY customer_name

ORDER BY revenue DESC

LIMIT 0,10;

  1. What is the worst-selling movie on our website?

Worst-selling is defined by the number of times it sold, not revenue it brought in… and it is Patton.

SELECT movie_title, COUNT(*) AS num_sold

FROM movie_data AS md, order_data AS od

WHERE md.movie_id=od.movie_id

GROUP BY movie_title

ORDER BY num_sold ASC;

  1. Summarize total number of purchases by price; do you think price make a difference in order behavior?

SELECT movie_price, COUNT(*)

FROM movie_data AS md, order_data AS od

WHERE md.movie_id=od.movie_id

GROUP BY movie_price;

It appears that the higher price of $5.99 dissuades customers from purchasing, however there isn’t a super drastic difference between $1.99-$3.99, showing that quality of movies might be weighed more heavily than the price here.  As expected, the $.99 movies sold the most volume.

  1. Summarize sales on our website by hour of the day, i.e., $X of movies were sold between 12 and 1 AM, $Y between 1 and 2 AM, and so on.

SELECT HOUR(datetime), SUM(movie_price)

FROM movie_data AS md, order_data AS od

WHERE md.movie_id=od.movie_id

GROUP BY HOUR(datetime);

 

  1. What is the average price of a movie order across each month of the year?

SELECT MONTH(datetime), AVG(movie_price)

FROM movie_data AS md, order_data AS od

WHERE md.movie_id=od.movie_id

GROUP BY MONTH(datetime);

  1. How many times was each individual movie ordered in each month of each year? I.e., Amadeus was ordered twice in in 1/2011, three times in 3/2011, etc.

 This is the only one I haven't had a chance to answer yet!

  1. Show off! What is the most interesting insight you can draw from the database?

Comments

Please sign in or sign up to comment.