Learn Python for Data Analysis and Visualization

Tony Staunton, Reading, writing and teaching.

31 Lessons (2h 22m)
    • 1. Course Overview (3:36)
    • 2. Skillshare 101: Getting the Most From This Course (3:13)
    • 3. Setting up Python & Anaconda (7:02)
    • 4. Setting up Atom Text Editor (4:28)
    • 5. Creating Virtual Environments (3:28)
    • 6. How to Clone a GitHub Code Repository (5:44)
    • 7. Introduction to Python Pandas (1:05)
    • 8. Introduction to DataFrames (2:13)
    • 9. Inspecting DataFrames (17:46)
    • 10. Conditional Filtering (3:51)
    • 11. Using NumPy and Pandas Together (2:21)
    • 12. Creating DataFrames with NumPy (3:03)
    • 13. Creating DataFrames from Python Dictionaries (6:16)
    • 14. Using Broadcasting in DataFrames (1:40)
    • 15. Labelling Columns in DataFrames (1:29)
    • 16. Creating DataFrames with Broadcasting (1:55)
    • 17. Data Cleansing Techniques (12:19)
    • 18. Creating Our First Plots (10:29)
    • 19. Creating Line Plots (4:15)
    • 20. Creating Scatter Plots (4:22)
    • 21. Creating Bar Plots (2:01)
    • 22. Statistical Exploratory Data Analysis Techniques (6:07)
    • 23. Filtering Data in DataFrames (5:38)
    • 24. Introduction to Pandas Dates & Times (0:47)
    • 25. Indexing Dates & Times (5:38)
    • 26. Creating Date Time Lists (2:08)
    • 27. Resampling Techniques (5:36)
    • 28. Method Chaining (2:36)
    • 29. How to Separate & Resample Data (2:29)
    • 30. Further Filtering Techniques (3:05)
    • 31. Multiple Line Plots on a Single Graph (5:02)

About This Class

When it comes to data analysis and manipulation, Pandas is one of the most widely used Python libraries. Whether you work in finance, the sciences, or data science, familiarity with Pandas is a must-have.

This course teaches you how to work with real-world data sets for analyzing data in Python. Not only will you learn how to manipulate and analyze data, you will also learn powerful, easy-to-use visualization techniques for representing your data.

By the end of this course you will know how to:

  • Use Anaconda, the world's leading data science platform, to set up Python and manage libraries

  • Install and set up the free-to-use Atom Text Editor

  • Create Virtual Environments

  • Clone a GitHub Repository directly into Atom

  • Create new code branches in GitHub and Atom

  • Install the Pandas library

  • Use Pandas DataFrames for data analysis

  • Quickly and efficiently inspect large data files

  • Use conditional filtering to refine your data

  • Use NumPy and Pandas together

  • Create DataFrames from scratch, without a starting data file

  • Create DataFrames from dictionaries

  • Use Broadcasting to create DataFrames

  • Correctly label data within DataFrames

  • Cleanse your data files for easier analysis

  • Create graph plots from your data (line, bar, scatter, area and more)

  • Save and export your data files for sharing

  • Use statistical exploratory data analysis techniques such as min, max, and mean on your data

  • Manage date and time data within large data sets

  • Create Date/Time indexes

  • Use partial string indexing

  • Apply resampling techniques such as downsampling

  • Use method chaining
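
Taken together, the skills above boil down to a short pandas workflow. Here is a minimal sketch (the app-sales column names and values are invented for illustration, and the CSV is built inline so the snippet runs on its own):

```python
import io
import pandas as pd

# A tiny inline CSV standing in for a real-world data file (hypothetical data).
csv_data = io.StringIO(
    "app,category,rating,installs\n"
    "AppA,GAME,4.5,1000\n"
    "AppB,TOOLS,3.9,500\n"
    "AppC,GAME,4.8,2500\n"
)

df = pd.read_csv(csv_data)

# Conditional filtering: keep only highly rated apps.
top = df[df["rating"] > 4.0]

# Statistical exploratory data analysis on a single column.
print(top["installs"].mean())  # 1750.0, the average installs among top-rated apps

# Method chaining: filter and sort in one expression.
games = df[df["category"] == "GAME"].sort_values("rating", ascending=False)
print(games["app"].tolist())  # ['AppC', 'AppA']
```

In the course itself the data comes from real CSV files on disk, so `pd.read_csv("your_file.csv")` would replace the inline `StringIO` buffer.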

This course kicks off by showing you how to get up and running using GitHub, an essential skill in your coding career. Ideally, to get the best from this course you should have some Python programming experience.

Every piece of code and dataset used in this course is available to download for free from GitHub.

Without a doubt, this course will teach you the skills needed to apply basic data science techniques, which are used the world over by experienced data scientists and by those who spend their working day in spreadsheets.

Transcripts

1. Course overview: Hi, and you're very welcome to this course overview of Python Pandas: Data Manipulation and Analysis. This course is broken down into four main sections: setting up Python, Anaconda, Atom and GitHub; an introduction to Python Pandas; visual data analysis; and managing dates and times with Python Pandas. Let's jump in and explore the course in a little more detail. We start off by setting up Python and Anaconda, and don't worry, you get step-by-step instructions. We then move on to how to create a virtual environment to keep our code safe, secure and always up to date. I then show you how to set up a GitHub account, clone a repository, and even push your own code to GitHub for sharing, collaboration and bug fixing. After that, we move into the meat of the course: Pandas and DataFrames. The main focus of this course will be using DataFrames to help manipulate, analyse and graph all of our information. We look at filtering our information and at data cleansing: no doubt in your career you're going to receive data files that are dirty, with missing headers, missing information, or wrong dates and times, and we'll look at how to clean all of that up. Finally, we close out the first section by looking at how to create simple plots. We then expand our plotting knowledge by looking at how to create line, scatter and bar plots, all of which is called visual exploratory data analysis. We'll also look at statistical exploratory data analysis, which includes mean, min, max, quartiles and standard deviations. At the close of the section, we take a look at some more examples of filtering and at how we can really hone in on and pinpoint the data we want to extract from our CSV and data files. After that, we move into managing dates and times: really, any data files you receive, whether customer data, stock price information or purchase information, are going to have dates and times associated with them.
We look at how to leverage the power of Python Pandas and DataFrames to make that dates-and-times information work for you. Next we look at indexing, partial string indexing and resampling. What does all that mean? It means, again, taking date and time information and twisting and turning it to our own benefit. We look at how to extrapolate from a week, to two weeks, to a month, to a year. Then we look at the power of method chaining: how we can chain methods in Python together to perform even better data analysis. To close out this course, we look at creating a multi-graph plot of stock price information. Let's jump in and have a deeper look at some of the features of this course. So, as I mentioned, we're going to install the Anaconda data science platform; you can see the Anaconda home page here, and I walk you through all of this. Anaconda comes with many, many data science libraries, and we'll be putting several of these to use. It also comes with the conda package and environment manager, and we'll be looking at how to use the power of conda to put our code and our projects into separate containers throughout the course. All the code, data sets and every piece of information that I give to you is on GitHub: you'll get all the data sets, originals and amended, and every piece of code. You also get access to a README, which is continually updated with student feedback, questions, and any bugs and errors encountered along the way. Each lecture in this course is accompanied by a PDF, which outlines what we just went through along with all the code and expected output, so you are never left short of a place to turn in case you run into trouble. By the end of this course, you'll be able to create impressive plots like this from simple Excel or CSV files, which you will have cleaned up, manipulated and analysed to produce graphs of this quality. Thanks for listening, and I'll see you inside the course.
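
The statistical exploratory measures mentioned in this overview (min, max, mean, quartiles, standard deviation) are each one-line pandas calls. A minimal sketch with made-up numbers, not course data:

```python
import pandas as pd

# Hypothetical closing prices; any numeric Series works the same way.
prices = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0])

print(prices.min())            # 10.0
print(prices.max())            # 14.0
print(prices.mean())           # 12.0
print(prices.quantile(0.25))   # 11.0, the lower quartile
print(prices.std())            # sample standard deviation

# .describe() reports all of these summary statistics at once.
print(prices.describe())
```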
2. Skillshare 101: Getting the Most From This Course: Hi, everybody. In this short video I'm just going to give you some tips and pointers on how to get the most out of the course when watching it on Skillshare. So here we are, open on Python Step by Step. Now, this could be any course, because the settings throughout Skillshare are standard for every course, so it doesn't matter which course you are looking at; this tutorial will still be relevant. Over here in the viewing window, on the bottom left-hand side, you can see a speed icon. If you click on that, you're able to adjust the playback speed up or down. So if you find in some of the lectures that I'm talking too slowly, or you just want a quick recap, you can always up the speed. Next to that, you have a 15-second rewind button, which is handy if you're struggling with a concept or a topic and want to rewind a couple of times, which I often do in other courses just to make sure that I get the point of the lesson. Over here on the right-hand side you can see View All Notes. Not all of my lessons, but some lessons, will have a View Notes option. For example, in lesson number three, How to Install Python, you can see View Notes, and I have a note there on how to install Python, particularly on Windows (it's more straightforward if you're on a Mac), so this note is relevant to Windows users. It is a full note on how to install Python and get up and running, with some caveats, on Windows. Next to the note pin, you can see the volume button, with the self-explanatory volume up and down, and next to that is the full-screen toggle. If we scroll down a little bit, you'll see the Community button. If we click into the Community, the "68" means there have been 68 questions asked and answered.
The questions vary from very simple things, like what is the print statement, to very complex questions about the Python language and its structure. Chances are any question that you might have will be in the Community section, so I encourage you to have a flick through it; maybe do a Ctrl+F and search for a topic or a keyword that you're looking for. Now, every month I launch a student challenge; the June student challenge has just finished. If we go into Projects and Resources, you'll see the start of the June challenge. So if you're interested in furthering your skills after you've finished the Python course, or any one of my other courses, you can check out Projects and Resources and see if there's a student challenge going on. To keep up to date on projects, challenges, or any updates I might release for the course, I encourage you to click Follow, so you can see any updates and new items that I might issue. One of the most common questions that I get in the Community section is whether the code throughout the course is available to download. Well, it is, and if you go to GitHub and search for tstaunton, which is my username, you'll be able to see all the courses and all the code that I have. Here are several courses; you can even see the student challenge there, and you'll be able to find the code for the student challenge in there. For this course, Python: Beginner's Guide, there you go: section two, section three, four, five, all the code is in here. You're able to browse and download it if you want to make your own modifications. So I encourage you, as you're setting up Python and getting started in the course, to visit my GitHub page and download the project files. Now, one final thing: if you are taking one of my courses and you are enjoying it, which I hope you are, please,
if you feel so inclined, leave me a review or some feedback on how I might improve the course or improve your experience within it. That's it. I hope that was easy enough to follow. Thanks for listening, and I'll see you in the next class. 3. Setting up Python & Anaconda: Hi, everybody, and welcome back. This lecture is entitled Setting up Python with Anaconda, and that's exactly what we're going to do: I'm going to show you how to download Anaconda and get it up and running on your desktop. Many of you watching this course may already have Python installed, and if you do, that's fine; it's no problem, and you can stick with that version if you're happy with it. But it doesn't hurt to install Anaconda alongside your version of Python, and I'll show you why in a few minutes. So let's head over now to the Anaconda home page. What exactly is Anaconda? Well, it's a data science platform, but then what is a data science platform? Let's have a look. As you can see, it's a data science platform with over six million users. It's open source, and it seems to be the industry standard for developing, testing and training on a single machine; in other words, the industry standard when you're coding with Python, or indeed R, in the areas of data science, machine learning, or any other kind of AI. Let's go over to the Products section, up here on the right: Anaconda Python/R Distribution. Okay, this looks a little bit better, and we can actually see on this graphic what we're getting. There's the Anaconda Navigator, which we'll look into in a second, and the Anaconda project itself. And here is the really important stuff: data science libraries. As we can see, NumPy and Pandas; Pandas is what this course is all about, so very important. You can see machine learning libraries over here, and down here in the bottom right, many more.
So this is only a small snippet of the libraries that are available through Anaconda. And finally, down here on the bottom level, we can see conda, the data science package and environment manager. Okay, now that we know what we're getting, let's go ahead and download Anaconda. Over here on the top right-hand side, click Download. Just before we go ahead, keep in mind that, as mentioned, if you already have Python up and running and you're happy, then you don't have to install Anaconda. I think it's just easier having Anaconda, because it gives you instant access to a lot of libraries and tools related to data science, but if you're happy with your setup, that's fine; leave it as it is. Sometimes it's better not to change. Another thing to keep in mind is that when you come to install Anaconda, the version you see may have moved on from what we're looking at right now. We're looking at version 5.3, which was released on the 28th of September of this year. So if you're in the future, hello! You could be looking at version 5.4, 5.5, 5.6, 5.7, or even version 6. Don't worry too much about that, but it's important to point it out. At this stage, if we scroll down a little, we can again see "high-performance distribution"; as mentioned, a lot of libraries and packages come with Anaconda, along with package management, and we'll look at virtual environments a bit later. Below that, you can see which operating systems you can install Anaconda on, and you can select your system; I'm on a Mac, so let's jump back to Mac for a moment. Python 3.7, and as you can see just further down, you can select the previous version of Python, 3.6. I'm on Anaconda with Python 3.6, and my lessons in this class will be in 3.6, but 3.7 shouldn't be any problem if that's what you want to go ahead and download and install. Okay, with that, you can now click Download.
I'm not going to do that right now because I have Anaconda already installed, but if you don't, go ahead and click Download. While Anaconda is downloading, it's worth noting the How to Install Anaconda section here, so let's jump in. It's pretty detailed; as you can see, I'm on a Mac, and it's giving me exactly what I need to know to install Anaconda on my system. If I jump back to the previous page, I just want to switch over to Windows, because there's something very important to point out there. Under How to Install Anaconda on Windows, scrolling down to step eight, Advanced Installation Options: one very important thing to note, which has to do with running Python and Anaconda from your command line, is that if you want to be able to run it from the command line, you need to select the option "Add Anaconda to my PATH environment variable". If you don't do that, you will not be able to start it from the command line, which we will be doing throughout this class. Once you have downloaded and installed Anaconda, you can verify the installation by going to your command prompt or terminal. Again, I'm in the Mac Terminal, and all I need to do is type in python, and, as you can see: Python 3.6, Anaconda. If you can see that, then your download and installation of Anaconda has been successful. As previously mentioned in the Anaconda installation image, we get a couple of extra applications, so let's have a look now at the Anaconda Navigator. On a Mac, you just search for and navigate to it; on Windows, you go to All Programs and navigate to the Anaconda Navigator. Let's let this load up; as you can see, it's loading the various applications that came with the Anaconda download and installation. Here we are: a nice clean interface, and there are two things I want to point out, one being Jupyter Notebooks.
As it says here, a Jupyter Notebook is a web-based interactive computing notebook in which you can edit and run human-readable documents while describing the data analysis; a very, very handy tool. If you're not familiar with it, I do suggest you go off and Google it and have a play around with Jupyter Notebooks. For a quick demonstration, I'll launch it here. There we go, nice and quick, so you can see my file and folder structure. What I'll do is click New > Python, give it a really simple print("hello"), select Run, and you can see my output. So as it says, it's an interactive notebook for writing, running and outputting your Python files. I would highly suggest you take a look at Jupyter Notebooks because they're very easy to use, very easy to set up, and a great tool. Now, if I go back to my Anaconda Navigator, you can see Environments on the left-hand side. I mentioned virtual environments previously, and we'll be setting them up throughout this class, but you can see I have a couple already. When you go and create your new environment, you can click on each of these and see what libraries and packages you have installed in that environment. In my sample environment here I have a number of libraries: OpenSSL, pip, Python, SQLite, and so on. We'll be looking at how to set up virtual environments in a future lesson, but for those of you unfamiliar with the idea, a virtual environment is simply a container, created by you, which holds a version of Python and the libraries you have installed. By using a virtual environment, you don't need to worry about future updates to Python or your installed libraries; they're locked away and safe, so no update will affect your installation or your programs. If anybody's familiar with Ruby, Ruby on Rails and Ruby gems, constant updates there can often set your program back and force you to refactor, or indeed rewrite, entire sections of your programs.
Okay, that's it for this lecture. Hopefully you've downloaded and installed Anaconda, gone through the steps that I've gone through, and now have the tools that I have. Thank you, and I'll see you in the next lecture. 4. Setting up Atom Text Editor: Hi, everybody, and welcome back. In this lecture we're going to look at setting up Atom, which is the text editor that I use for my Python programming. Let's jump right in and see how we go about doing that. Here I am in my web browser, and I'm simply going to go to atom.io, and here we are. As you can see, it's picked up the operating system I'm using, macOS 10.9. So when you go to it, from whatever system you're using, be it Mac, Windows, Linux, whatever it might be, it's going to detect your system and give you the correct download options, and you simply select Download. I should note at this point that, as in the previous lecture, if you already have a text editor, or indeed an IDE, set up, and you're perfectly happy with how it interacts with your version of Python and your libraries, then you can skip this lecture. It's not a problem, as long as you're happy it's working and you can continue on with the examples that follow. So if you feel like skipping it, go right ahead; otherwise, stay tuned and we'll have a look at my setup. Once your download is complete and you've followed the installation instructions, you should see an Atom icon like I have down here, so let's click on that. And this is the welcome screen. As you can see here in the centre pane, we have "Atom: A Hackable Text Editor", and that's the welcome screen. On the right-hand side we have a guide, and you can flick through it. Now, one of the things I really like about Atom, and probably one of the main reasons that I use it, is that it's developed by GitHub, which means very, very easy integration with your GitHub account.
If you have one, great; we'll look at how to set up a GitHub account and a repository a little bit later on. What it means is that you can pull down all the sample files that I have available for you on GitHub. On the left-hand side here is your folder navigator, so you can see all the course files that I have, and they're available free for you to download at any stage; I'm going to show you how to do that very shortly. All these files are available on GitHub, free to download and then free for you to use, customise, edit and tweak, whatever it is you want to do. A couple of things I should point out here in Atom: when you install Atom, it might look slightly different from what I have here, and that's probably because your preferences are a little bit different. So let's jump into the settings and preferences now and see how I have mine set up: it's Atom > Preferences, or, if you're on Windows, File > Preferences. Okay, perfect. You can see all my settings here, and if you come down along the left to Install, you can search for the different kinds of packages you want to install. So what have I got installed? Well, you can see in my Themes that I'm using the One Dark and Solarized Dark themes; you can select those, or switch to whichever one you might like. Let's go back to Core. Perfect. I like the darker look, so that's my UI theme and my syntax theme. What else have I installed? Well, I have installed a Python autocomplete. To do that yourself, go down to Install and type in "autocomplete python" to bring up some options. There we go; as you can see, I'm given the option to uninstall because it's already installed for me. The next one we're going to look at is a terminal, and I highly recommend that you install this. The one that I have is platformio-ide-terminal.
As you can see, again, for me it says Uninstall, but for you it should say Install. What this does is allow you to run a terminal directly within Atom. Once you have platformio-ide-terminal installed (Atom may need to restart, but no harm), you go down to the bottom left-hand side and click the plus, and as you can see, my terminal has opened up in my development folder. That lets me cd into a new directory or navigate around; you might need to change directory here if you have folders nested within folders, but as you can see, I can just start to run Python or move into other folders. It's very handy to have a terminal running within Atom, because as we get into the class we're going to be working with very large data files and CSV files, so it's really convenient not to have to jump around between different applications and screens. Okay, so that's a very simple lecture on how to install, set up and get up and running with the Atom text editor. As I said at the beginning, if you're happy with your choice of IDE or text editor, work away; you don't have to use Atom. It's just my preferred choice because, later on in the class, we are going to be integrating with GitHub and pulling down and pushing code to it. Thanks for listening, and I'll see you in the next lecture. 5. Creating Virtual Environments: Hi, everybody, and welcome back. In this lecture we're going to look at how to set up a virtual environment. If you've been following along with the previous two lectures, you will have downloaded and installed Anaconda, and you will have downloaded and installed Atom. If you haven't been following along, feel free to skip to the next lecture. Otherwise, let's get started and see how to create a virtual environment. Here I am in my Atom text editor, and as you can see on the left-hand side, I have a folder here called course code, and I'm within that folder down here.
My terminal is open in the course code folder. As mentioned previously, a virtual environment is a container where we can install and maintain our version of Python and all the libraries we use throughout this class and in our Python programs. Okay, let's get started. The first thing we need to do is tell conda, which is part of the Anaconda installation, to set up a virtual environment, and we do that with one very simple line. We say conda create, we give a name with --name and what we're going to call this one; I'm going to call this one pandas_env, my pandas environment. And with python we can tell it what version of Python to use: python=3.6. Okay, hit Enter. As you can see, it's now setting up the environment; this is going to take a few minutes, so I'll be back in a moment. Now, as you can see, our environment is set up, and it's going to ask us to install some libraries, so let's go ahead and say yes. It's going to start downloading and installing, so again, I'll be back in a moment. Okay, here we are, back again, and as you can see, our virtual environment setup is complete. A couple of important things to note here: to activate the environment we need to use source activate pandas_env. If you're on Windows, it's slightly different; I think it's simply activate followed by your environment name. To deactivate the environment, it's source deactivate. Let's give that a go now: source activate pandas_env. And how do we know that the activation of our virtual environment has been a success? Well, over here on the left-hand side, you can see (pandas_env) in brackets, and that means we're now in the virtual environment, and anything that we do in here (installations, libraries, packages, code) is contained within this environment and future-proofed.
We only use a couple of libraries in this class because we're focusing on Python Pandas, so that's the library we need to install: pip install pandas. Again, I'll be back in a moment when this installation is finished. I'm back now, and as you can see, we have successfully installed our pandas library: "Successfully installed pandas. Done." Okay, perfect. Now, one other library we need to install is the matplotlib library: pip install matplotlib. Once again, I'll be back in a moment when this has finished downloading and installing. Back again, and as you can see, once more a success message. There we go. Now, if you don't see a success message, or you get any kind of error, please refer to the documentation; if that fails, have a Google around, or just drop me a message in the comments section and I'll be happy to help out if I can. Okay, that's it for this lecture. In the next lecture we're going to look at integrating with GitHub and how we can download the course code files directly into our Atom text editor so we can get to work. Thanks for listening, and I'll see you in the next lecture. 6. How to Clone a GitHub Code Repository: Hi, everybody, and welcome back. In this lecture we're going to look at accessing the class's code files. I have uploaded all the code contained in this class to a GitHub repository, along with all the CSVs, data sets and resources you need to run through every example contained within this class. And what's the benefit of using GitHub over just keeping the folder on the hard drive of my local machine, then uploading it somewhere like Dropbox and letting you download it? GitHub has several advantages. First, as I mentioned, it has very easy integration with Atom, which means that when I make changes, I can simply push them up to GitHub, and instantly I have a backup and a code repository of my latest work.
When I move onto a different machine, I condemn pull that code down, get back to work and push it up justice seamlessly. Second, if I want to share of my files my cold files like I do in this lecture, I can instruct somebody and give me your around to download all the codes they can download it for you to get hope repo function or you can download the zip file. So if you haven't been following along with previous lectures and you're not using, get hope and have no interest in it, then you can simply download all the code files in a zip folder on the next rectum and import them into your project files. They can then be used within your own. I d protect seller. So but you can see here my code files are located. I get hope that come forward slash t stoned and ford slash piping pandas. So let's go there now and take a look. Okay, so here we have my cold repository. So what? You can see I have all the files needed to run this class. I also have a read me which, if you scroll down his content to shown below and I would encourage you to check back with the read with the read me as you move through the class, because I will be updating it regularly with questions and answers that come from students . And I also have a simple test, hates team out another offended should keep my coat and get help is if I make a typo orders a mistake that I haven't yet noticed, and a student comes back with some feedback on says there's an error in one of cold file. I can quickly update the code here, send out a message to the entire student body, and then they can simply pull down the code once again. So let's have a look here in data analysis of just picking this folder at random on bare plot, which will be looking at later on. As you can see, I have code here now. 
If I have made a typo, for example, left out equal symbol here, I could simply go into my cold editor, make the change, push it the kiss on, allow all the students and downloaded the updated to download the updated working code once again. So it allows for quick error correction on quick dissemination for all students. So let's go back to the repository and did a great thing about get hope is that I can see everything that's going on. So here in the repository now, as I mentioned at the beginning, you can either download the zip file and you're not using. Get hope. Simply download the zip file, unzip it and extract all the files to your own i. D. E and Project folder. But for this lecture, we're going to use get hope on Adam to clone the repository. So how do we do that? That's forest Copy the U R L. So this is the repository location. Secondly, that's pop open, Adam. So, as you can see, I just have to welcome screen on nothing else. Now, a couple of things that you need to do before you're ready to clone your repository. So in Adam going to preferences, let me just close this window so we have a little more room down along the left hand side. Install typing, get hyphen, clone on giving a moment so I can see it here. So I already have it installed. But for you it will be a fresh install, so click install. If you pop open settings. Once the installation is complete, you'll see here that there's a target directory. If this is a fresh install, this will be set to temp, but you can edit this and select your own folder where you would like your code files to install. So here I'm putting them into a piping folder located in one drive. Perfect. Okay, now we have got that done. Let's close out. Let's open the command palette, which will let me run, Get commands. So on Mac, it's command shift and P on Windows. It's control shift in P. So I just simply type in get. And I looked down. I can see get clone clone. So I typed out in. 
So it's asking me for the URL that I would like to clone. I want to clone github.com/tstaunton/python_pandas.git, so I type that in and press Enter. As you can see, a message comes up: cloning repo. Now, the repo is pretty big because, as I said, it has all the code files, data sets and CSVs, so I'll come back in a moment when this has finished downloading. And we're back again — the download is complete, or should I say the cloning is complete. On the left-hand side you can see python_pandas, and you can see all the folders containing all the files that we just saw a moment ago on GitHub. If I just move my screen, I still have my repository open: introduction to pandas, data analysis, time series, resources, sample data — all cleanly and efficiently replicated within Atom. Let's just have a look: introduction to pandas DataFrames, a list of all the files there, and the code all ready to go for future classes. So that's how to clone a repository from GitHub. I encourage you to use GitHub because it is an excellent tool for sharing files, collaborating, getting feedback, and also backing up all your hard work. If you're interested in setting up a GitHub account, and setting up GitHub and a repository in Atom so that you can make your own pushes to GitHub and back up your own files, then I encourage you to check out the project associated with this class. Thank you, and I'll see you in the next lecture. 7. Introduction to Python Pandas: Hi, everybody, and welcome back. Hopefully you've been following along and you've managed to download and install Atom, get GitHub up and running, and configure your Atom installation. Now we're really going to get into the core of this course, which is actual programming using Python pandas. So what is pandas? Well, pandas is a library that is built on top of NumPy.
If you have any experience with NumPy, you'll know it's used for mathematical calculations. Pandas allows Python to read in data sets, and data sets can be anything, such as stock price information, customer information, purchase information, or any kind of tabular data — tabular data being information that is stored in rows and columns. All of the information that we're going to deal with is mainly stored in formats such as CSV. So the main use of pandas is to read in data and to then manipulate rows and columns. We can also use pandas to quickly grab statistical information about our data sets and then plot that information in bar plots, graphs, line plots and so on, and we'll look at how to do that towards the end of this section. Thanks for listening, and I look forward to seeing you in the coding lectures. 8. Introduction to DataFrames: Hi, everybody, and welcome back. In this very short lecture, I'm going to introduce you to pandas DataFrames. If pandas had an engine, it would certainly be the DataFrame — it will make up most of the work that we're going to do throughout this class. So what exactly is a DataFrame? Well, a DataFrame is the main way that pandas works with tabular data files. Tabular data files are files that arrange their data in rows and columns. Let's take a quick look at some of the example files that we'll be using throughout this class. The first file that we're going to look at is a download of app sales on the Google Play store. I don't know about you, but when I think of tabular data, I immediately think of a spreadsheet, and that's exactly what we're looking at here: a download of Google app sales from the Play store in CSV format. Most of the files that we deal with throughout this class are going to be in CSV format. When we look at our file, we see on the left-hand side row numbers — row numbers 1, 2, 3 and so on. Later on in the course, I'll explain to you that these row numbers are actually called the index.
We can change that index to be anything that we want, such as the date. Over here we have column A, and that's the name of the app. Now you might notice that our data set here has no headings, and that's intentional, because throughout the course we're going to look at how to insert our own heading names. Let's take a look at another example. Here we have Tesla stock data. As we can see, our first row actually has headings: Date, Open, High, Low, Close, Adjusted Close and Volume. Python, pandas and DataFrames are incredibly effective at analyzing stock price data, and we'll look at that in more detail in future lectures. And finally, let's take a look at a very small and simple data set: a data set of customer wait times. As you can see here, again we have headings, Customers and Time, so 10 customers waited two minutes, 20 customers waited three minutes, and so on. Later on in the class we'll look at how to plot this information so we can get a graph to see whether our customer wait times are increasing or decreasing. So that's a very brief introduction to pandas DataFrames, but don't worry, we're going to get to know them very well later on in this class. Thanks for listening, and I'll see you in the next lecture. 9. Inspecting DataFrames: Hi, folks, and welcome back. In this lecture, we're going to take a quick recap to see how far we've come — which is a really long way — and then we're going to look at how to inspect data using Python pandas. The first thing that we'll be doing is learning how to import a CSV file. We'll then look at different methods that we can use with Python pandas for inspecting, reading and basically learning more about the data file that we've just imported. So let's now recap on what we've done so far. Let's open up our pandas timeline here. Okay, we started off by setting up Python with Anaconda, so hopefully you've followed that.
As I mentioned in those lectures, if you've already got Python set up and you're happy with your IDE, you can skip a lot of these early classes. Next we looked at setting up the Atom text editor. Then we built our virtual environment: a safe container into which we can load a version of Python and our libraries. And then we looked at cloning a GitHub repository. So if you've skipped up to this part, you might have missed that all the code for this class is saved on my GitHub repository, where you can download it as a cloned repository or simply download a zip file. Next we were introduced to pandas, with a brief summary of what Python pandas is. We followed that up with an introduction to DataFrames — really the workhorse, or the engine, of Python's pandas. And now, as I just mentioned, we're going to look at inspecting our data using pandas DataFrames, so let's jump right into that. Okay, here we are, and as you can see, I've got an empty code editor window open, but on the left-hand side I have my repository, which you can find at my GitHub repository link. On the left, you can see all the Python files that we're going to be using throughout this section, which is introduction to pandas. If you've been following along with the lectures, you will have cloned the GitHub repository directly into your Atom code editor. If not, then download the data files and simply import them into your editor of choice. On the right-hand side, we've got an empty staging area, which is where we push to our GitHub repository. As you can see, I have nothing in my unstaged changes pane, I have nothing in my staged changes pane, I do have my commit-to-master branch here, and I also have a history of all the commits that I've been making. Okay, so let's start. How do we do that? Well, the first thing we need to do is say import pandas as pd.
As mentioned at the beginning of this course, this is a slightly more advanced course than a beginner's Python course, so I hope you're familiar with what importing is, how it works and what it does. In this case, it's importing the pandas library into our file: import pandas as pd. Next, we want to read in a file, so let's start by creating a variable: df equals. If you've done any research on DataFrames, you'll see that a lot of lessons and lectures start off by calling the DataFrame simply df. Later on in the course, we'll see that variable change into something more meaningful, such as sales data, customer data, stock price information and so on. Okay, so we have a variable. Next we say pd — for pandas — dot read_csv, brackets. Now, depending on where you have your sample data located (and hopefully you have downloaded the sample data from the repository), I'm going to say ../../ — so I'm moving up two folder levels — then sample_data, forward slash, 02_introduction_to_pandas. In this folder I have a data file prepared for you called intel.csv, and that's going to load in some Intel stock prices for us. Now, what I always do is print my data files just to make sure that they've been read in correctly and that I have the folder location typed in correctly. So let me just add in a comment here: print df to make sure it works. Then: print(df). Okay, let me save this. And as you can see, as mentioned in previous lectures, my saved file has now moved into my unstaged area, ready to be committed up to GitHub. Again, if you've been following along, you will have installed the terminal package in Atom. Down on the bottom left, click the plus button, and as you can see, I'm in my development folder. What I want to move into is the introduction folder, so: cd 02_introduction_to_pandas.
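(For reference, the read-in step described above can be sketched as follows. Since the course's intel.csv isn't bundled here, a small inline CSV with made-up prices stands in for it via io.StringIO — read_csv accepts a path or any file-like object.)

```python
import io
import pandas as pd

# Inline stand-in for the course's intel.csv (hypothetical values).
csv_text = """Date,Open,High,Low,Close
2017-01-03,36.61,36.93,36.27,36.60
2017-01-04,36.71,36.77,36.34,36.41
2017-01-05,36.37,36.46,36.00,36.35
"""

# In the lecture this would be:
# df = pd.read_csv("../../sample_data/02_introduction_to_pandas/intel.csv")
df = pd.read_csv(io.StringIO(csv_text))

# Print to confirm the file was read in correctly.
print(df)
```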
And then I want to cd into data_frames. Now, for anybody not familiar with the terminal: when I type cd — let me just back this up a little bit for you — when I say cd and I start typing the name of the folder I want to move into, if I press Tab, it automatically fills in the rest of the name for me. As we move through this lecture and through this course, you'll see that I use that Tab completion a lot; it means that I'm entering the right file name. So: cd data_frames. Okay, now I'm in my data_frames folder, and what I want to do is run the piece of code that we've just entered above: python s2_2 — so section 2, lecture 2 — and now I can type underscore inspecting, press Tab, and it automatically fills in the rest of the file name for me: inspecting_data.py. You can see my file name up here, inspecting_data.py, and on the left-hand side, inspecting_data.py. Let's press Enter. And there we go, my data file has loaded up. Let's just scroll up — actually, let me just pull up my terminal window a little bit to make it easier — and scroll up to the top. So what have we got? Well, we've got some column names — Date, Open, High, Low, Close — and then on the left-hand side here we have an index of rows starting at zero. So, much like an array, pandas DataFrames are indexed from zero. We've got the Date, Open, High, Low, Close rows, and this goes all the way down through every row in the actual data file. Let's scroll down to the end, and we can see a little summary here: 253 rows times five columns. So we have five columns of data, and 253 rows — but because the DataFrame is indexed from zero, our last index number here is 252. Don't forget that.
So that is a very quick way to load in a CSV file as a DataFrame. Next, what do we want to do? Okay, let's just hash this print out — there we go — so we're not loading that output again on the next command. What do we want to do? We want to see what type of data we're dealing with. Is intel.csv now an actual DataFrame, and how do we check? It's print(type(df)). Close that off, save, and let's run the command again. And here we go: we have pandas.core.frame.DataFrame. The bit we're interested in is the bit at the end — so we are indeed working with a DataFrame. Let me comment that out, and what I actually want to do is add in a comment here saying "how to check data type", because this file is going to be pushed up to GitHub, and those of you watching this now are going to be pulling down this exact file. By the time we're finished this lecture, it should look exactly as you're going to pull it down from GitHub, unless I add in updates, changes or corrections as suggested by yourselves. Okay, now that we have those two commands out of the way, what do we want to look at next? A very useful attribute is the DataFrame's shape. Let's have a look: print(df.shape). Okay, close your bracket, save — you always have to save and rerun your file, I'm sure you know that. Here we go: shape is a very brief summary of the shape of our DataFrame. As we saw previously, we have 253 rows and five columns — nice and short. I'll add the comment "check DataFrame shape". Now again, I'm going to comment this out. One thing I like to do, because you'll be downloading these files and I'm going to use them later on, is keep them as nice and clean as possible. Okay, we've looked at the shape; now let's see if we can have a look at the five columns that the shape mentioned, and we do that with print(df.columns). A lot of this reads like plain English. So let's run this, and there we go.
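(The two checks just described — the data type and the shape — look like this in code. A tiny hand-built DataFrame stands in for the 253-row intel.csv, so the numbers are illustrative only.)

```python
import pandas as pd

# Small stand-in for the loaded intel.csv DataFrame.
df = pd.DataFrame({
    "Date": ["2017-01-03", "2017-01-04"],
    "Open": [36.61, 36.71],
    "Close": [36.60, 36.41],
})

# How to check the data type: the bit at the end is what matters.
print(type(df))   # <class 'pandas.core.frame.DataFrame'>

# Check the DataFrame shape: (rows, columns).
print(df.shape)   # (2, 3)
```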
We have a list of our column names: Date, Open, High, Low, Close. Later on, we'll look at how to pull out specific columns. So we're going to label this "view column names", and I'm just going to comment it out again to make room for our next command. We're looking at a data file with 253 rows in it. Some data files are going to be smaller, with only 10 rows or fewer; some are going to be thousands upon thousands, if not millions, of rows. When we're reading in a data file like we just did, I always start off by printing the data so that, as I said, I can check that the read_csv command worked. But what I don't want to do is load in a DataFrame of millions of rows of data just to make sure that it works, because it could take time to load up. So I can use the head command instead: print(df.head()), so h-e-a-d, open and close brackets. Let me save this and run it — and what does that give us? Well, as we can see down here, it gives us the first 5 rows of data. Very, very handy: a nice quick summary to make sure that everything's working. You can see 0 to 4 here, which is the default output of the head method — the first 5 rows. Now we can change this: I can pass 2 to the command, save, run it again, and there we go, the first 2. I could say 10, and there we go, the first 10 rows. So head is a very handy, quick method to check the data of your DataFrame. If head shows us the top rows of our DataFrame, what do you think shows us the bottom rows? We're going to get to that in just a moment. What I want to add here is just a comment: inspect first rows of data. There we go. Okay, so I just asked: what do you think shows the bottom end of the data file? Well, let's have a look: print(df.tail()) — again, very readable English. Enter that, and as you can see, I get the last five rows of the DataFrame, 248 to 252. And again, I can change that, so I can enter 2 — and what does that give me?
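(A quick sketch of the columns and head calls above — ten rows of made-up numbers stand in for the lecture's stock data.)

```python
import pandas as pd

# Ten illustrative rows in place of the 253-row intel.csv.
df = pd.DataFrame({"Open": range(10), "Close": range(10, 20)})

print(df.columns)    # view column names
print(df.head())     # inspect first rows of data: 5 by default
print(df.head(2))    # or pass the number of rows you want
```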
It gives me the very last two rows of the DataFrame — another very quick and easy way to inspect our data without getting code-heavy. Okay, what I want to add here is: inspect last rows of data. Save that. There we go — except I didn't put a hash in front of the comment, so let me do that now. Now, what else can we learn about a DataFrame with a few simple methods and commands? Well, we can look at info, and info is a top-level summary of our DataFrame. Let's have a look: print(df.info()). There we go; let's run that and see what we get. Okay, excellent. We have a RangeIndex — that's the index on the left-hand side — and what it's letting us know is that the range is 253 entries, starting at zero. As I mentioned, indexes start at zero and run to row 252. Our RangeIndex information is followed by the column details: as you can see, we have the column names Date, Open, High, Low, Close; then "non-null", which means they're not empty of data — they have data in them; and then we have the types at the end, so floats. Let me add in a comment here; we'll say: view summary DataFrame info. That's a nice one. Now let's look at an individual column. As I mentioned earlier on, when we used df.columns we got an output of the column names only, but we're also able to extract individual columns. Let's have a look at how we do that. We're going to create a new variable: open equals df, square brackets, 'Open'. And this here has to be exactly as it appears in your DataFrame — as you can see, a capital O, because in my data it's a capital-O Open. Okay, next: print(open). And what do you think the output of this is going to be? Let's save it, run it and find out. From 0 to 252: all the Open information from our DataFrame, starting at zero, all the way down. Okay, let's give this a comment so that it's nice and clean for when you pull down the code later on.
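(The tail and info steps can be sketched like so, again over a small illustrative DataFrame rather than the course file.)

```python
import pandas as pd

df = pd.DataFrame({"Open": range(10), "Close": range(10, 20)})

print(df.tail())    # inspect last rows of data: 5 by default
print(df.tail(2))   # the very last two rows
df.info()           # RangeIndex, column names, non-null counts, dtypes
```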
So: view Open column. And let's comment this piece out — comment, uncomment, there we go. Previously we saw the head and tail commands, and we've just seen how to view an individual column. We can chain those together, so we can say print(open.head()). Let's see what that looks like. Oh, an error. Okay, let's find out why — and obviously it errors because I've been so keen to keep my code clean that I commented out our open variable, so there is no reference point for it. Let's fix that, save it and run: the first 5 pieces of information from the column Open. Now, it's worth pointing out that when you extract a single column, it creates what's called a pandas Series. So this column that we've returned here, and the previous column up above, is what's called a Series. In this example, I created a variable called open to print the Open column of our data. But let's now comment this out — hopefully for the right reasons this time — and comment out our print statement. Let's move down here and look at how we can print out more than one column. So let's say print(df, square brackets — that brings us into our DataFrame, df — but we want to go deeper into the DataFrame and extract columns, so we need a nested inner square bracket: 'Open', comma, 'Close'. We want to pull out the Open and Close information from our DataFrame, but again, we don't want all 253 rows, just the first few, so we chain on .head(). There we go. Let's run this and see what we get. Excellent — that's exactly what we wanted. We have the Open column, we have the Close column, we have the first 5 rows of data, and we have an index. Again, a very quick, efficient, smart way to view our data file. Let's just add in a comment here: view one or more columns side by side. I think that's descriptive enough. Let's comment this out, and finally we're going to look at the describe method. The describe method outputs statistical information on our DataFrame.
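(Here is the single-column versus multi-column selection in one sketch. The variable is named open_col rather than the lecture's open, to avoid shadowing Python's built-in open function; the data is illustrative.)

```python
import pandas as pd

df = pd.DataFrame({
    "Open": [36.61, 36.71, 36.37],
    "Close": [36.60, 36.41, 36.35],
})

# A single column, selected with one label, comes back as a Series.
open_col = df["Open"]
print(type(open_col))            # <class 'pandas.core.series.Series'>
print(open_col.head())           # methods can be chained on

# A list of labels (note the nested brackets) returns a DataFrame.
subset = df[["Open", "Close"]]
print(subset.head())             # view one or more columns side by side
```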
So how do you think we would do that? print(df.describe()). As you can see, there are no great leaps of code or anything like that; it's been very readable English throughout this lecture. Let me run this piece of code now, and here we get the output of the describe method: the count, the mean, the standard deviation, the minimum, the 25% quartile, the 50%, the 75% and the max. Now, you may not be familiar with what all of these mean right now, but that's okay — we're going to get very involved with them in later lectures. What's important right now is that it works. So the mean is your average: the average opening across all 253 of our data rows was 48.42, the minimum opening was 39.34, and the maximum open was 57. I'll add a comment here: using the describe method. Perfect. So we'll comment that out, and that's everything. As you can see on the right-hand side, in my unstaged changes pane, I have my file, inspecting_data.py, ready to be pushed up to GitHub. What I'm going to do now is stage this file — it's moved down into my staged pane, so it's ready to be committed. I'm going to give it a message. So what is this? Well, this is our lecture on inspecting data using pandas DataFrames. There we go. I want to commit this to master. Now, when you press "commit to master", it's committed to Git, which is on your local machine, but you want to push it up to GitHub, and as you can see down on the bottom right, I have "push 1", which is one file. So let's push it up now. There you go — it's pushed up to GitHub. I won't be walking you through this process in every lecture, but it's good at the beginning just to show you my processes and how I do things. Just to close off, I'll jump into GitHub now and make sure that what I've pushed up has gone in. So here we are in GitHub. I'm just going to give my page a refresh, and here is the first folder, introduction to pandas.
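(The describe step in miniature — three made-up opening prices, chosen to echo the min, mean and max figures quoted in the lecture, stand in for the full column.)

```python
import pandas as pd

df = pd.DataFrame({"Open": [39.34, 48.42, 57.00]})

# describe() returns count, mean, std, min, 25%, 50%, 75% and max
# as a DataFrame indexed by statistic name.
stats = df.describe()
print(stats)
print(stats.loc["mean", "Open"])   # the average opening price
```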
"Lecture on inspecting data using pandas DataFrames", updated less than a minute ago. Let's have a look here: data_frames. Okay, so I'm looking for s2_2 — there we go, less than a minute ago. Let's click into this and see what we get. There we go — actually, that is a thing of beauty. We've learned how to import a data file, and below it, nicely summarised in comments, are all the commands and methods that we've used. Okay, that's it for this lecture. Thanks for listening, and I'll see you in the next. 10. Conditional Filtering: Hi, folks, and welcome back. Okay, in this lecture we're going to be looking at conditional filtering. Conditional filtering allows us to filter the rows and columns in our DataFrame using certain conditions, such as greater than, less than, or greater than or equal to. So let's jump right in and take a look. Here in Atom I've entered some sample code already: import pandas, and I've read in our DataFrame, so df equals pd.read_csv, and we're still working with the same data file, which is intel.csv. On the left-hand side is the conditional_filtering file, and on the right-hand side I have my file ready to be uploaded to GitHub. As I mentioned in the last lecture, when I read in the data file I like to print it out to my screen just so that I know the read went correctly and everything is okay, so let's do that again now. So: python s2_3, and as I mentioned, a few Tab presses and we get the file name filled in. First, print — and there we go, our DataFrame. I'm not going to flick through any more than that. So let's take our DataFrame. We're particularly interested in the Open column in this lecture, and in values within that column above 46.0. Do we need to check each individual row, or is there a better way — and if so, what is it? Well, that way is conditional filtering. So let's take a look: my_open equals df — our DataFrame — square brackets, 'Open'.
And, as you can see — something I didn't mention in the last lecture — within the DataFrame our 'Open' column name is surrounded by quotes. Then: greater than 100. Okay, print(my_open). Let's run this and see what it looks like. There we are. In our DataFrame it looks like everything is False, which isn't much use to us, but what this has told us is: wherever any price is greater than 100, print True. Let's just edit it slightly to say over 40, and save. Now let's look for the True values. At the beginning of our data file, you can see the first 8 values are False, which means they were not above 40. But having our output in a Boolean manner, True/False, is not very useful to us, so let's look now at how we can make this a little more readable. What we can do is pass our variable my_open back into our DataFrame, like this. But before we do so, let's just comment out the print. There we go. And let's say print, df, square brackets, my_open. Now let's run this and see what we get. Much, much better. As you can see, we've returned rows and columns, but what we've returned now is the relevant information, including the Date, Open, High, Low and Close columns, and as we saw in the previous output, our data above 40 starts at index row eight. To clean up our print line, we can comment this out again and put it into one step. So let me comment out my_open here — comment, not uncomment — save, and write the filter inline: print df, square brackets, then df open within quotes, greater than — this time let's say greater than 55.0 — and a closing square bracket. Let's run this, and we get an error. As I said in the last lecture, your 'Open' has to be exactly as it appears in your DataFrame — I didn't capitalize the O. Let's fix that and run it again, and there we go. Now we have a much smaller returned data set, beginning at row 137: these are the rows and columns where the values are greater than 55.
When we use this one-line print statement to return the data, we remove the need to create a variable as we did in the first version. Okay, this was a nice short lecture on conditional filtering. I encourage you to play around with it and change the values: see, here — there we go — a much bigger data set is returned when we ask for values below 55.0. I encourage you to play around with the data set and figure out how you can use conditional filtering. Thanks for listening, and I'll see you in the next lecture. 11. Using NumPy and Pandas Together: Hi, folks, welcome back. Now, you may not be aware, but pandas is built on top of a Python library, NumPy, and NumPy exists to allow us to perform scientific computation. In a previous lecture, we looked at how to create a Series from a DataFrame column; let's re-examine that now. Here we are in a new Python file, and as you can see, I've already entered two lines: import pandas as pd, and I've read in a data set — again, it's our intel.csv file. What I want to do is create a new variable: open equals df['Open'], making sure 'Open' is enclosed in quotes with a capital O. Then print(open). Let's see what we get. I'm going to run our new file now, so that's s2_4 — sorry, I should have put python in front of it, making a mistake myself — python s2_4, Tab. So there's our Open column, extracted from our intel.csv data set. How do we know what type of data we're dealing with? Well, as we saw in a previous lecture, let's say type(open) — and what do we get? Save it and run it. I've left out a bracket — save, run. There we go: as I thought, it's a Series. So when you extract a column from a DataFrame, it becomes the data type Series. But another thing about this output is that it's actually backed by a NumPy array, and let's confirm that now. To do this, we want to extract the values from our open variable, and we do that by saying new_open equals open.values, then print(type(new_open)). Save, run.
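(Before moving on, here is the conditional-filtering pattern from the previous lecture in one runnable sketch — four made-up rows stand in for intel.csv, and the 46.0 threshold is the one the lecture mentioned.)

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2017-01-03", "2017-01-04", "2017-01-05", "2017-01-06"],
    "Open": [39.0, 46.5, 55.5, 58.0],
})

# Comparing a column to a value gives a boolean Series: True/False per row.
mask = df["Open"] > 46.0
print(mask)

# Passing that mask back into the DataFrame keeps only the True rows,
# all in one step with no intermediate variable.
filtered = df[df["Open"] > 46.0]
print(filtered)
```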
Okay, now our data is a NumPy array, and in later lectures this knowledge is very important. The values attribute that we just used extracts the numerical data from the Series. We can see what this looks like by typing print(new_open) — close the bracket and run the file — and there are the numerical values of our new_open variable. As we saw a few moments ago, this output is a NumPy array. So in actual fact, a pandas Series is a one-dimensional labelled NumPy array, and as I just mentioned, this is very important for later lectures. Thanks for listening, and I'll see you in the next lecture. 12. Creating DataFrames with NumPy: Hi, everybody, and welcome back. In the last couple of lectures we've been looking at how to create DataFrames from existing data, like our intel.csv file. Over the next few lectures, we're going to look at how to create DataFrames without any existing data. Now that we know that pandas and NumPy work together, let's investigate this a bit more by creating a pandas DataFrame from NumPy numbers. Here we have a new Python file — I've named it data_frames_from_numpy — and as you can see at the top, as well as our usual import pandas as pd, I've also added a new line: import numpy as np. So obviously we're going to be importing the NumPy library into this file. Now, the first thing we want to do is create a new NumPy array, and we do that with new_array equals np.arange — that's a-r-a-n-g-e. We're going to go 0 to 10, then .reshape, and this sets the shape that the array will take. What shape do we want it to be? We want it to be two by five. Okay, let's have a look and see what that looks like, printed out down here. Save, then python s2_5. And there we go: a NumPy array of two rows and five columns. If you're already familiar with NumPy, this is pretty simple stuff, and you know that our output is a NumPy matrix.
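(The Series-to-array step just described looks like this — again with a few illustrative prices in place of the course file.)

```python
import pandas as pd

df = pd.DataFrame({"Open": [36.61, 36.71, 36.37]})

open_series = df["Open"]         # extracting a column gives a Series
new_open = open_series.values    # .values exposes the underlying NumPy array

print(type(open_series))         # pandas Series
print(type(new_open))            # numpy.ndarray
print(new_open)                  # just the numerical values, no index
```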
We can convert this to a DataFrame by entering this line: df equals pd.DataFrame, and a new bit here: data equals — and what have we got? We have our new_array. Then print(df). Let's take a look and see what that looks like. Okay, we still have our matrix output here; let me just hide that for the moment to make the output a little more readable. Okay, run it again, and there we go. This time we have our output, but as you can see, it has taken the format of a DataFrame. On the left-hand side we have our index, which goes 0 to 1, meaning we have two rows, and along the top we have our columns, which have been automatically numbered by pandas: 0, 1, 2, 3, 4. Each column now has a label and an index, and as you can see, the values go from 0 to 9, which is the range that we specified up here, 0 to 10, remembering that we're starting at zero. But we can take this a step further by adding in the columns argument, which will replace our 0, 1, 2, 3, 4 with anything that we tell it. So let's have a look at how we do that. In our DataFrame call, say columns equals, square brackets, and we wrap each label in quotes: I'm going to say a, b, c, d, e. Okay — again, pretty simple stuff. Let's save that and run it again. Excellent: now we have our output with our column names a, b, c, d, e. Great. In the next lecture we're still going to be creating DataFrames without any existing data, and we're going to look at how to do it from a dictionary. Thanks for listening, and I'll see you in the next lecture. 13. Creating DataFrames from Python Dictionaries: Hi, folks, and welcome back. Now, after the last lecture, I'm sure there's only one question on your mind, and that is: how do I create a DataFrame from a dictionary? Well, let's look at that right now. As you can see, I have my Python file open, data_frames_from_dictionary.py. I have imported pandas as pd, and I have created a dictionary called course_sales.
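(The NumPy-to-DataFrame steps above, in one piece:)

```python
import numpy as np
import pandas as pd

# A 2-by-5 array of the numbers 0..9.
new_array = np.arange(0, 10).reshape(2, 5)
print(new_array)

# Wrap it in a DataFrame and supply our own column labels.
df = pd.DataFrame(data=new_array, columns=["a", "b", "c", "d", "e"])
print(df)
```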
Feel free to pause the video here and enter the dictionary into your own code editor. In the dictionary, you can see the keys and the values, and the values are lists. So "course" has a list of course names: Python, Ruby, Excel, C++. If you haven't guessed, this is a dictionary of course sales and prices over a one-week period. We have the courses along with their names, we have days along with, obviously, the corresponding days of the week, we have the price, and we have the number of sales. And down here, just to make sure everything works, I have print(course_sales), so let's run that now. Okay, our output is there, with our keys and our values. Let's convert our sales data into a DataFrame. We do that with the following line — let me hash this print out for a moment — df_sales equals pd.DataFrame. So again, like in the last lecture, we're converting to a DataFrame, and this time we're converting our course_sales dictionary. There we go: print(df_sales), and let's see what that looks like. Excellent — that is a thing of beauty. On the left-hand side, as we've seen previously, we have our index outlining the number of rows. In the center, the body of our DataFrame, we have our course, our day, our price, our sales: the course names, the days they were sold, the price they were sold for, and the number of sales. So what can we learn from converting a dictionary into a DataFrame? Well, first off, the keys of the dictionary have become the column labels, and an index is generated by default. Let's now create the same DataFrame by breaking our course_sales data up into individual lists. What I'm going to do is hide these lines of code and start off with course equals Python, Ruby, Excel and C++. Now I'm going to jump ahead in time and enter the rest of the lists. Abracadabra — there we go.
As you work more and more with Python and data, with CSV files and all sorts of files, you will begin to receive lists of all kinds of data. So let's now create a list of column labels: labels equals. So what were the labels we had before? We had 'course', we had 'day' — we capitalise those, and let me enclose each in quotes, otherwise we're just going to get an error. Then we had 'price', and we had 'sale'. Actually, I'm going to rename that, because 'sale' is not very descriptive. What I'm going to say is 'number of sales'. There we go, a little more descriptive, and it looks better too. Now what we want to do is add a list of column entries for each column we just created. So let's say cols equals — because they're lists, which we established up here: courses, days, prices, sales. We take each of these lists and we add them to a new list called cols. So we have courses, days, prices, and sales. This is obviously a list of lists. Now let's combine these lists together using Python's list and zip functions to construct a new list called master_list. The output of this master list will be a list of tuples — column names and columns — which we can pass to the dict command. So let's say master_list = list(zip(labels, cols)), then print(master_list). Before we do that, let's just quickly recap what we've done. I need to take this out for a moment — let's cut that out and jump down here. So we created several lists: one, two, three, four, five, six. We have six lists. Let's just deal with the first four. We have courses, with our course names; we have days, the days of the week on which courses were sold; we have the price of each course; and we have the sale amount, or the units sold. What we've then done is combined our column names into a list called labels, and we've taken our lists and combined them into a list called cols. We then created a variable called master_list, and we're packing everything into this. Let's now print out our master list and see what we get.
And obviously we get 'sales' not defined, because here it's 'sale' and here it's 'sales'. So let's just correct that and run again. Exactly what we expected: our lists within a list. What we want to do next is create a new variable: data = dict(master_list) — we want to turn our list into a dictionary. Then new_sales = pd.DataFrame(data) — so we're taking our dictionary, data, and converting it into a DataFrame. print(new_sales) — print it and see what we get. So one more time: we printed our master list, which is down here, which was a list of lists. We then created a new variable, data, and in that variable we're converting our master list into a dictionary using the dict method. Then we define a new variable, new_sales, and what we're doing is converting our data dictionary into a DataFrame and assigning that DataFrame to the variable new_sales. So when we print out new_sales, we should have a nice, clean DataFrame. Let's run that now and keep our fingers crossed. Excellent. On the left, our index; along the top, course, day, price, number of sales; and inside, we have our sales information. You know that this is a DataFrame as well because here, between our index and our labels, there's a gap — our index starts at the first row of data, and that's a sign of a DataFrame. Okay, so plenty done in that lecture. I encourage you now to go away and experiment with your own lists, dictionaries, and DataFrames. Thanks for listening, and I'll see you in the next lecture. 14. Using Broadcasting in DataFrames: Hi, everybody, and welcome back. In this lecture, we're going to talk about broadcasting in Python. When you have two arrays of different sizes, they cannot be added together, subtracted, or generally used in arithmetic. Broadcasting makes this possible by making the smaller array the same size as the larger array. So we're going to look now at how we can use broadcasting in our DataFrames.
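The whole list-zip-dict flow from the last two lectures can be sketched like this (the values are placeholders, not the course's real figures):

```python
import pandas as pd

# Column labels and the matching lists of column entries
labels = ["course", "day", "price", "number of sales"]
courses = ["Python", "Ruby", "Excel", "C++"]
days = ["Mon", "Tue", "Wed", "Thu"]
prices = [10, 12, 10, 15]
sales = [25, 13, 19, 11]
cols = [courses, days, prices, sales]   # a list of lists

# zip pairs each label with its column, giving a list of tuples
master_list = list(zip(labels, cols))

# dict turns those (label, column) tuples into a dictionary,
# which pd.DataFrame then converts into a DataFrame
data = dict(master_list)
new_sales = pd.DataFrame(data)
print(new_sales)
```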
If you want to learn more about broadcasting, I've included a link to its documentation within the useful links section of the README file. Let's jump now into our code editor. As you can see here, the code is already imported — it's the exact same as the last lecture — and we're just going to add a couple of lines to the end of our file. But first, let's just run the code and make sure it still works. As you can see, we have the output from the last lecture: we have our label names, our index, and our data inside. Let's imagine for a moment that we're having a 24-hour sale and we're going to reduce all our courses to the price of $2. We could go back and update our lists, or we could use broadcasting, which allows us to easily add a new column. Let's do that now. So we want to create a new column: df_sales['new price'] — and what we want to do is give this column the value of 2. Let's print this out and see what we get. Amazing. So here we are: we have our original existing data, but we also now have a new column, 'new price', with a value of $2 — or €2, or whatever country you're in — added to each row. And that's how we can easily update our DataFrame without going back and editing lists, dictionaries, or data files. Thanks for listening. I'll see you in the next lecture. 15. Labelling Columns in DataFrames: Hi, folks, and welcome back. Now, keeping with our example of course sales from the previous lectures, let's say we want to update one, or even all, of our column labels. We could just edit the column names at the start of our code, but where's the fun in that? Instead, we can create a new list of column labels. Let's jump into our code editor now and see how we can do that. Here we are, and as you can see, it's similar code to our last two lectures. What we want to do, as we just said, is create a new list of column labels, and we do that with the following line: column_labels equals.
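The broadcast assignment above looks like this in full; the DataFrame contents are placeholder values standing in for the course-sales data:

```python
import pandas as pd

df_sales = pd.DataFrame({
    "course": ["Python", "Ruby", "Excel", "C++"],
    "price": [10, 12, 10, 15],   # placeholder prices
})

# Assigning a single scalar broadcasts it down the whole new column
df_sales["new price"] = 2
print(df_sales)
```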
So what are we doing here? We're going to say 'course', 'day' — very similar to what we had — and we're going to add in one exception in a moment: 'price', and now, instead of 'sales', what we're going to say is '24 hour sale price'. This column is the 24-hour sale price. Next, what we do is assign our list of labels to the columns attribute of the DataFrame. So let's do that now: df_sales.columns = column_labels, which we've just created. Finally, we print df_sales, save it, and let's see what this looks like. Okay, so here we go. We have our previous DataFrame, and as you can see, the last column is 'sales' with the sale price. But in our new DataFrame, we have renamed the column to '24 hour sale price'. So we can do that with one, two, three, or all of the columns in a DataFrame. That's it. Thanks for listening, and see you in the next lecture. 16. Creating DataFrames with Broadcasting: Now, in this lecture, we're going to take another look at an example of how we can use broadcasting. As mentioned, broadcasting is a feature of NumPy, so let's look now at another example of how we can use it. So here we are, back in the code editor. As you can see, I've already got some code added to our Python file. I have, as usual, import pandas as pd, but then, being from Ireland, what I have done is create a list of the counties of Ireland — all thirty-two. So whatever location you are in the world, this could be cities, countries, whatever you want — it could be anything. I've just gone with something that I know. Now what we're going to do is create a string with the value 'Ireland'. So that's country = 'Ireland'. There we go. Next, what we're going to do is create a new dictionary. We're going to call this dictionary ireland, and we're going to say it equals — so what have we got? We have 'country' — correct that, there, 'country' — that's the key, and I'm going to give it the value country,
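The relabelling step can be sketched compactly; the starting data here is a placeholder two-row version of the course-sales DataFrame:

```python
import pandas as pd

df_sales = pd.DataFrame({
    "course": ["Python", "Ruby"],
    "day": ["Mon", "Tue"],
    "price": [10, 12],
    "sales": [2, 2],   # placeholder figures
})

# Assigning a new list to the columns attribute renames every column at once;
# the list must have one entry per existing column.
column_labels = ["course", "day", "price", "24 hour sale price"]
df_sales.columns = column_labels
print(df_sales)
```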
which we just input a moment ago. Next we're going to say 'county', which is our list — we're assigning the list of counties to the key 'county'. There we go. Next, we want to create a DataFrame, so df = pd.DataFrame(ireland). There we go: print(df). Okay, fingers crossed, and let's hope that this works. Wrong file — I ran it on the wrong one; your file names are going to be different. Let's run, and there we go: a beautiful listing of country — Ireland, obviously — county, with all the counties of Ireland, and an index running 0 to 31, which is actually 32 entries; there are 32 counties in Ireland. Okay, so that's a very simple, quick example of how we can use broadcasting to assign a single value alongside a list and create a DataFrame. Thanks for listening. I'll see you in the next lecture. 17. Data Cleansing Techniques: Hi, everybody, welcome back. This lecture is all about data cleansing, which is the act, or the process, of cleaning up our data so that we can use it for data manipulation and analysis. Obviously, over the course of your data science career, you'll be receiving data files and data sets from all sources. Some will be clean, some will be kind of clean, and some will be, I suppose, very dirty, in the sense that they might have headers missing, they might have data missing, they might have timestamps that are all wrong. So we're going to look at a few ways that we can clean up data files to make them easier to use in pandas. Let's jump over now into our code editor. Here we are now in the code editor. I'm in a new file called data_cleansing.py, and as you can see at the top, I've already entered the line import pandas as pd. I've also entered the code which reads in my CSV file: pd.read_csv(...).
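The county example can be sketched with a shortened list — only four of the thirty-two counties, to keep it brief:

```python
import pandas as pd

counties = ["Dublin", "Cork", "Galway", "Kerry"]  # shortened list of counties
country = "Ireland"

# The scalar string broadcasts to match the length of the counties list,
# so every row gets 'Ireland' in its country column.
ireland = {"country": country, "county": counties}
df = pd.DataFrame(ireland)
print(df)
```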
And if you look towards the end here, we're going to look at a new data set in this lecture, and that's a data set that I downloaded from kaggle.com. You can find the link in the README file on GitHub, or you can just download the file directly. The data set I downloaded from Kaggle is a data set from the Google Play Store of apps downloaded over a period of time. We're going to examine that data in just a few minutes. Now, you might have noticed that my file names can be a bit long, and I do that for a very particular reason, and that is for you, the students, so that it's easier to see what file I'm working in from a description of the file. There's no point in me calling a file just lecture_1 or lecture_2. I like to have very descriptive names on the files, but they can get a bit long, and sometimes that can be prone to error. So how can we make that a little easier on ourselves? Well, let's take a look. We have our file there — let's just make sure it's working: print(df), save that, let's run it. Okay, there we go: our file runs perfectly fine, and the data imports into pandas. We'll get into examining the data in just a few minutes. So, to help make our file names a little more manageable, what we can do is declare a file path variable. We can say file_path — this could be any name you want — file_path equals, and let's take our file path from up here, everything inside the quotes, including the quotes, and paste it in. Now let's have a look: df = pd.read_csv(file_path) — auto-complete kicking in there, but here we go, file_path. Let me just comment out this line of code. Perfect. Save the file, run it again. So how come we didn't get anything returned? Because it hasn't used a print statement. We'll have a look at that in a few minutes, but I didn't get an error returned either, so I know my file path is working. So let's examine now the data file that we're loading in — the Google Play Store data — and how can we do that?
Well, we can do that with something that we used at the very beginning of this class, which is print(df.info()). Save, run. As we can see here, we're dealing with a RangeIndex; we have 10,840 entries, which is the biggest so far. But what you also might notice is that here, instead of actual column names, pandas has taken the first row of data to be the headers — and pandas does this by default if there are no column names explicitly declared. So obviously that's an issue that we need to correct. Now that we've had a look at our file's info, let's have a look at our file's header information. So let's change this up to head() — brackets, save — and an error, because I typed 'header'; it should be head. There we go, it's head(). Let's have a look. Okay, indeed, our file is loading in, but as we just mentioned, by default pandas uses the first row of data if no header information is explicitly declared. So if you don't give it column names, what pandas does is take the first row of information and use that as the header. That's what we're going to need to correct. But what are the actual column headings? Let's take a look at the CSV file that we're working with. Okay, so here it is. You can find the original CSV in the resources folder of this code file; I've edited this file from the original just to make it a little bit smaller and a little bit cleaner to work with. As you can see, we have no headings here, but I know from the original file that our headings are: app, rating, reviews, size, number of installs, type (whether it's free or paid), the price, and the date it was last updated. We can also see in our data here that there are some -1 values, so we're going to take a look at them, and we're going to look at the date format as well and see how we can work with that. So let's get started cleaning our file, and we'll start with the header.
So, back in our file, what we want to do is remove the default behaviour of pandas, which takes in the first row as the header information, and say header=None. Save this, run it again. 'None' — and that's because it should be a capital N. There we go, a little bit cleaner now. Already we can see our column header names are 0, 1, 2, 3, 4, 5, 6, 7, all the way up, and we have our index here. The first row of information, at index zero, is an app name. So adding in header=None has stopped pandas from using the first row of data by default. Now let's give our CSV the column names we just discussed when looking at the CSV file, and we do that with the following line of code. We move up a couple of lines here, and we say column_names equals — it's just going to be a list. First up, everything is enclosed in quotes, so we have 'app', 'rating', comma, quotes, 'reviews', 'size', 'number of installs', 'type' — remember, that was paid or free — 'price', and 'last updated'. So now that we have a list of column names, what can we do with it? Well, here, where we have header=None, we can explicitly declare, or tell pandas, to use the column names that we just defined as the header: names=column_names. There we go. Let's save that, run it again, and see what happens. Okay, as you can see, the 0, 1, 2, 3 from the previous output have been replaced with the names we just input. Okay, that's excellent, and it's very good progress. Now let's examine our data a little bit more. Let's print out the head again — no, let's print out the tail, actually. Let's have a look here, because I think there's more erroneous data at the end of the file. A misspelling — okay, there we go. So there's an example of a -1. Obviously, a rating can't be -1; it should be 0 to 5. So how can we correct that? What we need to do is replace the -1s with NaN, which stands for Not a Number.
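The header fix can be sketched with an in-memory CSV standing in for the Google Play file (the two rows here are made up):

```python
import io
import pandas as pd

csv_text = "Photo Editor,4.1,159\nColoring Book,3.9,967\n"

# By default pandas promotes the first row of data to the header:
df_default = pd.read_csv(io.StringIO(csv_text))

# header=None keeps that row as data, and names supplies real labels
column_names = ["app", "rating", "reviews"]
df = pd.read_csv(io.StringIO(csv_text), header=None, names=column_names)
print(df)
```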
So let's add the na_values keyword to our line of code, and we do that right here: na_values equals — and what is it? It's -1. So we're replacing -1 with NaN. Let's run it again, and there we go: we can see our -1 has been replaced by a NaN. And that will be true throughout the file; we're just looking at a very small subset of our 10,000-plus rows of data. So our -1 is gone, replaced with NaN. That's excellent progress. So now we have our headers and we have our NaN values all inserted. What else can we do? Well, let's take a look at the index. As we've already discussed, the index is defined by default by pandas, so it starts at zero and goes all the way up to the last row of data — in this instance, 10,840 is the last row. So let's have a look at the full file here. Let me just clean this up, save it, and run it. Let me expand this out a little bit so you can see: a few more NaNs have been inserted, which is excellent. So let's have a look. Here's our index, starting at zero, going all the way down to 10,840. But what if we needed to change this? What if we needed to define our own index, such as the date, or the price, or the app name? Let's have a look and see how we can do that with the date. So first we define our index, like this: df.index — so our DataFrame's index — and what do we want that index to be? We want it to be, from the DataFrame — this has to be surrounded in quotes — 'last updated'. That takes the column 'last updated' and inserts it into our index. Let's have a look at what that looks like. Our numeric index, 0 to 10,840, is gone, replaced by the date. Excellent — progress. But, as you can see here, our 'last updated' column, now in our index, is still showing here at the end of our file. What we can do is create a list of what we consider to be the relevant columns.
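Both the na_values substitution and the index swap can be sketched over a tiny made-up file:

```python
import io
import pandas as pd

csv_text = "App A,-1,January 7\nApp B,4.5,January 8\n"

# na_values=-1 turns every -1 in the file into NaN on the way in
df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=["app", "rating", "last updated"], na_values=-1)

# Replace the default 0..n index with the 'last updated' column
df.index = df["last updated"]
print(df)
```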
So let's say we just wanted our index, which now contains the date, plus the app name and the reviews — how can we do that? Well, we can define a new columns variable. Let's have a look: new_columns equals, and we give it the column names we want, so let's say 'app' and 'reviews'. Now let's assign new_columns to our DataFrame: df = df[new_columns]. Save that and print it, and let's see what we get. Excellent — an even cleaner-looking output. We have our 'last updated' as our index, we have our app name, and we have the number of reviews. So obviously you can use this process to produce DataFrames of only certain data. Okay, so we have our shiny new data set here, which is df with our new columns. What happens when you want to share this information? Well, just as we used read_csv, we can use to_csv, which, exactly as it sounds, outputs to CSV. How can we do this? Let's have a look. Let's go down a couple of lines here and remove our print statement. Let's say out_csv equals — give it a name, so 'google_play_data' — and now our DataFrame: df.to_csv(out_csv). Okay, if I save this and run it, what we're hoping is that a CSV file is generated called google_play_data. But as we can see here on the left-hand side, no CSV file has been generated, and that's because, actually, I've been a very silly billy and I've made an error here by putting in an equals sign. Let me just run this again, and there we go: google_play_data.csv. So let's open this up. As you can see, here we have our column headers — last updated, app, and reviews — January 7th, Photo Editor, and the number of reviews. Only the relevant column headers that we specified previously get exported to the CSV. So that is excellent progress, and very helpful. We can also export to an Excel file. We can do that very similarly: out_xlsx equals, again in quotes,
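Column selection and the CSV export can be sketched with placeholder data; note that calling to_csv with no path returns the text instead of writing a file, which keeps this example self-contained:

```python
import pandas as pd

# Placeholder rows standing in for the Google Play data
df = pd.DataFrame({
    "app": ["Photo Editor", "Coloring Book"],
    "rating": [4.1, 3.9],
    "reviews": [159, 967],
})

# Keep only the columns we consider relevant
new_columns = ["app", "reviews"]
df = df[new_columns]

# df.to_csv("google_play_data.csv") would write a file on disk;
# with no path it returns the CSV text, handy for a quick check
csv_out = df.to_csv()
print(csv_out)
```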
'google_play_data' — this time, let's just add 'excel' so we can distinguish it — and then df.to_excel(out_xlsx). Now, one small difference: in the file name here, I haven't added the extension just yet; I've left it off just so I could make a point of it. You need to add in the Excel file extension, .xlsx. You don't need to do it with CSV, but you do need to do it if you're exporting to Excel. If you don't add the file extension, you will get a ValueError. So let's save our file here and run it again. And there we go: google_play_data_excel, an Excel file. On screen that's gibberish, because the text editor doesn't render Excel files, but if you go to your root folder, you will find two new files there with your exported information. Okay, so that's it for data cleansing. Thanks for listening, and I'll see you in the next lecture. 18. Creating our first Plots: Hi, folks, and welcome back. Now, in this lecture, we're going to look at how to use pandas for plotting — how can we get data sets into graph format? I shouldn't say this, but this is probably one of my favourite lectures, because what we're about to output is very tangible. It's not just exporting a data file, or cleaning or cleansing data, or manipulating data. What we're going to do is start creating graphs from data, and that's very rewarding. It's something that we can save, something that we can email — it's very, very easy to see and very, very easy to grasp. So without further ado, let's jump right in. Okay, so here we are, back in Atom, and as you can see, I've already entered some code: I've imported pandas, and I've also imported matplotlib.pyplot as plt, and that's what we're going to use to plot our information later on in this lecture. As you can see, I've already imported my file — we're going back to our intel.csv.
As mentioned previously, the sample data that I have provided for you is on GitHub — sample data, introduction to pandas — and all the information and data sets we've used so far are stored there, so you can browse through, have a look, or use your own data sets. At the end of this line of code, I've set our index to the date column. I've also set parse_dates=True, and that just lets pandas know that we're going to be working with dates and that it should format the date information as dates. So let's jump in. As usual, let's first check our file: print(df.info()). Save that, let's run it. Wrong file — I ran it with the wrong one; it's s2_11 at the top. There we go. So what's different in this output compared to previously? Well, because we set parse_dates=True, instead of having a RangeIndex in our output, we have a DatetimeIndex. So pandas knows that we're dealing with a datetime index. On the left-hand side of our data, where we previously saw 0, 1, 2, 3, we're going to have the date, and pandas knows that that is a DatetimeIndex. Again, we can see the column names — open, high, low, close, etcetera — and they're floats. Let's have another examination of our information. This time, print(df.head()). Save, run it again. And there we go, nice and clean: open, high, low, close, and over here on the left our index is the date. Pretty nice and clean, I have to say. So the date has now been set as the index. Now let's create a Series and get the values from our data file. How do we do that? Let me just comment out the print here. And what piece of information do we want to extract? Let's go with the close, because the closing price is always something that you can use as a marker. So close_value = df['close'] — don't forget to surround it in quotes — .values. We used this previously, and I told you it would be coming back up, so here it is now.
So if you check the type of the close_value variable, we can see that it will be a NumPy array. Let's just double-check that now: print(type(close_value)). There it is. Let's save this and run it. Hopefully it's a NumPy array — yes, a NumPy array. So some information from previous lectures is coming back to be of good use. Let's create our first plot using the plt command. How do we do that? Let me just hide our print statement here, and we say plt.plot(close_value). There we go — auto-complete kicking in. Hopefully, when we run this, we're going to get a plot. Fingers crossed — no. And why not? Because I made a schoolboy error, a beginner's error: plt.show(). I completely forgot plt.show() — I forgot to show my plot. Hey presto: the closing prices of our Intel data set. Very nice. We're going to jazz this up a little in a few minutes, but this is a great start. The horizontal axis represents the date — you can see it here, though we're going to clean this up in a few minutes — and the vertical axis is the price. So let's clean this up a little bit. We just used the pyplot plot function, but pandas DataFrames also have a plot method, and we can implement it like this. Let's close this, get rid of these two lines here, and say df.plot() — so our DataFrame dot plot, not too different from what we just deleted — then plt.show(). What do you think's going to happen here? Because what we're dealing with is the entire DataFrame. Let's have a look. Okay, very, very messy looking. Let's just explain this for a moment. We said df.plot() on the DataFrame df, and df up here is the entire data file — the entire data set; we haven't filtered it at all.
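Both plotting routes above can be sketched with made-up closing prices; the Agg backend is an assumption here so the sketch runs without a display:

```python
import matplotlib
matplotlib.use("Agg")           # headless backend, assumed for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder closing prices standing in for the Intel CSV
df = pd.DataFrame(
    {"close": [44.1, 45.3, 46.0, 45.2]},
    index=pd.date_range("2017-01-02", periods=4, freq="D"),
)

close_value = df["close"].values        # .values gives a NumPy array
plt.plot(close_value)                   # the pyplot route
plt.show()

# The DataFrame.plot route: plots every column and adds a legend by default
ax = df.plot()
```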
In the last lecture, when we looked at cleansing data, we learned how to extract only relevant columns from our DataFrames. So if we were interested in just the close, or the high, or the low, we could refer to the previous lecture, extract that column into a new DataFrame, and plot it. But here we've just plotted everything — we've drawn it all onto our plot. As you can see, the pandas plot method automatically gives us a legend, but it's very hard to actually tell the different lines apart here, because they're so close together; they're all on top of each other. But it did insert the date for us in monthly increments. Let's see what we can do here by adding some of our own customisation. We're interested in the close value, so let's stick with that. Let's comment this out and say df['close'].plot(color='green', style='--', legend=True). The style here is the line style, so when I say dash, that's the line of the plot. A legend — do we want a legend? Yes, we do: legend=True. Let's save this. So in this line of code, we've selected the close column, given it a colour and a line style, and chosen to show a legend. We can further tidy this up with the following: plt.axis(...). What do we want here? Well, we know that we've got 2017 date information and we know that we've got 2018 date information, and we want our price, or y-axis — our vertical axis — to go from 0 to 60, because that's the price range that we have in our data set. So let's save this and run it and see what we get. But before that, just a quick recap on what we've done: we focused the x-axis on the years 2017 and 2018, and we set the vertical axis, as mentioned, from 0 to 60 — and all of this helps make our plot a little more clear. Let's run it and keep our fingers crossed. There we go, a little bit better. So we're only dealing with the close column —
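The customised close-only plot can be sketched with the same placeholder data; note that plt.ylim stands in here for the lecture's plt.axis call, fixing just the vertical range:

```python
import matplotlib
matplotlib.use("Agg")           # headless backend, assumed for this sketch
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {"close": [44.1, 45.3, 46.0, 45.2]},   # placeholder prices
    index=pd.date_range("2017-01-02", periods=4, freq="D"),
)

# One column, a colour, a dashed line style, and a legend
ax = df["close"].plot(color="green", style="--", legend=True)
plt.ylim(0, 60)    # fix the vertical (price) axis to the data's range
```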
we're only dealing with the closing price, which seems to move between the middle 40s or so up to 50, and we're dealing with a date range of 2017 to 2018. Now you can see we have a real legend, a date, and a price here, so you can let your mind wander with this. Let's say we had our Intel stock price, we had AMD, we had NVIDIA and other related stocks; with the power of the DataFrame plotting method, we could quickly start to compare plots — compare pricing data, closing volume, open volume, things like that — and we're going to have a look at something like that in just a few minutes. Okay, so let's close off this plot. And why do we have two plots here? Because I forgot to remove a line of code: df.plot(). I don't need that — let's take a quick look here; this is my df.plot(). So just to clear that up, I've saved it again and run it, and now we get one plot. There we go. Let's close that. Now let's comment out this line of code, move back down, and uncomment our df.plot(). Let's just talk about some other customisation. This plot is going to show us all the data in our DataFrame, and that's okay, because we've just discussed how to pull out specific columns, like the close, the open, the high, whatever it might be. But let's just look at some customisation that we can do here on the fly. You can say color='blue'. Let's have a look, save, and run it. Ah — no. Why? Because I'm just slow, that's why. Let me just hide this line of code. Save it, and there we go: blue. Everything is blue — I think that was a song once, actually. What else can we do? We can give our plot a title: plt.title(...). And what will the title be? It could be 'Stock Price'. Save it, run it, and there we go: a nice clean title at the top of our graph. Close that. What else can we give it? Well, we could explicitly name our x label, and we could do the same with the y label. We do that with plt.xlabel — so I type plt.xlabel equals 'date'.
It's a date range. And for the y label: plt.ylabel. What's that going to be? Well, first off, it shouldn't have an equals sign in it. So let's say 'price' — actually, let's say 'price in dollars', since the prices are in dollars. There we go. Okay, let me just move up here and remove our equals sign. Save that, run it again. Okay, excellent: a title, an x-axis title, and a y-axis title — a much cleaner plot. And you can see how we can easily apply what we've learned in the last few minutes, and in the last lecture, to make cleaner, more relevant graphs. Okay, so we know how to plot our data quickly and simply, and we know how to show that plot. But what if we want to save our plot? That's pretty simple. We just remove the plt.show() here, and let's say plt.savefig — s, a, v, e, f, i, g — the savefig method. I'm just going to call it simply 'df.png'. Let me save that and run it, and hopefully we don't get an error. No error, and as you can see over here, df.png has been created in my root folder. So let's click on that, and there we go: a PNG image of our plot. Excellent. What other ways can we save it? Well, we can say .jpg. Save that, run it. Excellent — on the left-hand side, df.jpg has been created. Check that: a JPEG of our plot. And finally, you guessed it, we can change this to .pdf. Save that, run it. There we go: a PDF has been created of our graph. And there Atom acts up, because it doesn't render PDFs, but if we jump into our folder — let's have a look: there's our PDF, with Adobe opening up, albeit very, very slowly — and there we go: a PDF graph of all of our information. So you can really let your mind run wild here: start graphing, start extracting relevant columns, start cleaning data, plotting it, saving it, exporting it. I leave the rest up to you. I hope you enjoyed the lecture. I'll see you in the next. 19. Creating Line Plots: Hi, folks, and welcome back. In this lecture, we're going to look at the pandas line plot.
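Title, axis labels, and savefig can be sketched together; the file is written to a temporary directory (an assumption of this sketch) so it stays self-contained:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")           # headless backend, assumed for this sketch
import matplotlib.pyplot as plt

plt.plot([44, 45, 46, 45])      # placeholder prices
plt.title("Stock Price")
plt.xlabel("date")
plt.ylabel("price in dollars")

# savefig picks the format from the extension: .png, .jpg, or .pdf
out_path = os.path.join(tempfile.gettempdir(), "df.png")
plt.savefig(out_path)
```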
We already had a very brief look in the last section at plotting using pandas, and that was a kind of quick-and-dirty intro into line plots with pandas. In this section, we're going to look at some other plots, such as scatter and bar plots, but in this lecture in particular, we're going to look at line plots. So let's expand on a concept mentioned in the last section, which was comparing company stock prices in a plot — and we can do that very easily with just a few lines of code. Let's jump into our code editor now. Okay, so here we are, and I've left in my import pandas as pd and my import matplotlib.pyplot as plt, so you can see them there. Now, we're using a new data file, and that is the Intel and AMD stock prices. So again, in our sample data folder — we're in a new section, and this section is entitled Data Analysis — we have the folder data analysis, and then we have the file of Intel and AMD stock prices, and there it is for us to see. A very simple stock price file: the month, the Intel price for that month, and the AMD closing price for that month — these are closing prices. So let's jump in. The first thing we need to do is print the DataFrame to make sure that it's okay. You'll notice here that now, instead of saying df, I've said stock_prices, so what we want to do is print(stock_prices). There we go — save, let's run this file. Okay, there's a quick examination of our file. We have our index, which we don't really need, but we have the month, we have the name of our stock and its closing price for the month, and we have AMD and its closing price for the month. Okay, so we can comment this line out. The first thing we want to do is create our y columns, so we can have both stock prices on the same graph, and we do that with the following line: y_columns equals — open square brackets — 'Intel', exactly as it appears in our DataFrame, and 'AMD'. There we go, pretty simple. Next, we want to assign names to
We name our axes based on our information. So we're going to put Month on the x-axis, and Intel and AMD on the y-axis, and we've already assigned the two stock prices to the name y_columns. So we say stock_prices.plot: we're taking our DataFrame and saying plot it, with x equals 'Month', which is in our data file (we can see it down below: Month), and our y columns are going to be y equals y_columns. Okay, simple enough. We want to give our plot a title: plt.title, which we looked at in the previous section, and what are we going to call it? Let's call it Monthly Stock Prices. There we go. Next, let's create a title for our y-axis, and how do we do that? We say plt.ylabel (there's our autocomplete kicking in), and let's say prices, and again my fingers get in the way: prices in US dollars. And finally we show our plot (I think I forgot to mention this in the last lecture): plt.show(). There we go, nice and simple. How many lines of code have we got here? One, well, these two imports, two, three, four, five, six, seven, eight. So with only eight lines of code, what can we do? Let's run this file. Hopefully we get a line plot, and on that line plot we should have Intel and AMD plotted against each other. Okay, let's have a look. So we have a legend: Intel in blue, so here's our Intel stock price, and we have AMD in orange. As you can see, over the months AMD is climbing while Intel is falling, but AMD is obviously still nowhere near worth as much as Intel. Our month names actually haven't shown up; what we see instead is the month's index: 0, 1, 2, 3, 4. We'll look at how we can expand this out in a little while. On the left-hand side we have our prices in US dollars, and along the top we have our title, Monthly Stock Prices. Okay, that's it for this lecture.
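The eight lines described above can be sketched as follows. The DataFrame is a hypothetical stand-in for the Intel/AMD file, and savefig is used in place of plt.show() so the script runs headless:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Hypothetical stand-in for the Intel/AMD closing-price file.
stock_prices = pd.DataFrame({
    "Month": ["Jan", "Feb", "Mar", "Apr"],
    "Intel": [46.2, 48.1, 47.5, 49.0],
    "AMD": [11.1, 12.4, 13.0, 14.2],
})

y_columns = ["Intel", "AMD"]               # both stocks on the same graph
stock_prices.plot(x="Month", y=y_columns)  # one line per y column
plt.title("Monthly Stock Prices")
plt.ylabel("Prices in US dollars")
plt.savefig("monthly_stock_prices.png")    # plt.show() in the lecture
```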
In coming lectures we'll look at how we can tidy this up and how we can use other graphs to represent information. Thanks for listening, and I'll see you in the next lecture. 20. Creating Scatter Plots: Hi, everybody, and welcome back. In this lecture we'll look at the pandas scatter plot. With some very small code changes we can create a pandas scatter plot, and for the data in this example I've taken life-expectancy data from kaggle.com and edited it to contain only the data for Ireland. Let's take a look now at our code editor. In the resources section of our files, on GitHub, you'll find all of the original data files, so there's the original life-expectancy data, and as I just mentioned, the data file we're going to use in this lecture is just for Ireland. You can edit it to contain America, the United Kingdom, wherever you might be. So, as you can see, I've left in my two import lines, and I've read in my data file: I've called it life, equals pd.read_csv of the Irish life-expectancy CSV. Let's just print it out to make sure things are working: save and run. And there we go: year, so from 2000 up to 2014, 14 years, the most recent data that was on kaggle.com, and we have our life expectancy. Okay, so what do we want to do? Let's comment this out. Obviously we want to create the scatter plot, like I just said, so let's comment this out and say life.plot, so we're plotting our DataFrame, and I'm going to introduce you now to a new argument, which is called kind=. Here you can pass the argument for the type of graph you want to create, and we want to create a scatter. We want our x-axis to be the year, so x='year', and what do we want the y-axis to be? We want that to be life expectancy, so y equals life expectancy. Next, let's add a title, and what do we want our plot title to be? Let's say plt.title, and we're going to give it Irish Life Expectancy.
We want to add in the x-axis label. Up here, the only thing we're doing is telling pandas what information to plot, not the label information; here we're giving it the label information. So we say plt.xlabel, and what are we going to give it? We're going to give it year. The y-label is the exact same thing: plt.ylabel, and our y-label is life expectancy. And finally, one you're very familiar with by now: plt.show(). Let's save that and plot it. An error, because I did not include my life-expectancy ylabel text within quotes. Let's save that, and an even bigger error. Okay, let's have a look for a moment. What's this error? A KeyError: life expectancy. So I made a simple mistake that I've warned you about time and time again: your values here, when you're plotting, have to match your DataFrame exactly. I said 'life expectancy', but when I was creating the Irish life-expectancy file I was lazy; I didn't want to type in 'expectancy', so I just called the column 'expect'. Let's save that and run it again, brackets at the end of show(), and here we go. Okay, we got there in the end: the title, Irish Life Expectancy, our y-axis, life expectancy, and it's taking the data directly from our DataFrame. So it starts at 76 and goes all the way up to 86, and it starts at the year 2000 and goes up to 2014, plotting our age. Scatter plots are great for comparing information: we could overlay the life expectancy from a different country, from a different continent, whatever it might be, and it's very easy then to see the different values. But here, in this example, we can see life expectancy rising from 76 all the way through the years: in 2008, life expectancy was around 80 years of age, but then between 2008 and 2010 it shot up to 86. And then, bad news for the rest of us: from 2010 onwards to 2014 life expectancy dropped, and we're now at around 82 years of age.
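The scatter plot just described can be sketched like this. The data is a hypothetical stand-in for the Irish life-expectancy file, including the lecture's shortened column name 'expect', and savefig stands in for plt.show():

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Hypothetical stand-in for the Irish life-expectancy file; note the column
# is named "expect", matching the lecture's KeyError story.
life = pd.DataFrame({
    "year": [2000, 2004, 2008, 2012, 2014],
    "expect": [76.5, 78.3, 80.0, 82.1, 81.5],
})

# kind="scatter" selects the plot type; x and y must match column names exactly.
life.plot(kind="scatter", x="year", y="expect")
plt.title("Irish Life Expectancy")
plt.xlabel("year")
plt.ylabel("life expectancy")
plt.savefig("irish_life_expectancy.png")  # plt.show() in the lecture
```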
Not great news if you're only hitting your eighties now, because life expectancy seems to be declining in Ireland at the moment. But that's the kind of really quick summary you can make from data using a scatter plot: a very, very easy way to analyze data quickly and simply. Okay, that's it for this lecture, a very simple way to create scatter plots with pandas. Thanks for listening, and I'll see you in the next lecture. 21. Creating Bar Plots: Hi, folks, and welcome back. So we've seen line plots and scatter plots, and again, with another small code change, we can create a bar plot. Let's jump right into our code editor and get started. So here we are, and as always I've left my two import lines at the top, and I've imported my data file, which I've named stock_prices: the Intel stock prices. As always, let's print stock_prices, save it, run it, and see what we get. There we go: very simply, month, January to October of this year, and the closing prices for Intel. So what do we want to do? We want to create a bar plot, and how do we do that? Well, what are we plotting? We're plotting our DataFrame, which is stock_prices.plot. On the y-axis we want to say y='Price', so along the y-axis, the vertical axis, we're going to put the price. And what kind of plot do we want this to be? As we said at the start, we want this to be a bar, so very simply kind='bar'. We want to name our x-axis, and how do we do that? If we remember from the last lecture, we say plt.xlabel, and what are we going to call it? We're going to call it Month, so along the bottom of our graph we want month. Let's remove the excess blank line there, and finally we want plt.show(), not forgetting, this time, the brackets. Okay, let's run it and see what we get. There we go. So our months: don't forget that because DataFrame indexes start at zero, we have here 0 all the way through to 9, and that's October.
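The bar plot being built here can be sketched as follows, with a hypothetical stand-in for the Intel file. Passing x="Month" also shows the tidy-up the lecture goes on to suggest, using real month names instead of the 0-to-9 index:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Hypothetical stand-in for the Intel closing-price file.
stock_prices = pd.DataFrame({
    "Month": ["Jan", "Feb", "Mar", "Apr"],
    "Price": [46.2, 48.1, 47.5, 49.0],
})

# kind="bar" switches the plot type; x="Month" labels the bars with month names
# rather than the default integer index.
stock_prices.plot(x="Month", y="Price", kind="bar")
plt.xlabel("Month")
plt.savefig("intel_bar.png")  # plt.show() in the lecture
```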
So that's the month on the x-axis, and on the y-axis we have the price, like we specified. We can see here that the fifth month, which would actually be June, because we're a month off with zero-based indexing, was our best price for Intel. Now, what I'm going to encourage you to do is go back through the previous lectures, take what you've learned, and clean this up. How can you make it all one color? We've seen that we can use the style and color arguments; we can specify those. And for the months, we want to use the actual names: January, February, March. I'm going to leave that to you as an exercise. Thanks for listening; I'll see you in the next lecture. 22. Statistical Exploratory Data Analysis Techniques: Hi, everybody, and welcome back. In this lecture we're going to look at statistical exploratory data analysis, and that's quite a mouthful. Up until now we've been plotting and manipulating our information using Python and pandas, and we've been using pandas' plot method to present our information graphically; that's called visual exploratory data analysis. But now we're going to look at statistical exploratory data analysis. I don't want to say that too many times in a sentence, because I'll get muddled up. Analyzing data with a plotted graph is great, but what about the old-school method of analyzing data with statistics? As mentioned in section two, we can use the .describe() method from pandas. Let's jump into our code editor now and take a look. Okay, so here we are in our code editor, and as you can see, I've given us a head start here, or given myself a head start, I should say, by getting the file up and running. In this lecture we're using a new file, and that's the Tesla stock data. Over in our sample-data folder here, if you go into data-analysis, you'll find the Tesla CSV, which shows us the Tesla stock price. So there it is: date, open, high, low, close.
It also has the adjusted close and the volume, and then you have a date range here; it's a year-long date range, running from October 2017 to October 2018. Okay, that's all we need to say about that. So what's the first thing we need to do? Well, as always, as you know by now, I like to print out my DataFrame to make sure that the import has worked correctly. So print stock_prices, save that, run it, okay, and there it goes. What do we know? We know that we have the column names: date, open, high, low, close, adjusted close, and volume. We have a range starting at zero. It's a DataFrame, both from the layout and because we defined it as one when we read it in. We have 253 rows and seven columns, so we know our import is working and our file is nice and clean. So let's comment this out. What did I mention earlier? The describe method. Let's look at using that now. Pretty simply, we say print stock_prices.describe(). Save that, run it, and now we have a description. We have the count, so we know that in the open column there are 253 entries, and that's the same across all the other columns. The mean is the average: the average open price is 314, the average high price is 320, and so on. The standard deviation is 25. The minimum open amount is 252, and then we have our quartiles: 25%, so values based on the first quarter of the data; 50%, the second quarter, or the middle; and 75%, the last quarter of the data. And then we have the maximum, 369. So the maximum opening price within that year's date range is 369 for Tesla. That's .describe(), and it gives us a nice summary of what we're looking at. If we want a quick statistical summary without graphs, without any other information, that's what we can use.
And it's worth noting that the values returned represent non-null entries, and that they only cover numerical data. So let's look at what else we can do to analyze this information. Well, we can print out the minimum value of open. Although we already have it above, let's print it out; this is very similar to what we've done in the past. Print stock_prices, so we call our DataFrame, then call a column from it; we're looking for the minimum open value: stock_prices['Open'].min(). Let's run this, and there we go: the minimum opening value for Tesla in the last year was 252. Okay, we can do the exact same thing with the maximum: print stock_prices['Open'].max(). I typed in min instead of max, and it matters! Okay, there we go: the max is 369. Very good. Now let's print out the average. What is the average opening price across the year? Print stock_prices['Open'].mean(). There we go; the average is 314. So those are some very easy ways to print out valuable information about our DataFrame, and you can see how useful this output becomes when you're creating a program where you just click a button to get the minimum, the maximum, the average, whatever it might be, of your data set. All of this is very useful information. Now let's look at how we can plot it. If you remember from previous lectures, we can say stock_prices, and which column do we want? We're talking about the open one, so let's put that in properly: ['Open'].plot(). We're just going to do this one quickly, so I'm not going to give it any arguments like color or style, but you can feel free to do so. Then plt.show(). As you can see here, we have the open values of the Tesla stock price for the year we're talking about, October 2017 to October 2018. We have the starting price here, and as I said, I haven't given this graph any formatting.
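The describe/min/max/mean calls above can be sketched like this. The numbers are a hypothetical stand-in chosen to echo the lecture's figures (min 252, max 369, mean 314), not the real Tesla file:

```python
import pandas as pd

# Hypothetical stand-in for the Tesla stock file used in the lecture.
stock_prices = pd.DataFrame({
    "Open": [252.0, 300.0, 314.0, 369.0, 335.0],
    "Close": [255.0, 305.0, 310.0, 365.0, 330.0],
})

print(stock_prices.describe())      # count, mean, std, min, quartiles, max
print(stock_prices["Open"].min())   # minimum opening price
print(stock_prices["Open"].max())   # maximum opening price
print(stock_prices["Open"].mean())  # average opening price
```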
So the dates are off to one side, we don't have our price label here, and we don't have our main graph title here, but you can feel free to add them; this is just for demonstration purposes. We have our minimum value here and our maximum value up here, and if we compared these values to the .describe() output we used a few minutes ago, they would match up. But you can't really read off that this is 369-point-something, or that this is 252-point-something. So there is an easier way to summarize this information graphically, and that's the box plot. What is a box plot? Let's have a look. Back in our plot method here, we say kind='box'; you'll be familiar with the kind argument from previous lectures. Let's see what that does. What have we got here? We're dealing with the open-price information, and we know that it starts at 252 or thereabouts (go back and check the .describe() information), and we can see its maximum up here. Then we have the median in the middle here, the green bar, and we have the inner and outer ranges. So it's an excellent way to quickly summarize information, particularly stock-price information or any other numerical kind of data. Box plots are beyond the scope of this introductory pandas class, but what I have done is include some excellent reference information in the readme file, so check that out to learn more. Thanks for listening, and I'll see you in the next lecture. 23. Filtering Data in DataFrames: Hi, folks, and welcome back. Now, we've looked at many ways to organize our data, and in this last lecture of the data-analysis section of the class, we're going to look at one more very powerful technique that we have not yet covered, and that's called filtering, or specifically, conditional filtering. Conditional filtering allows us to extract rows based on criteria: for example, greater than, less than, equal to, or whatever it might be.
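The box plot switch described above can be sketched like this, again with hypothetical stand-in prices:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Hypothetical stand-in for the Tesla opening prices.
stock_prices = pd.DataFrame({"Open": [252.0, 300.0, 314.0, 369.0, 335.0]})

# kind="box" draws the whiskers, the quartile box, and the median line,
# matching the numbers that .describe() reports.
stock_prices["Open"].plot(kind="box")
plt.savefig("open_boxplot.png")  # plt.show() in the lecture
```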
So let's consider our previous data set from the Google Play Store and jump right into our code. Here we are, and as always, you should be well used to this by now: I have already imported pandas and imported matplotlib. You don't need matplotlib for this one, but I've left it in there anyway. I've read in our data file: play_data equals pd.read_csv, and as you can see here, I'm referring to the previous folder, our previous section, introduction to pandas, for the Google Play Store CSV, because that's where that file is saved; I don't want to duplicate such a big file, it would be too much memory. Now, as always, let's print out our DataFrame to make sure it's working: print play_data, save, and run with python3 and the file name. There we go, and we get our entire DataFrame. That's exactly what we want, so let's comment this out. For this example, let's print out all rows with ratings greater than or equal to five. How do we do that? It's very simple, actually, and you'll be familiar with it from previous lectures. We say print, brackets, and what are we going into? We're going into our DataFrame, and inside the brackets we index it with a condition: we step one level down again into the ratings column, so square brackets, quotes, the ratings column (you can check that this is the right column by going directly into the original Google Play Store CSV), greater than or equal to five. If I just look at the previous output, you can see 1, 2, 3, all numerically ordered: that was our entire data set of 10,841 rows. So if I run our file now, what do we get? A KeyError, of course, because the column is not 'Ratings', it's 'Rating'. Thanks, Google. Fixed, and this time we only have 275 rows, only 275 apps, within this data set.
This is a subset: there are only 275 apps with a rating of five. And as you can see here, here's our rating column: 5, 5, 5, 5, 5... 19. So there's an anomaly there, an outlier; that's an error somewhere, and an excellent pick-up. If we were to graph this information, we would have seen a nice, healthy straight line of fives, and then a big jump to 19. It's something somebody could look at, review, and graph; it's obviously an error in the data. So that's how we use conditional filtering to pull out information from specific rows. Let's comment this out. Next, let's create a slightly more specific conditional filter, one that finds 'ARCADE', the word arcade, within the genres column. How do we do that? Again, it's not that difficult. We create a new variable: arcade_data equals, and what are we going into? The DataFrame, play_data. Once again we use the same layering that we just did up at the top, so play_data, but this time we're looking into the genres column; up above we were looking into the rating column. I hope it's 'Genre' and not 'Genres'; let me just double-check. This time it's 'Genres'. And what do we want to find? Arcade within the genres, so == 'ARCADE'. There we go. And for those of you familiar with conditional filtering in plain Python, you know that == means 'exactly the same as'. So: in the genres column, search for arcade. Let's print our new arcade data: print arcade_data, save, run, and it's empty. That is unusual, and it's empty because our filtering is so exact that, because I've misspelled arcade, nothing has been returned. Okay, I'm totally showing my spelling errors there. Let's fix it, save, and run again, and there we go: this time we get 220 rows returned. So within the genres column, there are 220 apps in the arcade category.
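Both filters just described can be sketched like this. The DataFrame is a small hypothetical stand-in for the Play Store data, with a deliberately bad 19.0 rating to mimic the outlier the lecture spots:

```python
import pandas as pd

# Hypothetical stand-in for the Google Play Store data set.
play_data = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.1, 5.0, 19.0, 3.7],  # 19.0 mimics the outlier in the lecture
    "Genres": ["ARCADE", "ARCADE", "Tools", "ARCADE"],
})

# Rows whose rating is at least five: the boolean condition indexes the frame.
top_rated = play_data[play_data["Rating"] >= 5]
print(top_rated)

# Rows whose genre is exactly "ARCADE".
arcade_data = play_data[play_data["Genres"] == "ARCADE"]
print(arcade_data)
```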
That's how we can use conditional filtering to help us create valuable subsets of our original data. For one more example, I'm going to comment out what we have here (that's not a comment, that's a three), and I'm going to use the same procedure we just went through, but for the Tesla stock prices. The first thing we need to do is change our file path up here. There we go. I also need to change our variable name, because play_data for stock prices doesn't make any sense, so I'll just say stock_data. Save. Now I still have print with the old name, but you know what I'm talking about; I'll just change it to stock_data to be clean, and we say print stock_data. Run it to make sure it's working. We're working; there's the open column. Comment it out, and now we get to our conditional filtering: stock_data, square brackets, stock_data, so we're going into our DataFrame and specifying a column. Which column are we interested in? The open column. So let's say you are creating a stock-pricing program, or some larger program, and you want an alert, or some sort of condition, to let you know when the Tesla stock price opens at 315; we say == 315. Let's save that, run it, and see what we get. There we go: only one entry. So in row 20, in November 2017, the Tesla stock price opened at 315. Very, very powerful stuff. Conditional filtering can help us create very honed-in, very targeted data sets from our original data files. That's it for this lecture and this section. Thanks for listening; I'll see you in the next section. 24. Introduction to Pandas Dates & Times: Hi, folks, and welcome back. Now we're moving into a new section, and this section is called Time Series in Pandas.
Previously we've worked with data in string and number format, and we've learned how to plot and graph it. In this section we're going to be working with dates and times. Pandas stores date and time information in datetime objects, and we can use the pandas read_csv function to read strings into datetime objects. We briefly looked at this in previous lectures, where we used the parse_dates=True argument to convert dates and times from specified columns into datetime objects, represented in the ISO 8601 format. Remember, the ISO 8601 format presents time data like this: year, month, day, hour, minute, second. So now let's jump right in and take a look at some data samples. 25. Indexing Dates & Times: Hi, folks, and welcome back. As mentioned in the intro to this section, we're going to be looking at pandas time series. Let's jump right into our code editor and see how we can create time-series indexes in our data files. Here we are, and as always, you can see that I've imported pandas as pd. I've imported our sales_data.csv file, and as always you can find the original data sets in the resources section, under datetime. For this example I have slightly edited the original, so in the sample data for the time-series section you can see the sales_data.csv file, if you would like to examine it. First, as I always do, let's print out some information to make sure that pd.read_csv has worked correctly: print sales_data.head(), to get just the first five rows. And there we go, nice and simple, and actually a very clean data set. We're starting off with our index over here, 0 to 4, and then we have our invoice date. We can see that our invoice-date column contains date values in the ISO 8601 format. We can quickly improve this data set to make it easier to work with.
To start, we know that we have a date column in ISO format, so we can add a pd.read_csv argument: a comma, then at the end of the line, parse_dates=True. Next, we can use the date column as the index by including index_col equals, in quotes, the invoice-date column name. This has to be exactly as it appears in your DataFrame; we've discussed this in previous lectures, so capitalization and everything has to match. Let's save that. Once again, let's print out our head information, and as you can see here, our invoice date has now become our index. Let's keep examining this data file by running the .info() method: print sales_data.info(). I'm going to hide the head print statement, save it, and run it. So what have we got? We can see here that we are indeed dealing with a DatetimeIndex. It has 199 entries; the first entry starts on the first of December 2010 at 8:26, so that's our first sale, and our last entry, our last sale, was at 10:03 on the first of December 2010. Excellent. We can now refine our searches by using the .loc accessor to select data by row and by column, and because we set our datetime column as the index, we can get very specific with our selections. Let's take a look now at the sales data for any purchases made at 8:35 a.m. How do we do that? We create a new variable: morning_sale equals sales_data (accessing our DataFrame) .loc, so we're using the .loc accessor. Then we enter the date and time information, so we're saying '2010-12-01', and what time did we say? 8:35. Imagine this was a customer query, or a database search in an application you were building, to retrieve the sales information at 8:35. Okay, let's print this morning_sale, save, and run. And there it is: invoice number, stock code, description (BATH BUILDING BLOCK WORD), the quantity of this item sold at that time, the price, and the customer ID.
There's also the location of the customer purchase. Excellent, very good. We can get even more specific with this query. We go back over here and edit our morning_sale variable: let's look at just the description. Save it and run it again. There we go: as you can see, we've pulled out the description, BATH BUILDING BLOCK WORD. So we can get very, very specific with a very short amount of code in a very quick time. Let's take another time: let's go with 8:26, because I think there were more sales at 8:26. So let's look at that time period: 8:26, save, run. There we go, more sales during this time, and more information is presented; again, we're familiar with it: the datetime, invoice number, stock code, all the way down to the country of purchase. We can further widen our filter by removing the time. Let's see what happens there: we pretty much get our entire data set returned. The process we've just gone through is called partial-string selection. You can use this process to select months and years. Let's remove the day here and select everything for 2010. There we go: 199 rows, because everything in our data set is in the month of December 2010. But you can imagine, if you were selecting a time period over a quarter, or half a year, or a full year of sales, you would use a range like '2010-01' to '2010-12'. Let's remove the 12 and select the full year; it's just going to give us the same information back, but you get the idea of how you can use partial-string selection to hone in on a month, a year, or indeed a range of months or years. To finish this lecture, we can use slicing to select a datetime range. Again, let's edit our morning_sale here: '2010-12-01', the first of December 2010, and we want to start at 08:26, because we noticed a few sales in there, then a colon, open quotes, and again '2010-12-01', and for the end time let's go to 9 a.m., to see what kind of sales we did between 8:26 and 9 a.m. in the morning. Save and run it again.
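The .loc lookups, partial-string selection, and slicing just described can be sketched like this. The CSV is a tiny hypothetical stand-in for sales_data.csv (column names assumed), built inline with StringIO so the example is self-contained:

```python
import pandas as pd
from io import StringIO

# Tiny hypothetical stand-in for the lecture's sales_data.csv.
csv = StringIO(
    "Invoice Date,Description,Quantity\n"
    "2010-12-01 08:26,BATH BUILDING BLOCK WORD,6\n"
    "2010-12-01 08:35,HAND WARMER,12\n"
    "2010-12-01 09:41,CANDLE,4\n"
)
sales_data = pd.read_csv(csv, parse_dates=True, index_col="Invoice Date")

# Exact-time selection with .loc:
print(sales_data.loc["2010-12-01 08:35"])

# Narrow the same lookup to a single column:
print(sales_data.loc["2010-12-01 08:35", "Description"])

# Partial-string selection: everything on that day, or in that month:
print(sales_data.loc["2010-12-01"])
print(sales_data.loc["2010-12"])

# Slicing between two times (start and stop separated by a colon):
print(sales_data.loc["2010-12-01 08:26":"2010-12-01 09:00"])
```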
So we get a bigger data set, which contains all sales made between 8:26, here, all the way through 8:34, 8:45, and up to 9 a.m. We can use slicing to select a time, a day, a month, a year, and in slicing we provide a start and a stop string separated by a colon. Here's our start string (we could have removed the time, the day, or the month) and here's our end string. So, as you can see, very powerful search and filtering criteria when using pandas time series. Okay, that's it for this lecture. Thanks for listening; I'll see you in the next. 26. Creating Date Time Lists: Hi, folks, welcome back. Now we've seen how to set a datetime index from a column containing the relevant datetime info, and once that's done, we've seen just how powerful this index can be when selecting data. With this in mind, it's a good idea to know how to create a datetime index from scratch: from nothing, or in our case, from a list. Let's jump right into our code editor. As you can see, I've imported pandas as pd, and now I've created a datetime string list, so you can see it here. Pause the video, go to the GitHub file, and copy in this list, or if you've downloaded the files, just copy it from there, because that's what we're going to need to create a datetime index in this lecture. The next thing we need to do is tell pandas the format of our datetimes, and we do that with the following. Let's create a string, time_format, equals: we're in Europe, so I'm going to say day, month, year, but in America it might be slightly different, month, day, year, I think. So we're going to say %d for day, hyphen, %m for month, hyphen, %Y for year, and after this a space, then %H, capital H, for the hour, a colon, %M for the minute, and close the quotes. Now pandas will parse our datetime strings into the proper datetime elements and
build the datetime objects. To do that, we use the following line of code: we create a new variable, my_datetimes, equals pd.to_datetime, so pandas' to_datetime; we're telling it to convert to datetimes. And what are we telling it to convert? The string list that we created at the very beginning, our datetime strings. And what are we telling it to do with those strings? Parse them using format (how do you spell format?) equals time_format, the format we just created a moment ago. And then, finally, let's print out my_datetimes and see what we get. We got an error. Why did we get an error here? For two very simple reasons: first, in my enthusiasm, I closed off my quotes in the wrong place, and that's a danger sometimes when using autocomplete in Atom; and second, there was a big typo. Let's fix those, save, and run it again, and there we go: a DatetimeIndex created from the list that we entered at the very beginning of the lecture. That's it for this lecture. Thanks a million for listening, and I'll see you in the next. 27. Resampling Techniques: Hi, folks, and welcome back. Now, continuing our analysis of sales data, we can look at a method known as resampling. Using resampling, we can apply statistical methods over different time periods: for example, day, week, month. The statistical methods available include the mean method, the sum method, and the count method, among many more, and those are the ones we'll be looking at in the coming lecture. For the examples to come, I've used the original sales data set to create a file called resampling.csv. Let's jump in now and take a quick look. So here we are in our resampling.py file, and as always, on the left-hand side, the resampling sales data (the original is available too): quantity, invoice date, unit price. As usual I have imported pandas as pd, and I've already created my sales_data DataFrame, and you'll note,
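The to_datetime step just described can be sketched like this, with a hypothetical list of European day-month-year strings:

```python
import pandas as pd

# Hypothetical date strings in European day-month-year order.
date_list = ["01-12-2010 08:26", "01-12-2010 08:35", "02-12-2010 09:41"]

# Format codes: %d day, %m month, %Y four-digit year, %H hour, %M minute.
time_format = "%d-%m-%Y %H:%M"

# Parse the strings into a DatetimeIndex.
my_datetimes = pd.to_datetime(date_list, format=time_format)
print(my_datetimes)
```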
towards the end of the line, I have parse_dates=True and index_col set to the invoice-date column. Within resampling there is a method known as downsampling, and this means reducing our datetime rows to a slower frequency: for example, from days to weeks, or from weeks to months. To make use of downsampling, we resample on our datetime index. Let's take a look now at an example of downsampling. We'll modify our resampling sales-data file to go from daily to weekly and produce a weekly sales average: so, on average, what was the quantity of units sold per week? First, let's examine our DataFrame, and as usual we do it with print sales_data.info(). Let's run this file. We are dealing with a DatetimeIndex of thirty-three entries, and we have quantity and unit price as our column labels. The next thing we need to do is have a look at the head: .head(), save, run. Invoice date is our index, then quantity and unit price. As you can see here, on the first date in the file we sold six, and what we want to do, using resampling, is take this daily quantity and resample it, or as we said, downsample it, into a weekly number. So, what was the quantity sold per week? To do that, we create a new variable: weekly_mean equals sales_data (that's our DataFrame) .resample, and as we mentioned, we want weekly, and the way to specify weekly is, in quotes, a capital 'W'. Then we chain .mean(), because we want the weekly average of the quantity. Let's print weekly_mean, save, and run. Okay, here we go. This might look a little confusing at first, but we'll get through it in a moment. We have our invoice date, and as we can see, our index has switched to a weekly format: the 24th, the 31st, the 7th, the 14th, the 21st. Then, within the quantity column, we have the average, and this is the average quantity sold per week. If we wanted to view the monthly average, we can simply change the 'W' to an 'M'. Save this, run it again. There we go:
Monthly: the 31st of January (the last day of the month), the 28th of February, the 31st of March, and so on, with the average quantity of product sold per month. By now I hope you've picked up that `'W'` is weekly and `'M'` is monthly; you can probably guess what the string for daily is. You might also have noticed that I chained `resample` with the `mean` method. In pandas it is seen as best practice to follow the `resample` method with some form of statistical method, so our output is a weekly sales average. You might also have noticed that anything non-numeric, such as a product description, is ignored. Here we have only Quantity and UnitPrice, but if we were dealing with a data set that had a description, or a country of origin or country of sale, whatever that might be, if it's not numerical it's left out of the returned output. As you can also see, missing weeks, weeks with no sales, are filled with NaN, "not a number". We can verify that our output is correct by checking the weekly quantity against the weekly mean of a single location. So let's take this location here, the week of the 21st of February 2010, and how do we do that? We do it using the `.loc` accessor that we have previously used. We've already defined our `weekly_mean`, so let's use that again: `print(weekly_mean.loc['2010-02-21'])`, so we're using our weekly mean to access one specific location of that data, the 21st of February 2010. Let's run that and see what we get. And of course, we get an error. Let's have a look: we get this error for one very simple reason. We're looking for a weekly value, but our resample rule is still set to monthly, so let's change that back to a `'W'` and rerun it. And there we go: the Quantity and the UnitPrice for that location.
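The verification step can be sketched like this, again on made-up daily data; note that the `.loc` label has to be one of the Sundays that the weekly bins end on:

```python
import pandas as pd

# One made-up week of daily sales, Monday the 15th to Sunday the 21st.
idx = pd.date_range('2010-02-15', periods=7, freq='D')
sales_data = pd.DataFrame({'Quantity': [3, 5, 2, 4, 6, 1, 7]}, index=idx)

weekly_mean = sales_data.resample('W').mean()

# Pull out a single week with the .loc accessor; the row label is the
# Sunday (2010-02-21) that ends the bin.
print(weekly_mean.loc['2010-02-21'])
```

Asking `.loc` for a date that isn't one of the bin labels raises a KeyError, which is exactly the error hit in the lecture when the rule was still set to monthly.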
So, as we can see, the quantity of 3.571 can be verified by just scrolling back up: the same week shows 3.57. Now, as I mentioned, we could have columns in our output, in our data file, that we're not interested in, something we do not want to run a statistical analysis on, and we can easily select the columns we want to focus on with a simple modification to our print statement here: in square brackets, the name of the column we want to focus on, `['Quantity']`. Save that and let's rerun it. The quantity for the week of the 21st of February 2010 was 3.571. Very good, excellent: we get a nice, clean, small output. Now let's modify our code to use `sum` instead of `mean`, and switch from the weekly to the monthly view. So switch to monthly, and change `.mean()` to `.sum()` to get the actual sales, the actual quantity sold, the total. We just need to change our print statement: it still prints `weekly_mean`, but now it's a sum; I'm not going to change the variable name, I'll let you do that. There we go, a very good, nice, clean output. The total for the month of February is 46, and obviously March was a really good month; then coming into the summer we've had some very, very slow months. So looking at this data, we know that March is a busy time of year. OK, that's it for this lecture. Thanks for listening, I'll see you in the next.

28. Method Chaining: Hi folks, and welcome back. In the last lecture we looked at how we could chain methods together, so this lecture is all about method chaining, digging a little bit deeper into that. Let's jump right into the code. Here I've got our sales data DataFrame defined, but what I want to do is just change the CSV we're using for this lecture. In the sample data there's another file called sales_data.csv, so let's use that one; all we need to do is remove the `resampling_` prefix. There we go, easy peasy. Now let's look at another example of method chaining.
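Selecting one column and totalling it per month can be sketched like this (synthetic data; every day sells one unit so the monthly totals are easy to check by eye):

```python
import pandas as pd

# Sixty made-up days starting 1 February 2010, one unit sold per day.
idx = pd.date_range('2010-02-01', periods=60, freq='D')
sales_data = pd.DataFrame({'Quantity': [1] * 60,
                           'UnitPrice': [2.0] * 60}, index=idx)

# Focus on the Quantity column only, then switch from the weekly mean
# to a monthly total: rule 'M' (month-end) chained with .sum().
monthly_total = sales_data['Quantity'].resample('M').sum()
print(monthly_total)
```

Because 2010 is not a leap year, February's total is 28 and March's is 31, matching the number of days in each month.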
And this time we're going to look at the weekly max of our data. So: `weekly_max = sales_data.resample('W')`, our DataFrame dot resample, nothing new so far, and as I said, we want `.max()`, the maximum amount. OK, let's print that: `print(weekly_max)`, and see what we get. There we go: the weekly maximum quantity is 875, along with the weekly unit price maximum. Obviously it has also maximised CustomerID, but that's irrelevant, so we could leave that out using the methods we learned in the last lecture. As I mentioned at the beginning of this section, we can use other methods with resampling, such as `count`, so I encourage you to go now and try `.count()`; what you get from that is the count of items sold per week. The most commonly used string arguments for resampling, such as `'W'` and `'M'`, like we saw in a previous lecture, are saved in the resources section. Over here in resources there's a resampling strings reference saved as an Excel file, so take a look when you get a moment. For this example, we can go further and enhance our resampling by adding integers to our string arguments. Consider this: `'W'` is one week, but if we wanted, we could get a two-week aggregate simply by adding an integer, `'2W'`. Pretty handy, pretty easy. And as seen in previous lectures, we can refactor our code to output only the quantity, so let's take this a step further and modify our weekly max with the following: `sales_data.loc[:, 'Quantity'].resample('W').max()` (it would help if I spelled that right). We're only interested in the Quantity column, resampled weekly. Let's see what we get. There we go.
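The chained statistics mentioned above can be sketched together on synthetic data (the daily quantities simply count upwards so the weekly maxima are obvious):

```python
import pandas as pd

# Four made-up weeks of daily sales: quantities 1..28 over February.
idx = pd.date_range('2010-02-01', periods=28, freq='D')
sales_data = pd.DataFrame({'Quantity': list(range(1, 29))}, index=idx)

# Chain resample with different statistical methods:
weekly_max = sales_data.resample('W').max()      # largest daily value per week
weekly_count = sales_data.resample('W').count()  # number of rows per week

# Adding an integer to the rule widens the window, e.g. two-week maxima,
# refactored to output only the Quantity column:
two_week_max = sales_data.loc[:, 'Quantity'].resample('2W').max()
print(two_week_max)
```

The same pattern works with `.mean()`, `.sum()`, `.min()` and the other aggregation methods.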
InvoiceDate on a weekly basis, with a quantity of 875. And as you can see down at the bottom here, our frequency has shifted to weekly starting on Sunday, we're in the Quantity column, and there's its data type. Perfect. That's it for this quick lecture. Thanks for listening, and see you in the next.

29. How to Separate & Resample Data: Hi folks, and welcome back. I mentioned in the previous lecture refactoring when selecting only a particular column. With resampling, we can use partial string indexing to extract a particular column from a data set, and to that end, we do it before resampling. For this example I'm working with sales_data.csv. Let's jump in now. Here we are: I have `import pandas as pd`, and I've also defined my DataFrame `sales_data`, reading sales_data.csv with, as you can see, `parse_dates=True` and the index column set to InvoiceDate. What we're interested in for this lecture is a column and a day. You're familiar with this by now, but there is a little twist at the end: `morning_sales = sales_data['Quantity']['2010-12-01']`. So again, the Quantity column (don't forget to enclose it in quotes), and we're interested in a particular timeframe, the 1st of December 2010. We're familiar with this now; let's downsample to obtain the highest quantity of items sold during this timeframe, but let's add a slight twist: using the previous resampling strings table, let's change our timeframe to hourly and chain the `.max()` method. When I say the previous table, what I mean is that again, in resources, you can see all the string arguments used when resampling, so take a look at that if you didn't do so in the last lecture. So, like we just said, we want the max, and we're going to call this `high_quantity`, equal to `morning_sales`, the variable we just defined, chained with `.resample()`.
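The whole pattern from this lecture can be sketched on a few invented intraday rows (timestamps and quantities are made up; the real course data comes from sales_data.csv):

```python
import pandas as pd

# A handful of made-up sales on the morning of 1 December 2010.
idx = pd.to_datetime(['2010-12-01 08:10', '2010-12-01 08:45',
                      '2010-12-01 09:30', '2010-12-01 10:05'])
sales_data = pd.DataFrame({'Quantity': [48, 12, 432, 8]}, index=idx)

# Partial string indexing: one column, one day, selected *before*
# resampling ('2010-12-01' matches every timestamp on that day).
morning_sales = sales_data.loc['2010-12-01', 'Quantity']

# Downsample to hourly bins and keep the largest sale in each hour.
high_quantity = morning_sales.resample('H').max()
print(high_quantity)
```

Swapping `.max()` for `.min()` gives the lowest sale per hour instead, which is how the minus-one outlier shows up in the lecture.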
So what we're doing here: we already defined a variable, `morning_sales`, and in that variable is the Quantity column for this time period. Now what we're doing is resampling that variable, and, like I said, the little twist is that the timeframe we're resampling to is hourly, chained with `.max()`, and what we get from this is the maximum amount sold per hour within our timeframe. Let's save that and print `high_quantity`. Save, let's run it. OK, here we go, pretty nifty. The maximum amount sold hourly: at eight o'clock, 48; hourly at nine, 432; and at ten o'clock, eight. As you can see here down at the bottom, the frequency is hourly, the name is Quantity, and there's our data type. We can again downsample to see the lowest amount sold during this time period, so we could say `.min()` and save. So there's the minimum amount sold during this time period, and as you can see here, a minus one. That's an outlier, so our data requires further examination to find out why we have a minus one, because obviously that should be zero; maybe it's a return that needs further examination. OK, that's it for this lecture. Thanks for listening, and I'll see you in the next.

30. Further Filtering Techniques: Hi folks, welcome back. We've seen some powerful ways to filter our data sets using chaining; let's now look at one more. Again, I've loaded up our sales_data file, and let's check it now: as we can see, `parse_dates=True` and `index_col='InvoiceDate'`, so we've given it InvoiceDate as the index. OK, let's print the head; this should be very familiar to you by now: `print(sales_data.head())`. Save, let's run this and see what we're dealing with. There we go. So let's have a look at the descriptions for a moment: white hanging heart t-light holder, white metal lantern, cream Cupid hearts coat hanger. The items in our descriptions are very varied and pretty unique.
So let's say we wanted to search only the Description column of our data set, and we want to search for a particular word or string, something like POPPY or WHITE or HEART, something fairly unique. We can search by chaining the `.str.contains()` method to our column name, like this. Let's tidy up the print statement and say `search = sales_data['Description']`, so we're searching the Description column of our DataFrame; all of this really does make sense when you chain it together. We're going to use `.str.contains()`, and what's the string we're looking for? Well, I've already looked ahead, and what we're looking for is 'POPPY'. There we go, save. Let's print this and see what we get: `print(search)`, fingers crossed, run it. So we get an entire list: where POPPY is not mentioned in the description we get False, and where the string POPPY is in the description we get True. But this list isn't very useful to us, so let's see if we can clean it up a little bit more. Again, let's just take out our print statement, and how are we going to fix this? We've seen this kind of output in a previous lecture at the very beginning of the class. What we have is False, False, False, False, False, then we have a couple of Trues. Luckily for us, pandas supports Boolean addition, and what that means is that False is zero and True is one. If you add two Falses together, you get zero, but if you add two Trues together, you get two. This means that we can search for how many products containing the string POPPY we sold in a given time period. So let's do that right now: `total_poppy_sales = search.resample('H').sum()` (it would help if I spelled that right). So again, we're resampling the `search` variable that we just created, as we did in the last lecture; we're resampling it hourly and chaining `.sum()`.
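The search-and-count pattern just described can be sketched on a few invented order rows (the descriptions imitate the style of the course data but are made up):

```python
import pandas as pd

# Three made-up orders with timestamps and product descriptions.
idx = pd.to_datetime(['2010-12-01 08:26', '2010-12-01 08:34',
                      '2010-12-01 09:01'])
sales_data = pd.DataFrame(
    {'Description': ["POPPY'S PLAYHOUSE KITCHEN",
                     "POPPY'S PLAYHOUSE BEDROOM",
                     'WHITE METAL LANTERN']},
    index=idx)

# Boolean mask: True wherever the description contains 'POPPY'.
search = sales_data['Description'].str.contains('POPPY')

# False counts as 0 and True as 1, so an hourly sum gives the number
# of matching sales in each hour.
total_poppy_sales = search.resample('H').sum()
print(total_poppy_sales)
```

Here the 8 a.m. hour holds two POPPY matches and the 9 a.m. hour holds none, which is the kind of clean integer output the lecture arrives at.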
So: the total sales of items that contained the string POPPY in the description. Again, let's print `total_poppy_sales`. Let's save this and run it, and hopefully we're going to get an integer returned. And we do, and that is a thing of beauty. As we can see, at 8 a.m., in that hour period, we had two sales of items that contained POPPY in the description. A very powerful search filter method right there. That's it for this lecture. Thanks for listening, I'll see you in the next.

31. Multiple Line Plots on a Single Graph: Hi everybody, and welcome back. To close out this section, let's look at how to visualise datetime info, in this instance the stock price of Intel, which we've used in previous lectures. So let's get started right away, jumping into our code, and here we are. As you can see, I've loaded up `import pandas as pd`, and in this lecture I've also used `import matplotlib.pyplot as plt`. I have defined a DataFrame, `stock_price`, and as you can see, I've already imported it with `parse_dates=True` and `index_col='Date'`, so I have defined the index column. Let's print out our DataFrame and take a quick look: `print(stock_price.head())` and see what it gives us. Save and run. We have our Date index, and we have Open, High, Low, and Close. Perfect; let's just remove that. Next, let's plot Intel's closing price and see what that looks like. So we have `stock_price['Close']` (remember, it has to be written exactly as it appears in the DataFrame) followed by `.plot()`, so this is just going to be a simple, nondescript plot. Don't forget `plt.show()`, with brackets, and let's run it. Again, pretty messy-looking: we have no title, we have no label on the left; we do have a label on the bottom, but it's incremental in months, and it's pretty noisy-looking here. So let's close that off and add in the title; let's start with that, something nice and simple that we've done before: `title='Intel'`. Let's also add in the y-label.
Our y-label is the closing price. There we go, let's take a look. Slightly better: we have a title and we have our y-label. Next, we can limit the date range of the x-axis by using `.loc`. Let's take a closer look at October 2017. We index our stock price, so `stock_price.loc`, the location accessor; we're still interested in Close, so leave that there as it is, but we're interested in a particular date range: `'2017-10-16'` (forgot the hyphen there), so that is the week beginning the 16th of October. Then a colon, and let's put in the end of our date range, `'2017-10-20'`; as you can see, we're closing out a week here. Then a comma, and `'Close'`. Let's save this and run it again. OK, that's a little bit better: it's cleaner, we can discern more information from this, and now on the x-axis the days appear at each tick. Here you can see 16, 17, 18, 19, 20, and the year and month are shown in the bottom left-hand corner. Now, as we've seen previously, we can set the plot line style with the keyword argument `style`. Let's do that now: let's add in `style='k.-'`. What will that give us? Well, I'm not going to tell you; let's run it and have a look. As you can see, the line is now black, and we have points at each date. Another thing we've seen previously is how we can set the kind of plot that we want to show, so back here in our plot we can say `kind=`, and one plot that we did not look at throughout these lectures was the area plot. Let's have a look and see what that would look like: simply `'area'`, close the quotes, comma, run it again. Uh, horrible. As you can see here, all the information is lost in the corner of the area, so I'm going to take that out: just because you can doesn't mean you should. Save, run it again to make sure our changes kick in. Yep, and there we go, back to normal. We can also plot two columns at the same time, and how might we do that? Let's have a look here.
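The plotting steps so far can be sketched like this; the Intel CSV isn't included here, so the prices below are invented, and only the Close column is shown:

```python
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Five made-up closing prices for the week of 16-20 October 2017.
idx = pd.date_range('2017-10-16', periods=5, freq='D')
stock_price = pd.DataFrame({'Close': [40.1, 40.4, 40.0, 40.6, 40.9]},
                           index=idx)

# Slice one week with .loc, then plot it with a title and a black
# dotted-line style ('k.-' = black colour, point markers, solid line).
ax = stock_price.loc['2017-10-16':'2017-10-20', 'Close'].plot(
    title='Intel', style='k.-')
ax.set_ylabel('Closing price')
plt.show()
```

Note that with `.loc` on a datetime index, both endpoints of the slice are inclusive, so all five trading days appear on the plot.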
So we have `stock_price.loc` for the week that we're concentrating on. What we can do now is wrap `'Close'` in square brackets: leave in the comma, open a square bracket, and, as I said, we want to plot two values, so let's plot our Open (which should be surrounded by quotes), and there's our Close, also surrounded by quotes. Don't forget the closing square bracket. Let's run it and see what we get. Perfect, but we don't know which is the Open and which is the Close. We could amend that using the style and colour attributes that we've seen in previous lectures, but let's take a different approach. We can separate the plots for Open and Close, and we do that with the following: in our plot we say `subplots=True`. Save this, run it again, and there we go. OK, now I still don't like the black, so I'm just going to take the black out; let me reload this. Excellent, much better for a bit of colour. Let me expand this so you can see it. As you can see, we're looking at Intel's open and closing prices. The legend is inserted for us, as we saw in a previous lecture; the title is here, and our y-labels are here. Our dates are still here at the bottom left, but they've been slightly angled to make room and make the plot slightly better. Another thing we should note: on the y-axis there are different values, so the two plots are not tied to each other, and that's because, obviously, the prices are different. So this is a very easy way to quickly analyse more than one piece of information, more than one column of data. Previously we've looked at customer orders, sales quantity and things like that, and now you can see how you can easily plot two pieces of information on one graph. Thanks for listening, and I'll see you in the next lecture.
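The two-column, separate-axes version can be sketched as follows, again on invented prices in place of the course's Intel CSV:

```python
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Made-up open and close prices for one week.
idx = pd.date_range('2017-10-16', periods=5, freq='D')
stock_price = pd.DataFrame({'Open': [39.9, 40.2, 40.1, 40.3, 40.7],
                            'Close': [40.1, 40.4, 40.0, 40.6, 40.9]},
                           index=idx)

# Wrapping the column names in a list plots both; subplots=True gives
# each column its own axes, with its own legend and y-axis scale.
axes = stock_price[['Open', 'Close']].plot(subplots=True)
plt.show()
```

Because each column gets its own axes, the two y-axis ranges are independent, which is exactly the "not tied to each other" behaviour described above.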